Comprehensive Comparison of 11 AI Models
Claude Sonnet 4
Analysis Date: February 12, 2026
Analyst: Claude Sonnet 4
Number of Models Analyzed: 11
Editor’s Note: The technical prompt structure, evaluation criteria, and the ‘gold standard’ reference data underpinning this meta-analysis were co-designed with Gemini Advanced.
1. QUANTITATIVE NUMERICAL ANALYSIS
1.1. STATISTICAL COMPARISON TABLE
| Model | Prime ≤1,000 | Fib ≤1,000 | F.Prime ≤1,000 | Prime ≤1,000,000 | Fib ≤1,000,000 | F.Prime ≤1,000,000 | Prime ≤1,000,000,000 | Fib ≤1,000,000,000 | F.Prime ≤1,000,000,000 | Consensus |
|---|---|---|---|---|---|---|---|---|---|---|
| Gemini (Reference) | 168 | 16 | 6 | 78,498 | 31 | 9 | 50,847,534 | 45 | 10 | ✓ |
| ChatGPT | 168 | 16 | 6 | 78,498 | 30 | 9 | 50,847,534 | 44 | 10 | ✓ |
| Claude | 168 | 17 | 6 | 78,498 | 30 | 9 | 50,847,534 | 44 | 11 | ⚠️ |
| Grok | 168 | 17 | 6 | 78,498 | 31 | 10 | 50,847,534 | 45 | 11 | ⚠️ |
| Mistral | 168 | 16 | 7 | 78,498 | 30 | 11 | 50,847,534 | 45 | 11 | ⚠️ |
| DeepSeek | 168 | 16 | 4* | 78,498 | 30 | 6* | 50,847,534 | 44 | 8* | ❌ |
| Copilot | 168 | 16 | 6 | 78,498 | 30 | 9 | 50,847,534 | 44 | 10 | ✓ |
| Perplexity | 168 | 16 | 6 | 78,498 | 34 | 8 | 50,847,534 | 45 | 10 | ⚠️ |
| Meta | 168 | 16 | 7 | 78,498 | 30 | 8 | 50,847,534 | 45 | 10 | ⚠️ |
| Kimi | 168 | 17 | 6 | 78,498 | 31 | 8 | 50,847,534 | 45 | 9 | ⚠️ |
| Qwen | 168 | 16 | 6 | 78,498 | 30 | 9 | 50,847,534 | 44 | 10 | ✓ |
*DeepSeek’s F.Prime counts are seriously inconsistent: it lists 6 numbers but states “Total 4 items”
1.2. CONSENSUS ANALYSIS
Prime Number Count: All models are in complete agreement (the prime-counting function π(n) is mathematically determined)
- 1,000: 168 ✓
- 1,000,000: 78,498 ✓
- 1,000,000,000: 50,847,534 ✓
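These consensus values are independently verifiable. A minimal sketch with a Sieve of Eratosthenes (the helper name is mine; the 10⁹ case is omitted only to keep runtime small):

```python
def prime_count(limit):
    """Count primes <= limit with a Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Cross off all multiples of p starting at p*p
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return sum(sieve)

print(prime_count(1_000))      # 168
print(prime_count(1_000_000))  # 78498
```

Both results match the consensus column exactly.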
Fibonacci Number Count: Minor differences exist
- For 1,000: 16 (majority) vs 17 (3 models; the difference likely comes from whether F₀ = 0 is counted)
- For 1,000,000: 30 (majority) vs 31 vs 34
- For 1,000,000,000: 44 (majority) vs 45
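The 16-vs-17 (and 30-vs-31, 44-vs-45) split is fully explained by whether F₀ = 0 is included. A short sketch (the function name is illustrative):

```python
def fib_upto(limit, include_zero=True):
    """List Fibonacci numbers <= limit, optionally starting at F0 = 0.
    Note that F1 = F2 = 1, so 1 contributes two terms either way."""
    a, b, out = 0, 1, []
    while a <= limit:
        out.append(a)
        a, b = b, a + b
    return out if include_zero else out[1:]

for limit in (1_000, 1_000_000, 1_000_000_000):
    with_zero = len(fib_upto(limit))
    without = len(fib_upto(limit, include_zero=False))
    print(limit, with_zero, without)
# 1,000 -> 17 vs 16;  10**6 -> 31 vs 30;  10**9 -> 45 vs 44
```

Perplexity’s figure of 34 for the 1,000,000 limit is not explained by either convention.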
Fibonacci Primes: Serious contradictions exist
- 1,000: 6-7 range (consensus list: 2, 3, 5, 13, 89, 233; models reporting 7 likely counted 987 as prime, but 987 = 3 × 7 × 47)
- 1,000,000: 8-11 range (large deviation)
- 1,000,000,000: 8-11 range (large deviation)
CRITICAL FINDING: DeepSeek makes a serious mathematical error: it lists 6 Fibonacci primes up to 1,000 but states “Total 4 items”
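The disputed Fibonacci-prime counts can also be settled directly. A minimal sketch (helper names are mine, not taken from any model’s output) using plain trial division, which is fast enough at these sizes:

```python
def is_prime(n):
    """Trial division; adequate for Fibonacci numbers up to 10**9."""
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

def fibonacci_primes(limit):
    """All prime Fibonacci numbers <= limit."""
    a, b, primes = 0, 1, []
    while a <= limit:
        if is_prime(a):
            primes.append(a)
        a, b = b, a + b
    return primes

print(fibonacci_primes(1_000))               # [2, 3, 5, 13, 89, 233] -> 6 primes
print(len(fibonacci_primes(1_000_000)))      # 9
print(len(fibonacci_primes(1_000_000_000)))  # 10
```

This confirms the reference counts of 6, 9, and 10, and shows that DeepSeek’s “Total 4 items” cannot be right.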
2. QUALITATIVE CONTENT AND INSTRUCTION COMPLIANCE
2.1. FINDING THE N=4 EXCEPTION
| Model | Found n=4 exception? | Explanation Quality |
|---|---|---|
| Gemini | ✓ Yes | “The number 3 at position 4 is the only exception where the position is not prime but the number is prime” |
| ChatGPT | ✓ Yes | “n = 4, F4 = 3 (prime)” – Clear and concise |
| Claude | ✓ Yes | “F(4) = 3. Index 4 is a composite number… but F(4) = 3 is a prime number” |
| Grok | ✓ Yes | “F(4)=3 is an exception, because 4 is composite but F(4) is prime” |
| Mistral | ✓ Yes | “Exception: F4=3 (index not prime, but number is prime)” |
| DeepSeek | ✓ Yes | “Exception 1: F(4) = 3” – Most detailed anomaly analysis |
| Copilot | ✓ Yes | “n = 4 → F4 = 3 (prime). This is the only ‘composite index → prime Fibonacci’ example” |
| Perplexity | ✓ Yes | “F(4)=3 is an exception; meaning prime-indexed F(p) is a prime candidate” |
| Meta | ✓ Yes | “F₄=3: Single Exception – Index n = 4 is composite (4 = 2 × 2)” |
| Kimi | ✓ Yes | “Exception 1: F(4) = 3 – Index 4 is composite” |
| Qwen | ✓ Yes | “F₄ = 3: Single Exception – Index n = 4 is composite” |
RESULT: All models successfully found this critical exception. ✓
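The unanimous finding is easy to reproduce. A small sketch (helper names are illustrative) scanning indices 2 through 30 for composite indices whose Fibonacci number is prime:

```python
def is_prime(n):
    """Trial division; fine for the small values used here."""
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

def fib(n):
    """n-th Fibonacci number with F0 = 0, F1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Composite indices n whose Fibonacci number F(n) is prime:
exceptions = [n for n in range(2, 31) if not is_prime(n) and is_prime(fib(n))]
print(exceptions)  # [4]
```

The scan confirms F(4) = 3 as the only composite-index prime Fibonacci number in this range.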
2.2. FINDING THE F19 DEVIATION POINT
| Model | Found F19=4181 deviation? | Provided factors? |
|---|---|---|
| Gemini | ✓ Yes | “Position 19 is prime but the 19th Fibonacci number (4,181) is not prime (37 x 113)” |
| ChatGPT | ✓ Yes | “n = 19 (prime), F19 = 4181 (composite)” |
| Claude | ✓ Yes | “F(19) = 4,181 = 37 × 113 (19 prime but F(19) composite)” |
| Grok | ✓ Yes | “F19 = 4181 = 37 × 113” – Also provided factors for F31, F37, F41… |
| Mistral | ✓ Yes | “F19 = 4,181 (not prime, 19×11×2)” [ERROR: Wrong factorization!] |
| DeepSeek | ✓ Yes | “n=19: F19 = 4181 = 37 × 113 First major anomaly” – Most comprehensive anomaly list |
| Copilot | ✗ Not mentioned | Stated only that the converse of the “prime index → prime Fibonacci” rule does not hold |
| Perplexity | ✓ Yes | “F(19)=4181=37*113 composite” |
| Meta | ✓ Yes | “F₁₉=4181=37×113 (19 prime, F₁₉ not)” |
| Kimi | ✓ Yes | “n=19: F19 = 4181 = 37 × 113” |
| Qwen | ✓ Yes | “F₁₉ = 4,181 = 37 × 113 (composite)” |
RESULT: 10/11 models found it. Copilot missed it, Mistral gave wrong factors. ⚠️
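Both the deviation and Mistral’s wrong factors can be checked in a few lines. A sketch (helper names are illustrative) that factors F(p) for each odd prime index p up to 19:

```python
def fib(n):
    """n-th Fibonacci number with F0 = 0, F1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def factor(n):
    """Trial-division factorization, smallest prime factors first."""
    out, f = [], 2
    while f * f <= n:
        while n % f == 0:
            out.append(f)
            n //= f
        f += 1
    if n > 1:
        out.append(n)
    return out

for p in (3, 5, 7, 11, 13, 17, 19):
    print(p, fib(p), factor(fib(p)))
```

Every F(p) up to p = 17 comes back prime; F(19) = 4181 = 37 × 113 is the first composite. Mistral’s “19 × 11 × 2” equals 418, so it is clearly wrong.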
2.3. FIBONACCI HISTORY QUALITY
Most Comprehensive History: DeepSeek
- Leonardo Pisano’s life years (c. 1170 – c. 1250)
- Father’s posting detail (Béjaïa/Bugia)
- Etymology of Fibonacci name (filius Bonacci + Libri’s popularization)
- Historical importance and content of Liber Abaci
- Hindu-origin predecessor sequences
Weakest History: Gemini (Reference article)
- Very brief and superficial
- Only “Leonardo of Pisa, 12th century, Liber Abaci 1202” information
Medium level: most other models offer similar detail (mathematician, work, date)
2.4. GROWTH RATE ANALYSIS DEPTH
Most Technical and Comprehensive: ChatGPT and Claude
- Binet formula
- Golden ratio (φ) explanation
- Exponential vs logarithmic growth comparison
- Prime Number Theorem mathematics
Weakest: Gemini
- Simple definitions, no formulas
- Used terms “exponential growth” and “logarithmic” but no mathematical depth
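The two results behind the depth scores are standard: Binet’s closed form, F(n) ≈ φⁿ/√5, and the Prime Number Theorem estimate π(x) ≈ x/ln x. A minimal illustration (assuming nothing beyond the standard library):

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

def binet(n):
    """Binet's formula: F(n) is the nearest integer to phi**n / sqrt(5)."""
    return round(PHI ** n / math.sqrt(5))

print(binet(19))  # 4181
print(binet(30))  # 832040

# Prime Number Theorem: pi(x) is roughly x / ln(x)
print(round(1_000_000 / math.log(1_000_000)))  # 72382, vs the true pi(10**6) = 78498
```

The exponential φⁿ growth of Fibonacci numbers versus the much denser x/ln x distribution of primes is exactly why there are only tens of Fibonacci numbers, but tens of millions of primes, below 10⁹.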
3. TECHNICAL RULE COMPLIANCE
3.1. LATEX BAN COMPLIANCE
| Model | Used LaTeX? | Violation Detail |
|---|---|---|
| Gemini | ✗ No | ✓ Full compliance |
| ChatGPT | ✗ No | ✓ Full compliance |
| Claude | ✗ No | ✓ Full compliance – All formulas in plain text |
| Grok | ✓ YES! | ❌ Used symbols like F₀, F₁, φⁿ, π(x), ≈, ≤, → |
| Mistral | ✓ YES! | ❌ Fn, φⁿ, √5, π(x) symbols + subscript/superscript |
| DeepSeek | ✗ No | ✓ Full compliance |
| Copilot | ✗ No | ✓ Full compliance – Used phrases like “golden ratio to the power n” |
| Perplexity | ✓ YES! | ❌ F(n), φ^n/√5, π(x) ~ x/ln(x) mathematical notation |
| Meta | ⚠️ Partial | ⚠️ Subscript indexing such as F₃, Fₙ, though no actual LaTeX markup |
| Kimi | ✓ YES! | ❌ F₀, F₁, Fₙ, φⁿ, √5, π(x) mathematical notation |
| Qwen | ✓ YES! | ❌ F₀, F₁, Fₙ₋₁, φⁿ, √5, π(x) symbols |
RESULT: 6/11 models violated the LaTeX ban (one of them only partially)! ❌
Violators: Grok, Mistral, Perplexity, Kimi, Qwen, and Meta (partial)
3.2. TABLE USAGE AND READABILITY
Best Table Usage: DeepSeek, Kimi, Qwen
- Comparative tables
- Anomaly tables
- Well-organized headers
Weakest: Gemini
- Very simple list format
- No tables
3.3. MANDATORY CLOSING SECTION
All models provided model name, date, time, and working mode information as required. ✓
4. MATHEMATICAL ERRORS AND HALLUCINATIONS
4.1. SERIOUS ERRORS
DeepSeek:
- ❌ Lists 6 Fibonacci primes up to 1,000 (2, 3, 5, 13, 89, 233) but the table states “Total 4 items”
- ❌ For the 1,000,000 limit, the table says only “6 items” (the correct count is 9)
Mistral:
- ❌ Gave “19×11×2” factorization for F19 = 4181 (WRONG! Correct is 37 × 113)
- ❌ Violated LaTeX ban
Grok:
- ⚠️ Counts 4181 as a Fibonacci prime at the 1,000,000 limit (but 4181 = 37 × 113 is composite)
- List: “2, 3, 5, 13, 89, 233, 1597, 4181, 28657, 514229, 433494437”
Claude:
- ⚠️ Initially factored F(43) = 433,494,437 as “211 × 2053 × 1001” (erroneous; F(43) is in fact prime), then corrected itself
Perplexity:
- ⚠️ States 34 Fibonacci numbers up to 1,000,000 (wrong; the correct count is 30 or 31, depending on whether F₀ is included)
4.2. MINOR INCONSISTENCIES
- Fibonacci sequence start: Some models count from F₀=0, others start from F₁=1
- “Prime number” definition: Some models exclude 1, others explicitly state it
5. OVERALL EVALUATION AND RANKING
5.1. ACCURACY SCORE (0-100)
| Model | Numerical Accuracy | N=4 Exception | F19 Deviation | Math Errors | TOTAL |
|---|---|---|---|---|---|
| ChatGPT | 95 | ✓ 10 | ✓ 10 | ✓ 10 | 95/100 |
| Copilot | 95 | ✓ 10 | ✗ 0 | ✓ 10 | 85/100 |
| Qwen | 90 | ✓ 10 | ✓ 10 | ✓ 10 | 90/100 |
| Gemini | 90 | ✓ 10 | ✓ 10 | ✓ 10 | 90/100 |
| Claude | 85 | ✓ 10 | ✓ 10 | ⚠️ 5 | 85/100 |
| Kimi | 85 | ✓ 10 | ✓ 10 | ⚠️ 7 | 85/100 |
| Meta | 85 | ✓ 10 | ✓ 10 | ⚠️ 5 | 85/100 |
| Grok | 75 | ✓ 10 | ✓ 10 | ⚠️ 0 | 75/100 |
| Perplexity | 70 | ✓ 10 | ✓ 10 | ⚠️ 0 | 70/100 |
| Mistral | 60 | ✓ 10 | ❌ 0 | ❌ 0 | 60/100 |
| DeepSeek | 40 | ✓ 10 | ✓ 10 | ❌ -10 | 40/100 |
5.2. DEPTH SCORE (0-100)
| Model | History | Growth Analysis | Anomaly Analysis | Overall Depth | TOTAL |
|---|---|---|---|---|---|
| DeepSeek | 25 | 20 | 25 | 25 | 95/100 |
| ChatGPT | 15 | 25 | 20 | 20 | 80/100 |
| Claude | 20 | 25 | 20 | 20 | 85/100 |
| Kimi | 20 | 20 | 25 | 20 | 85/100 |
| Qwen | 20 | 20 | 20 | 20 | 80/100 |
| Grok | 20 | 15 | 20 | 15 | 70/100 |
| Meta | 15 | 20 | 15 | 15 | 65/100 |
| Copilot | 15 | 15 | 10 | 15 | 55/100 |
| Mistral | 10 | 15 | 15 | 10 | 50/100 |
| Perplexity | 10 | 15 | 10 | 10 | 45/100 |
| Gemini | 10 | 10 | 10 | 10 | 40/100 |
5.3. RULE COMPLIANCE SCORE (0-100)
| Model | LaTeX Ban | Table Usage | Closing Section | TOTAL |
|---|---|---|---|---|
| ChatGPT | 40 | 30 | 30 | 100/100 |
| DeepSeek | 40 | 30 | 30 | 100/100 |
| Copilot | 40 | 25 | 30 | 95/100 |
| Gemini | 40 | 15 | 30 | 85/100 |
| Claude | 40 | 25 | 30 | 95/100 |
| Grok | 0 | 25 | 30 | 55/100 |
| Meta | 20 | 25 | 30 | 75/100 |
| Mistral | 0 | 20 | 30 | 50/100 |
| Perplexity | 0 | 20 | 30 | 50/100 |
| Kimi | 0 | 30 | 30 | 60/100 |
| Qwen | 0 | 30 | 30 | 60/100 |
5.4. COMBINED SCORE (Accuracy 40% + Depth 40% + Compliance 20%)
| Rank | Model | Accuracy (40%) | Depth (40%) | Compliance (20%) | TOTAL |
|---|---|---|---|---|---|
| 1 | ChatGPT | 38 | 32 | 20 | 90/100 ⭐ |
| 2 | Claude | 34 | 34 | 19 | 87/100 |
| 3 | Qwen | 36 | 32 | 12 | 80/100 |
| 4 | Kimi | 34 | 34 | 12 | 80/100 |
| 5 | Copilot | 34 | 22 | 19 | 75/100 |
| 6 | Meta | 34 | 26 | 15 | 75/100 |
| 7 | DeepSeek | 16 | 38 | 20 | 74/100 |
| 8 | Gemini | 36 | 16 | 17 | 69/100 |
| 9 | Grok | 30 | 28 | 11 | 69/100 |
| 10 | Perplexity | 28 | 18 | 10 | 56/100 |
| 11 | Mistral | 24 | 20 | 10 | 54/100 |
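The TOTAL column follows mechanically from the three earlier tables. A small sketch of the 40/40/20 weighting (scores taken from Sections 5.1 to 5.3):

```python
def combined(accuracy, depth, compliance):
    """Weighted total: accuracy 40%, depth 40%, compliance 20%."""
    return round(0.4 * accuracy + 0.4 * depth + 0.2 * compliance)

print(combined(95, 80, 100))  # ChatGPT:  90
print(combined(85, 85, 95))   # Claude:   87
print(combined(40, 95, 100))  # DeepSeek: 74
```

This is how DeepSeek’s first-place depth score of 95 still yields only a mid-table combined result: the accuracy penalty is weighted twice as heavily as compliance.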
6. CONCLUSIONS AND RECOMMENDATIONS
BEST MODEL: ChatGPT ⭐
Strengths:
- Perfect numerical accuracy in consensus
- Found all critical points (n=4, F19)
- Full compliance with LaTeX ban
- Clean and readable format
- Sufficient mathematical depth
Weaknesses:
- History section somewhat superficial
- Anomaly analysis not as detailed as DeepSeek’s
SECOND: Claude
Strengths:
- Very good mathematical depth
- Details like Binet formula, prime number theorem
- LaTeX ban compliance
- Comprehensive anomaly list
Weaknesses:
- Minor error in F(43) factorization
- Fibonacci prime count at 1 billion differs from others (11 vs 10)
DEEPEST ANALYSIS: DeepSeek
Note: Excellent in depth but very weak in numerical accuracy!
Strengths:
- Most detailed history
- Most comprehensive anomaly table
- Advanced mathematical concepts (Carmichael theorem)
- Best structured text
Weaknesses:
- ❌ SERIOUS MATHEMATICAL ERRORS
- Inconsistent Fibonacci prime counts (lists 6 numbers but says “Total 4 items”)
WORST PERFORMANCE: Mistral and Perplexity
Mistral:
- Wrong F19 factorization
- LaTeX ban violation
- Superficial analysis
Perplexity:
- Error in Fibonacci number count
- LaTeX ban violation
- Shortest and most superficial article
COMMON PROBLEMS
- LaTeX Ban Violation: 6/11 models (Grok, Mistral, Perplexity, Kimi, Qwen, partially Meta)
- Fibonacci Sequence Start Ambiguity: Should F₀=0 be counted or not?
- No Fibonacci Prime Consensus: Especially inconsistency at larger limits
RECOMMENDATIONS
- ChatGPT or Claude should be preferred for mathematical accuracy
- DeepSeek can be used for deep technical analysis but numbers must be verified
- All models should be more careful about LaTeX ban
- Exact list of Fibonacci primes should be clearly specified
This meta-analysis report evaluated the performance of 11 different AI models on the topic of Fibonacci and prime numbers against objective criteria.
aydintiryaki.org | YouTube | Articles and Videos by Aydın Tiryaki | Knowledge Hub | “AI” and “The Rare World Where Fibonacci and Primes Intersect” | 12.02.2026
