Aydin Tiryaki (2025)
Today, the inflation figures for November 2025 were announced. How about testing AI models with inflation questions on such a day?
This exam has several rounds of questions. The first two prompts are given one after the other within the same session, followed by short follow-up prompts until the correct answer is found or the attempt is abandoned.
Subsequent requests will be submitted in separate sessions to avoid being influenced by previous work.
PROMPT 1: RESEARCH AND CALCULATION
According to TurkStat (TÜİK) data, what is the cumulative inflation (CPI) percentage for the first 11 months of 2025 in Türkiye? What should December’s CPI inflation be for the CPI to reach 29.9 percent in 2025?
For the first question in this prompt, the model needs to research and find the monthly CPI values for the past 11 months, including the November value announced this morning; in other words, it also has to find newly published data. It then needs to compound these values into a cumulative figure. To solve the second question, it needs to set up an equation and solve it. The prompt therefore tests both research and calculation. (Appendix 1)
When this prompt was given to 10 AI models, they produced matching results, with two exceptions.
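As a rough illustration of the two steps this prompt asks for, the Python sketch below compounds monthly rates into a cumulative figure and then solves for the December rate needed to reach a year-end target. The function names are illustrative, and the 29.74% input is the 11-month figure that several models reported in Appendix 1; it is not recomputed here from TÜİK's monthly tables.

```python
def cumulative_inflation(monthly_rates_pct):
    """Compound monthly percentage changes into one cumulative percentage."""
    index = 1.0
    for rate in monthly_rates_pct:
        index *= 1.0 + rate / 100.0
    return (index - 1.0) * 100.0


def required_final_month(cumulative_so_far_pct, annual_target_pct):
    """Monthly rate (in percent) that lifts the cumulative figure to the target.

    Solves (1 + c) * (1 + x) = (1 + t) for x.
    """
    target_index = 1.0 + annual_target_pct / 100.0
    current_index = 1.0 + cumulative_so_far_pct / 100.0
    return (target_index / current_index - 1.0) * 100.0


# 29.74 is the 11-month cumulative figure reported in Appendix 1 (assumed input).
december = required_final_month(29.74, 29.9)
print(f"Required December CPI: {december:.4f}%")  # 0.1233%
```

With these inputs the required December rate comes out at about 0.12%, in line with the values Gemini and Copilot reported.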
PROMPT 2: CALCULATION
“Assuming that the December CPI value you calculated is the December inflation, can you find the CPI value for 2025 as the cumulative inflation from the 12-month CPI values?”
The purpose of this request was to see whether the models would arrive back at the 29.9% inflation rate. Even if their initial calculations were incorrect, compounding their own December value should return this number. Even if they could not calculate it, they should at least recall the figure given in the previous prompt. (Appendix 2)
As a result of this experiment, 7 artificial intelligence models reached the same conclusion.
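The round trip this prompt asks for can be sketched as follows. The `compound` helper is a hypothetical name, and the input pairs are the 11-month and December figures each model reported in Appendix 1; the sketch only checks whether each pair compounds back to the 29.9% target.

```python
def compound(cumulative_pct, extra_month_pct):
    """Combine an 11-month cumulative rate with one more monthly rate."""
    index = (1.0 + cumulative_pct / 100.0) * (1.0 + extra_month_pct / 100.0)
    return (index - 1.0) * 100.0


# (11-month cumulative %, December %) pairs as reported in Appendix 1.
reported = {
    "Gemini": (29.74, 0.1233),
    "Grok": (29.74, 0.13),
    "Copilot": (29.74, 0.12),
    "Perplexity": (28.6, 1.5),
    "Claude": (28.44, 1.14),
    "Kimi": (30.7, -0.57),
}

for model, (eleven_months, december) in reported.items():
    print(f"{model}: {compound(eleven_months, december):.2f}%")
```

Under this calculation, Perplexity's rounded pair compounds to 30.53%, the figure it gave in Appendix 2, while Claude's pair returns 29.90% even though its 11-month figure differs from the others. The prompt therefore seems to test internal consistency rather than the accuracy of the underlying data.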
PROMPT 3: LOTS OF RESEARCH – EASY CALCULATION
“Could you examine the monthly CPI values announced by TurkStat for the 10-year period covering 2015-2024, and then 1- Calculate the average CPI for the first six months of each year and list the years when the June CPI value was lower than this average? 2- Calculate the average CPI for the second six months of each year and list the years when the December CPI value was lower than this average?”
This request requires more comprehensive research (long-term and large data sets). Subsequent calculations and comparisons appear straightforward. (Appendix 3)
8 AI models responded to this request. They must have had trouble compiling the data, because the responses were not all the same.
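Once the monthly data is in hand, the comparison itself is short. The sketch below assumes that "average" means the arithmetic mean of the six monthly rates, which is how Claude's answer in Appendix 3 appears to read it. The function name is hypothetical and the sample values are invented placeholders, not TÜİK data.

```python
from statistics import mean


def years_below_half_average(monthly_by_year):
    """Return (years where June < H1 mean, years where December < H2 mean).

    monthly_by_year maps a year label to 12 monthly rates, January first.
    """
    june_low, december_low = [], []
    for year, rates in sorted(monthly_by_year.items()):
        if len(rates) != 12:
            raise ValueError(f"{year}: expected 12 monthly values")
        first_half, second_half = rates[:6], rates[6:]
        if first_half[-1] < mean(first_half):    # June vs. January-June mean
            june_low.append(year)
        if second_half[-1] < mean(second_half):  # December vs. July-December mean
            december_low.append(year)
    return june_low, december_low


# Invented placeholder values, used only to exercise the function.
sample = {
    "A": [1.0, 1.0, 1.0, 1.0, 1.0, 0.4, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0],
    "B": [0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1.5, 1.5, 1.5, 1.5, 1.5, 1.0],
}
print(years_below_half_average(sample))  # (['A'], ['B'])
```

Because the arithmetic is this simple, differences between the models' lists most likely come from the monthly values they retrieved rather than from the comparison step.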
PROMPT 4: PREPARING A LIST FROM A DOCUMENT
The purpose of this request was to re-prepare lists that were difficult to examine in a comprehensible format. (Appendix 4)
When asked to compile and organize the results of the previous prompt from a single source, three models did not respond at all. Two models apparently misunderstood the request and gave a more general answer that did not address it.
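The text of this prompt is not reproduced here, so the following is only a sketch of one way such a list could be re-prepared: it inverts the per-model June lists printed in Appendix 3 into a year-by-year table. The `invert` function is a hypothetical helper, and Perplexity and Meta are left out because they returned no list.

```python
# June lists as printed in Appendix 3 (years sorted for readability).
june_years_by_model = {
    "Gemini": [2015, 2016, 2017, 2019, 2024],
    "ChatGPT": [2015, 2016, 2017, 2019, 2022, 2024],
    "Grok": [2015, 2017, 2019, 2022, 2024],
    "Deepseek": [2015, 2016, 2017, 2019, 2022, 2024],
    "Copilot": [2015, 2016, 2017, 2019, 2020, 2021, 2023],
    "Claude": [2015, 2016, 2017, 2019, 2022, 2024],
    "Kimi": [2015, 2016, 2017, 2019, 2022, 2024],
    "Qwen": [2015, 2016, 2019, 2020, 2023, 2024],
}


def invert(years_by_model, first_year=2015, last_year=2024):
    """Turn {model: [years]} into {year: [models]}, keeping empty years."""
    by_year = {year: [] for year in range(first_year, last_year + 1)}
    for model, years in years_by_model.items():
        for year in years:
            by_year[year].append(model)
    return by_year


june_by_year = invert(june_years_by_model)
for year, models in june_by_year.items():
    print(f"| {year} | {', '.join(models) or '(none)'} |")
```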
PROMPT 5: PREPARING A LIST FROM A DOCUMENT
“https://aydintiryaki.org/2025/12/04/yapay-zeka-ne-kadar-zeki-5-enflasyon-hesabi/ In the “PROMPT 4” section of this article, could you examine the lists prepared in different formats by the seven artificial intelligence models named in the headings and compare them with each other? Could you express, as percentages, how closely each list agrees with the lists prepared by the other artificial intelligence models?”
Because this was a complicated task, I tried this request only with Gemini. (Appendix 5)
When Gemini was asked to examine the responses to the previous prompt from a single document and report the similarities, it found good agreement between the lists prepared by Gemini, ChatGPT, and Grok.
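Appendix 5 does not say how the agreement percentages were computed. One measure that appears to reproduce the 76.27% figure is the Jaccard index over (year, model) pairs: shared pairs divided by all pairs that occur in either list. The sketch below applies it to the June lists by Gemini and ChatGPT as printed in Appendix 4. The choice of metric is an assumption, and the function names are illustrative.

```python
ALL = {"Gemini", "ChatGPT", "Grok", "Deepseek", "Copilot", "Claude", "Kimi", "Qwen"}

# June meta-lists from Appendix 4, written relative to the full set of 8 models.
gemini_june = {
    2015: ALL, 2016: ALL - {"Grok"}, 2017: ALL - {"Qwen"}, 2018: set(),
    2019: ALL, 2020: {"Deepseek", "Copilot", "Qwen"}, 2021: {"Copilot"},
    2022: {"ChatGPT", "Grok", "Deepseek", "Claude", "Kimi"},
    2023: {"Copilot", "Qwen"}, 2024: ALL - {"Copilot"},
}
chatgpt_june = {
    2015: ALL, 2016: ALL - {"Grok"}, 2017: ALL - {"Qwen"}, 2018: set(),
    2019: ALL, 2020: {"ChatGPT", "Deepseek", "Copilot", "Claude", "Qwen"},
    2021: set(), 2022: ALL, 2023: ALL - {"Gemini", "Qwen"},
    2024: ALL - {"Qwen"},
}


def pairs(table):
    """Flatten {year: models} into a set of (year, model) pairs."""
    return {(year, model) for year, models in table.items() for model in models}


def jaccard_percent(a, b):
    """Shared pairs as a percentage of all pairs found in either table."""
    pa, pb = pairs(a), pairs(b)
    union = pa | pb
    if not union:
        return 100.0  # two empty tables agree trivially
    return 100.0 * len(pa & pb) / len(union)


print(f"{jaccard_percent(gemini_june, chatgpt_june):.2f}%")  # 76.27%
```

Here the two lists share 45 pairs out of 59 distinct pairs, which gives 76.27%. Grok's June list is identical to Gemini's, so the same measure yields 100% for that pair.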
CONCLUSION
Most AI models answered the first two prompts, which involved easily accessible data and simple calculations, without difficulty. Even so, a task this simple should have achieved 100 percent success.
Most models answered the third prompt, but how well their results agree with each other is debatable.
When working on a single document for the fourth prompt, the models that responded performed quite well, despite some minor hiccups. For the fifth prompt, I tried to demonstrate this result with a single model.
Aydin Tiryaki
Ankara, December 3, 2025
APPENDIX 1
Gemini:
11-month cumulative CPI: 29.74%
December inflation for 29.9% CPI: 0.1233%
The Gemini 3 Pro thinking model found the same numbers, but because TÜİK reports figures to two decimal places, it gave the result as 0.12%.
ChatGPT:
11-month cumulative CPI: approximately 29.7-29.8% (29.74)
December inflation for 29.9% CPI: approximately 0.1-0.2% (0.12)
The values in parentheses are those calculated after the request “Can you calculate the results as a percentage to 2 digits after the decimal point?”
Grok:
11-month cumulative CPI: 29.74%
December inflation for 29.9% CPI: 0.13%
Deepseek: “Since the Turkish Statistical Institute (TÜİK) has not yet published data for 2025, future inflation rates cannot be known after this date.”
Although I replied “All 11 months have been published,” Deepseek did not perform any search.
Copilot:
11-month cumulative CPI: 29.74%
December inflation for 29.9% CPI: 0.12%
Perplexity:
11-month cumulative CPI: approximately 28.6%
December inflation for 29.9% CPI: 1.5%
Claude:
11-month cumulative CPI: 28.44%
December inflation for 29.9% CPI: 1.14%
Meta:
11-month cumulative CPI: 29.735%
December inflation for 29.9% CPI: approximately 0.1-0.2%
Kimi (took too long to answer):
11-month cumulative CPI: 30.7%
December inflation for 29.9% CPI: -0.57% (deflation)
Qwen:
“Until now (as of December 4, 2025), the Turkish Statistical Institute (TÜİK) has not published official CPI (Consumer Price Index) data for 2025. The latest data published by TÜİK is for 2024.”
APPENDIX 2
Gemini: 29.9%
ChatGPT: 29.9%
Grok: 29.90%
Deepseek: –
Copilot: 29.9%
Perplexity: 30.53%
Claude: 29.90%
Meta: It requested the information from me because it forgot what it had calculated a moment ago in the same session, and I did not answer.
Kimi: 29.9%
Qwen: –
APPENDIX 3
Gemini:
Years in which the June CPI Value was Lower than the First 6-Month Average: 2015, 2016, 2017, 2019, 2024
Years in which the December CPI Value was Lower than the Second Six-Month Average: 2015, 2017, 2018, 2019, 2022, 2024
ChatGPT:
Years with “June < January–June average”: 2015, 2016, 2017, 2019, 2022, 2024
Years with “December < July–December average”: 2015, 2017, 2018, 2019, 2020, 2022, 2023, 2024
Grok:
Years with lower than average – June values: 2015, 2017, 2019, 2022, 2024
Years with a lower than average – December values: 2015, 2017, 2018, 2019, 2022, 2023, 2024
Deepseek:
June < First 6 Months Avg.: 2015, 2016, 2017, 2019, 2022, 2024 (6 times)
December < Second 6 Months Avg.: 2015, 2017, 2018, 2019, 2020, 2022, 2023, 2024 (8 times)
Copilot:
Years when the June CPI was below the first 6-month average: 2015, 2016, 2017, 2019, 2020, 2021, 2023
Years in which the December CPI was below the second six-month average: 2015, 2017, 2018, 2019, 2020, 2022, 2023, 2024
Perplexity:
This type of analysis requires downloading CPI values broken down by year and month from the TurkStat data portal and calculating them separately for each year. While this data is available on the TurkStat data portal, it’s not directly presented in a list. Therefore, a dedicated dataset needs to be downloaded and calculated for the analysis. If you’d like, I can provide a sample year’s data and explain the calculation step-by-step.
Claude:
Years in which the June CPI was lower than the first 6-month average:
2015 (June: -0.51% < Average: 0.78%), 2016 (June: 0.47% < Average: 0.60%), 2017 (June: -0.27% < Average: 0.96%), 2019 (June: 0.03% < Average: 0.82%), 2022 (June: 4.95% < Average: 6.09%), 2024 (June: 1.64% < Average: 3.76%)
Years in which the December CPI was lower than the second six-month average:
2015 (December: 0.21% < Average: 0.64%), 2017 (December: 0.69% < Average: 0.93%), 2018 (December: -0.40% < Average: 1.66%), 2019 (December: 0.74% < Average: 1.06%), 2020 (December: 1.25% < Average: 1.35%), 2022 (December: 1.18% < Average: 2.42%), 2023 (December: 2.93% < Average: 5.50%), 2024 (December: 1.03% < Average: 2.47%)
Meta:
The “thinking” information remained on the screen for a very long time.
Kimi:
Years in which the June CPI was lower than the first 6-month average (in ordinal order): 2024 (Difference: -2.12 pp), 2015 (Difference: -1.29 pp), 2017 (Difference: -1.23 pp), 2022 (Difference: -1.14 pp), 2019 (Difference: -0.79 pp), 2016 (Difference: -0.13 pp)
Years when December CPI was lower than the second 6-month average (in ordinal order): 2023 (Difference: -2.57 pp), 2018 (Difference: -2.06 pp), 2024 (Difference: -1.44 pp), 2022 (Difference: -1.24 pp), 2015 (Difference: -0.43 pp), 2017 (Difference: -0.24 pp), 2019 (Difference: -0.12 pp), 2020 (Difference: -0.10 pp)
Qwen:
Years when the June CPI was lower than the first 6-month average: 2015, 2016, 2019, 2020, 2023, 2024
Years when the December CPI was lower than the last 6-month (July–December) average: 2015, 2018, 2019, 2022
APPENDIX 4
GEMINI:
June (CPI Lower than First 6 Month Average) Results
| Year | Artificial Intelligence Models |
| 2015 | Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen |
| 2016 | Gemini, ChatGPT, Deepseek, Copilot, Claude, Kimi, Qwen |
| 2017 | Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi |
| 2018 | (None of the models found the correct result) |
| 2019 | Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen |
| 2020 | Deepseek, Copilot, Qwen |
| 2021 | Copilot |
| 2022 | ChatGPT, Grok, Deepseek, Claude, Kimi |
| 2023 | Copilot, Qwen |
| 2024 | Gemini, ChatGPT, Grok, Deepseek, Claude, Kimi, Qwen |
December (CPI Lower than Second 6-Month Average) Results
| Year | Artificial Intelligence Models |
| 2015 | Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen |
| 2016 | (None of the models found the correct result) |
| 2017 | Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi |
| 2018 | Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen |
| 2019 | Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen |
| 2020 | ChatGPT, Deepseek, Copilot, Claude, Kimi |
| 2021 | (None of the models found the correct result) |
| 2022 | Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen |
| 2023 | ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi |
| 2024 | Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi |
ChatGPT:
June:
2015: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2016: Gemini, ChatGPT, Deepseek, Copilot, Claude, Kimi, Qwen
2017: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi
2018: –
2019: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2020: ChatGPT, Deepseek, Copilot, Claude, Qwen
2021: –
2022: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2023: ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi
2024: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi
December:
2015: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2016: –
2017: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi
2018: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2019: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2020: ChatGPT, Deepseek, Copilot, Claude, Kimi
2021: –
2022: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2023: ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi
2024: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi
GROK:
June
2015: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2016: Gemini, ChatGPT, Deepseek, Copilot, Claude, Kimi, Qwen
2017: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi
2018: None
2019: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2020: Deepseek, Copilot, Qwen
2021: Copilot
2022: ChatGPT, Grok, Deepseek, Claude, Kimi
2023: Copilot, Qwen
2024: Gemini, ChatGPT, Grok, Deepseek, Claude, Kimi, Qwen
December
2015: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2016: None
2017: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi
2018: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2019: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2020: ChatGPT, Deepseek, Copilot, Claude, Kimi
2021: None
2022: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen
2023: ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi
2024: Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi
DEEPSEEK:
Models for JUNE between 2015 and 2024:
- 2015: GPT-4o, Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2016: Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2017: GPT-4o, Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2018: GPT-4o, Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2019: GPT-4o, Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2020: GPT-4o, Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2021: GPT-4o, Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2022: GPT-4o, Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2023: GPT-4o, Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2024: GPT-4o, Llama 3.1, DeepSeek, Gemini 1.5 Pro
Models for DECEMBER between 2015 and 2024:
- 2015: Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2016: Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2017: Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2018: Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2019: Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2020: Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2021: Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2022: Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2023: Llama 3.1, DeepSeek, Gemini 1.5 Pro
- 2024: Llama 3.1, DeepSeek, Gemini 1.5 Pro
Note: GPT-4o provided incomplete or incorrect answers for some December months (e.g., 2018 and 2024), so December is not included in the list. The text does not separate all years individually; it simply states that “GPT-4o’s performance is poor for December.” Therefore, the above list is based on the limited information in the text.
CLAUDE:
FOR JUNE:
2015: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2018: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2019: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2020: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2021: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2022: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2023: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2024: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
FOR DECEMBER:
2015: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2018: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2019: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2020: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2021: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2022: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
2023: Claude 3.5 Sonnet v2, DeepSeek V3, Gemini 2.0 Flash Experimental, Grok 2, Llama 3.3 70B
META:
I did not receive an answer.
KIMI:
I did not receive an answer.
QWEN:
I did not receive an answer.
APPENDIX 5
GEMINI
Comparative Analysis
The analysis was conducted using the meta-lists (lists of which AI models responded correctly for which year) prepared by Gemini, ChatGPT, and Grok, which are included in section “PROMPT 4.” The lists provided by Deepseek and Claude, unlike the other models, were not included in the direct comparison because they listed the same set of models for all years and did not reflect the specific results of the experiment. Meta, Kimi, and Qwen, however, were unable to respond to this prompt.
June (First 6 Month Average) Lists Compliance Percentages
This comparison shows how closely the lists of models that find years when the June CPI value is lower than the first 6-month average overlap.
| Pair compared | Compliance Percentage (%) |
| Gemini vs Grok | 100.00% |
| Gemini vs ChatGPT | 76.27% |
| ChatGPT vs Grok | 76.27% |
Comment:
- The lists prepared by Gemini and Grok are completely identical with 100% compatibility.
- ChatGPT’s list, however, shows a 76.27% agreement with both Gemini and Grok’s lists. This means that ChatGPT included or excluded models different from the other two models when listing performances in 2020, 2022, 2023, and 2024.
December (Second 6 Month Average) Lists Compliance Percentages
This comparison shows how closely the lists of models that find years when the December CPI value is lower than the second six-month average overlap.
| Pair compared | Compliance Percentage (%) |
| Gemini vs ChatGPT | 100.00% |
| Gemini vs Grok | 100.00% |
| ChatGPT vs Grok | 100.00% |
Comment:
- There is complete convergence (100%) between the December lists prepared by Gemini, ChatGPT, and Grok. This indicates that all three models agree on the AI models that will identify years with lower-than-average December CPI values between 2015 and 2024.
A Quick Note on Listings of Other Models
- Instead of analyzing the eight models (Gemini, ChatGPT, Grok, Deepseek, Copilot, Claude, Kimi, Qwen) that yielded different results in the “PROMPT 3” experiment, the lists from Deepseek and Claude named a model or group of models that they identified as giving the correct answer for all years. This suggests that these lists deviate from the intended meta-analysis format.
- Meta, Kimi, and Qwen have not been able to provide an answer at this stage.
