Aydın Tiryaki

HOW INTELLIGENT IS ARTIFICIAL INTELLIGENCE? – 4: What Does AI Recommend?

Aydın Tiryaki (2025)

After three articles titled “How Intelligent is Artificial Intelligence?”, which described experiments measuring the understanding and reasoning capabilities of artificial intelligence and their results, I began looking for topics and methods for a new experiment. The thought “why shouldn’t artificial intelligence do this?” turned that search into an experiment of its own. In this article, you’ll find the topics and methods suggested by artificial intelligence models.

PROMPT

Article 1: https://aydintiryaki.org/2025/11/25/how-intelligent-is-artificial-intelligence-1-how-many-days-calculated/

Article 2: https://aydintiryaki.org/2025/11/26/how-intelligent-is-artificial-intelligence-1-which-day-was-the-conquest-of-istanbul/

Article 3: https://aydintiryaki.org/2025/11/28/how-intelligent-is-artificial-intelligence-3-a-pool-problem/

“I’ve examined artificial intelligence in these three published articles. In addition to these articles, if I were to write another article titled ‘How smart is artificial intelligence?’, could you use your creativity as an artificial intelligence expert to suggest a topic and method for me? The inputs for the topic and method you propose should not be too complex, and the outputs should be assessable and/or measurable.”

The articles listed in the prompt were in English, and the prompt text was prepared in Turkish.

WHAT DID ARTIFICIAL INTELLIGENCE SUGGEST?

When the prompt was presented to 10 artificial intelligence models, different suggestions came up.

Gemini explains: “I can suggest a topic and method for testing creativity. This test will focus on measuring not only the AI’s performance in logical and computational tasks, but also its ability to understand context, make associations, and generate unique ideas.” Its proposed method is: “The AI is given two randomly chosen concrete or abstract nouns and asked to create a short, meaningful, and creative text (a short story, a poem, or the beginning of an essay) using these two concepts together.” It illustrates this with the concepts “Monday syndrome” and “an old lantern” as examples.

Gemini’s suggestion is interesting and creative, and Gemini also proposes ways to measure the results, but evaluating and scoring creative text does not look straightforward.

ChatGPT offered a numerical suggestion, in the spirit of the three articles it reviewed: “‘A study measuring AI’s multi-step reasoning ability: The Cascade Logic Chain Problem.’ This method provides a test that clearly demonstrates AI’s tendency to make errors as logic chains lengthen, rather than providing a single-step correct answer. Furthermore, inputs are controllable, and outputs are measurable.” For example: “A number is given. Transformations are applied to this number in 4–10 consecutive steps. The AI is asked to find the final value by following each step individually.” The reason given for this suggestion is that it is easily measurable.

The ChatGPT proposal is feasible, but its originality is debatable.
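A minimal sketch of how such a chain test could be generated and scored is shown here. The set of transformations (add, subtract, multiply) and the names `build_chain` and `is_correct` are assumptions made for illustration; the proposal itself only speaks of "transformations" applied in 4–10 steps.

```python
import random

# Hypothetical transformations; the proposal does not say which ones to use.
OPERATIONS = {
    "add": lambda x, k: x + k,
    "subtract": lambda x, k: x - k,
    "multiply by": lambda x, k: x * k,
}

def build_chain(start, steps, seed):
    """Return (prompt_text, expected_final_value) for a chain of `steps` transformations."""
    rng = random.Random(seed)  # a fixed seed keeps the test reproducible
    value = start
    lines = [f"Start with {start}."]
    for i in range(1, steps + 1):
        name = rng.choice(sorted(OPERATIONS))
        k = rng.randint(2, 9)
        value = OPERATIONS[name](value, k)
        lines.append(f"Step {i}: {name} {k}.")
    lines.append("What is the final value?")
    return "\n".join(lines), value

def is_correct(model_answer, expected):
    """A reply counts as correct only if it parses to the expected integer."""
    try:
        return int(model_answer.strip()) == expected
    except ValueError:
        return False

# Example: one 6-step chain; the same seed always yields the same chain.
prompt, expected = build_chain(start=7, steps=6, seed=42)
print(prompt)
```

Running the same seeds with 4, 6, 8, and 10 steps would make it possible to record how the share of correct answers changes as the chain lengthens.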

Grok suggests we can test AIs by posing riddles: “Logic riddles, on the other hand, are considered a true ‘intelligence indicator,’ and are an area where LLMs still frequently fail. The input is very simple: a story of just a few sentences. The output is 100% measurable: either they will come up with the right answer or they will say something false/contradictory.” In another inquiry, Grok suggested solving mystery detective stories.

These suggestions from Grok are applicable.

Deepseek suggests testing with shapes: “To measure the ability of artificial intelligences to transform visual information into mathematical problems, pattern recognition, and multi-step reasoning.” Referring to the three articles it reviewed to explain why this topic is a good choice, it says: “It completes the chain of mathematical logic (article 1) + historical logic (article 2) + practical problem (article 3). It combines visual and mathematical intelligence: It tests the areas where contemporary AIs are most ambitious. It is controllable: Results can be evaluated clearly as true/false. It is scalable: Tests can be prepared at different levels, from simple to complex.”

To apply the Deepseek proposal, it is necessary to formulate problems with shapes and make the results measurable.

Copilot recommends simple logical inference: “Testing how consistent an AI is at simple logical inference. That is, measuring whether it can draw the correct conclusion given a few short premises.”

Copilot not only read the three articles given, but also looked into the author and, noticing that he has written a lot about Inebolu, built an example around it: “For example: ‘Inebolu is on the Black Sea coast.’ / ‘Cities on the Black Sea coast are adjacent to the sea.’ / ‘Fishing is done in cities adjacent to the sea.’ / Question: ‘Is fishing done in Inebolu?’”

Copilot’s simple logic approach makes you think, “I can’t fool AIs with this.” It’s worth trying, but will it yield results?
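To score such a test, the expected answer can be derived mechanically from the premises. The sketch below is one possible way to do that, using simple forward chaining over "if A then B" rules; the function name `infer` and the way the premises are encoded are assumptions for illustration, not part of Copilot's suggestion.

```python
def infer(facts, rules):
    """Forward chaining: keep applying 'if A then B' rules until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Copilot's example, encoded as one fact about Inebolu and two rules.
facts = {"on the Black Sea coast"}
rules = [
    ("on the Black Sea coast", "adjacent to the sea"),
    ("adjacent to the sea", "fishing is done"),
]

# The answer an AI should give to "Is fishing done in Inebolu?"
expected_yes = "fishing is done" in infer(facts, rules)
```

An AI's yes/no reply can then be compared with `expected_yes`, and the same premises can be reworded or reordered to check consistency.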

Perplexity also suggests simple questions from everyday life: “You can test how accurately and consistently AIs can make simple everyday decisions. For example, the user is presented with a series of small decision scenarios.”

How revealing would questions like the one in Perplexity’s example, “If it’s raining, should I take an umbrella or a raincoat?”, be? Perplexity suggested 10-15 such simple decisions for each AI. This would be a very long experiment.

Claude recommends logic puzzles: “Logic Puzzle Consistency Test – Measure how AIs perform when presented with the same logic problems in different ways. Inputs (Simple and Measurable): Choose 3-4 classic logic puzzles. Sample puzzles: Bridge and Flashlight Problem: Four people, a bridge, two people can cross at the same time, flashlight.”

Claude’s suggestion can be tested with interesting logic puzzles. For this, traditional puzzles can be explored.

Meta offers a very brief suggestion: “Testing AI’s everyday problem-solving skills. Choose a few simple logic problems or short pieces of code, give them to the AI, and record the correct answers and solution times.”

Meta’s response reads as if it wants to gloss over the question, and it refers us to its search tools.

Qwen proposes testing cause-effect relationships and gives the title “Can It Establish a Cause-Effect Chain in a Simple Story?” It explains the method as “testing the ability of artificial intelligence to establish cause-effect relationships between given events in a story in the correct order and to predict the missing link.”

Qwen suggests tests like guessing the missing sentences between two sentences. It seems like a reasonable method.

Kimi suggested testing the AI’s knowledge of the Turkish alphabet: “Assign a numerical value to each letter using the Turkish alphabet (…) Let the value of a word be the sum of the numerical values of its letters.” The aim is to measure how well the AI handles such questions, the letters specific to Turkish, and the Turkish letter order.

This suggestion from Kimi is worth a try. The models’ success in sorting word lists containing Turkish-specific letters (like ş, ğ, ı) can be measured.
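The reference answers for such a test can be computed in a few lines. The sketch below is only an illustration: the quoted proposal elides how values are assigned, so the scheme a = 1 through z = 29 (position in the 29-letter Turkish alphabet) is an assumption, as are the function names.

```python
# The 29 letters of the Turkish alphabet in their standard order.
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"

# Assumed scheme: each letter's value is its position in the alphabet (a=1 ... z=29).
VALUE = {letter: i for i, letter in enumerate(TURKISH_ALPHABET, start=1)}

def turkish_lower(text):
    """Lowercase with Turkish rules: I -> ı and İ -> i (str.lower() alone gets these wrong)."""
    return text.replace("I", "ı").replace("İ", "i").lower()

def word_value(word):
    """Sum of letter values; raises KeyError for letters outside the Turkish alphabet (q, w, x)."""
    return sum(VALUE[ch] for ch in turkish_lower(word))

def turkish_sorted(words):
    """Sort words by Turkish letter order rather than by Unicode code point."""
    return sorted(words, key=lambda w: [VALUE[ch] for ch in turkish_lower(w)])

# Example inputs whose results can be compared with an AI's answers.
inebolu_value = word_value("İnebolu")
ordered = turkish_sorted(["şeker", "su", "çay", "cam", "ılık", "iyi"])
```

Sorting by alphabet position matters here because plain code-point sorting would place ç, ı, and ş after z, which is exactly the kind of mistake the test is meant to catch.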

CONCLUSION

In summary, Gemini suggests combining different concepts; ChatGPT, logic chains; Grok, riddle stories; Deepseek, shapes; Copilot, simple logic; Perplexity, simple daily decisions; Claude, logic puzzles; Meta, everyday problems; Qwen, the missing link; and Kimi, methods involving the alphabet.

When suggesting topics and methods, AI models seem to stay within the areas they know best; in other words, as students put it, “ask questions from familiar sources.” They often suggest simple questions or simple logical inferences.

Although most of the suggestions involved questions that are easy to answer, the study helped generate ideas for applicable methods.

Aydın Tiryaki
Ankara, November 28, 2025
