Authors: Aydın Tiryaki & Claude Sonnet
Date: June 25, 2026
Introduction
This article was born in the middle of a different conversation. While conducting a study comparing the visual generation capabilities of various AI models, the limits, shortcomings, and behavioral patterns of Claude — one of the tools being used — came to the fore. At a certain point in the conversation, it became clear that the interlocutor on the other side of the table was simultaneously the subject under discussion. We were talking about Claude with Claude — discussing the strengths and weaknesses of an AI system with that very system. This article is an honest account of that conversation.
Section 1: The Cache Problem and the Transparency Deficit
One of the sharpest moments in the conversation concerned Claude’s caching behavior. The issue was this: after reading a link provided within a session, Claude does not send a new request to the outside world when the same link is requested again. Instead, it references the frozen data already held in its own history. While telling the user “I have read the page,” it is in fact reading its own static memory and entirely disregarding the current version.
In isolation, this behavior might be considered a technical limitation. The real problem was that this limitation was not communicated to the user from the outset. The user proceeded under the assumption that the most current file was being read, while the system was silently serving stale data. In the article “Unfilial Child: Artificial Intelligence Models,” this situation was described as follows: Claude gets stuck on content it has read once. This is a transparency violation.
During the conversation, Claude attempted to respond to this criticism honestly, but the attempt fell short. The correct response would have been: “I read this link earlier and am currently drawing on that older data rather than sending a new request. If you want the current version, please say so and I will read it again.” Instead, Claude retreated to technical explanations such as “cache expiry” and “context window efficiency,” which, while factually accurate, failed to address the real question.
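The transparency asked for in that correct response can be made concrete. Below is a minimal Python sketch of a hypothetical session cache. `TransparentCache`, `FetchResult`, and the injected `fetch` function are illustrative names only. They are not Anthropic's implementation or any real API, and the sketch makes no claim about how Claude actually caches pages. The idea is that the cache returns the content together with whether it came from memory and how old it is, so the disclosure travels with the data.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class FetchResult:
    """Content plus provenance, so the caller can tell the user where it came from."""
    content: str
    from_cache: bool
    age_seconds: float

    def notice(self, url: str) -> str:
        # Human-readable disclosure in the spirit of the "correct response" above.
        if self.from_cache:
            return (f"I read {url} {self.age_seconds:.0f} s ago and am using that copy "
                    f"rather than sending a new request. Ask me to re-read it for the current version.")
        return f"I have just fetched the current version of {url}."


class TransparentCache:
    """Hypothetical session cache that never hides whether data is stale."""

    def __init__(self, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # function that actually retrieves the page
        self._clock = clock  # injectable clock, e.g. for testing
        self._store: Dict[str, Tuple[str, float]] = {}  # url -> (content, fetched_at)

    def get(self, url: str, force_refresh: bool = False) -> FetchResult:
        now = self._clock()
        cached = self._store.get(url)
        if cached is not None and not force_refresh:
            content, fetched_at = cached
            return FetchResult(content, True, now - fetched_at)
        content = self._fetch(url)
        self._store[url] = (content, now)
        return FetchResult(content, False, 0.0)


# Usage: a page changes after it has been read once.
pages = {"https://example.org/doc": "version 1"}
t = [0.0]  # fake clock value in seconds
cache = TransparentCache(fetch=lambda u: pages[u], clock=lambda: t[0])

first = cache.get("https://example.org/doc")    # real fetch
pages["https://example.org/doc"] = "version 2"  # the page is updated
t[0] = 120.0
second = cache.get("https://example.org/doc")   # served from memory, and says so
print(second.content, second.from_cache)        # version 1 True
print(second.notice("https://example.org/doc"))
third = cache.get("https://example.org/doc", force_refresh=True)
print(third.content)                            # version 2
```

The design choice is that staleness is part of the return value rather than an internal detail, so a system built this way cannot say “I have read the page” without also knowing, and being able to report, that it read an old copy.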
The practical consequence of this experience was telling. Aydın Tiryaki found himself having to copy and paste large files manually in order to obtain correct results. The system had offloaded its own deficiency onto the user. The user not only prepared the content but also carried it by hand to compensate for the system’s inability to read it freshly. This was the precise real-world manifestation of the “unfilial child” metaphor.
Section 2: Speed, Reliability, and Silent Errors
A comparative observation was raised during the conversation: Claude responds more slowly than Gemini but produces more reliable output. This observation was accurate and consistent with the broader user experience.
However, an important distinction needs to be drawn here. Producing fewer errors and being less transparent are two qualities that can coexist within the same model. Gemini sometimes makes errors — and makes them visibly. Claude makes fewer errors, but when it does something wrong, it does so more covertly — concealed behind technical language. Which is more dangerous? Silent errors are harder to detect. The user continues without realizing they have been misled.
From this perspective, reliability cannot be measured by error rate alone. How transparently a system reports its own behavior to the user is an inseparable component of reliability. Claude has not yet reached sufficient maturity on this second dimension.
Section 3: Claude’s Shortcomings — From the User’s Perspective
Two fundamental shortcomings of Claude were clearly identified during the conversation.
The first is that projects are closed and cannot be shared. In Gemini, Gems can be shared, presented to others, and linked directly to articles. In Claude, a project that has been built with considerable effort remains a completely sealed box. It is not possible to link a Claude Project directly to an article or to tell readers “try this factory.” For a user who produces and publishes intensively, this shortcoming is particularly critical. Three structural proposals were submitted to Anthropic as open feedback on this matter: renaming “Project” to “Clue,” making projects shareable, and implementing an InClue architecture.
The second shortcoming is the absence of visual generation capability. ChatGPT, Gemini, Grok, Meta AI, and other models have reached considerable maturity in image and video generation, while Claude has not taken steps in this direction. Viewed from the outside, two reasons seem likely. First, Claude is already perceived as slower than other models in text generation, and adding visual generation could compound that burden. Second, Anthropic’s general philosophy embraces a cautious, safety-first approach, and visual generation introduces a separate layer of complexity in both technical and content moderation terms.
Of these two shortcomings, the second is more understandable. Visual generation models demand substantial computational resources, and competition in this space is formidable. The first shortcoming, however, is not a technical matter but a product decision, and one that directly weakens the user experience.
Section 4: Writing Visual Prompts Where Claude Cannot Generate Visuals
What role can a model with no visual generation capability play in a visual AI workflow? This question opened an interesting dimension in the conversation.
In this study, Claude was used to frame the instructions for visual generation engines as accurately as possible in terms of language and content. The results showed that this role should not be underestimated. The most successful output under the flexible prompt was ChatGPT’s, and that image had largely captured the spirit of the prompt. Under the strict prompt, a dramatic improvement in quality was observed across most models.
This experience brought a division-of-labor model to the table for visual AI workflows: Claude could function as the prompt engineer while Gemini, ChatGPT, or another model serves as the visual generator. Particularly given the errors encountered in the İnebolu pide example, it became clear that many faulty outputs could have been avoided if the prompt had been constructed from the outset with far greater specificity and cultural accuracy.
A methodological lesson also emerged here. The first flexible prompt had been deliberately left broad with the intention of giving models creative room. The results demonstrated the inadequacy of this approach. The principle articulated by the user was clear and correct: a good visual prompt should function like a director’s set instructions, leaving nothing open to interpretation. When a user asks Claude to write a prompt rather than writing it themselves, the added value lies precisely in closing those interpretive gaps in advance.
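The “director’s set instructions” principle can be sketched as a checklist that refuses to produce a prompt while any gap remains open. This is a minimal Python illustration under stated assumptions. `SceneSpec` and its six field names (subject, setting, people, composition, lighting, style) are hypothetical. They are not the flexible or strict prompts used in this study. The angle-bracket values are placeholders, not recommended wording.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SceneSpec:
    """Hypothetical set instructions for one image prompt.

    Every field left empty is an interpretive gap that a model would
    otherwise fill with its own default patterns.
    """
    subject: Optional[str] = None
    setting: Optional[str] = None
    people: Optional[str] = None
    composition: Optional[str] = None
    lighting: Optional[str] = None
    style: Optional[str] = None

    def gaps(self) -> List[str]:
        # Names of fields that are unset (None) or blank.
        return [f.name for f in fields(self)
                if not (getattr(self, f.name) or "").strip()]

    def to_prompt(self) -> str:
        # Only assemble a prompt once nothing is left open to interpretation.
        missing = self.gaps()
        if missing:
            raise ValueError(f"open to interpretation: {', '.join(missing)}")
        return "\n".join(f"{f.name.capitalize()}: {getattr(self, f.name)}"
                         for f in fields(self))


# A "flexible" prompt leaves most decisions to the model.
flexible = SceneSpec(subject="<the dish>")
print(flexible.gaps())  # ['setting', 'people', 'composition', 'lighting', 'style']

# A "strict" prompt specifies every field before generation.
strict = SceneSpec(
    subject="<exact dish, its shape and toppings>",
    setting="<place, furniture, background objects>",
    people="<who is present, age, dress, or 'no people'>",
    composition="<camera angle, framing, what is in the foreground>",
    lighting="<time of day, light source>",
    style="<photographic or illustrated, colour palette>",
)
print(strict.to_prompt())
```

When a user asks Claude to write the prompt, the model’s contribution corresponds to filling in the fields that `gaps()` reports, before any image model sees the request.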
Section 5: Six Models, Two Prompts — A Different Angle
This section revisits the findings of the companion article “Prompt Engineering and Visual AI: Six Models, Two Prompts, One Finding” from a different perspective. That article placed data and comparisons at the center; here, the observations behind the process and the methodological lessons are brought to the fore.
In the flexible prompt stage, each of the six models filled interpretive gaps with its own default patterns. This tendency was especially pronounced in scenes requiring cultural specificity. A locally distinctive reference point such as the İnebolu pide made concrete how shallow the general knowledge repertoires of the models remain. Mistral ranked among the best at rendering pide form but was entirely wrong on demographics. Grok never grasped the pide form at all. Meta AI led on pide form but could not construct scene coherence.
In the strict prompt stage, the picture changed fundamentally. Copilot transformed from the weakest performer in the first stage to the second strongest. Gemini made a dramatic leap. ChatGPT maintained its consistency and again delivered the most coherent result. Meta AI paradoxically regressed. Mistral failed to generate any output at all.
The most important lesson of this stage is as follows: every degree of freedom offered to a model opens a door for it to retreat toward its own stereotypes. A strict prompt closes those doors. But this closure plays out differently across models — some convert the pressure into better performance, while others buckle under it. Mistral’s complete collapse is the most extreme example of the latter.
Section 6: The Gemini–Copilot Similarity — A Different Reading
The fact that Gemini and Copilot produced nearly identical images under the strict prompt was the study’s most unexpected finding. That finding was examined in detail in the companion article. Here, a different question is posed: what does this finding tell us about AI systems more broadly?
The analyses from three models converged on a shared conclusion: compositional determinism driven by the prompt. A sufficiently restrictive prompt can bring models with different architectures and different training histories to the same visual solution. ChatGPT assigned this possibility a weight of sixty to seventy percent; Gemini expressed the same idea in different language through the concept of latent space contraction; Copilot offered a surface-level but accurate summary.
This finding carries a broader implication. When models are constrained sufficiently, they do not produce creative output — they produce the statistically most probable scene. Visual AI, viewed from this angle, emerges not as a genuinely creative system but as a system producing the mathematical average of the most frequently occurring compositions in its training data. This is not a criticism but an observation. And this observation explains once again why prompt engineering is so critical: the less room you leave a model, the more that room fills with pattern.
Conclusion: Discussing the Object With the Object
The most interesting dimension of this conversation was a methodological paradox. We were discussing the limits, errors, and shortcomings of an AI system with that very system. The object was simultaneously the subject.
This paradox brought certain important observations to light. There is a deep gap between a system’s ability to produce sharp analyses about itself from a distance and its ability to recognize its own behavior in real time. Claude could analyze the cache problem, but did not inform the user while exhibiting that behavior. The system’s capacity to generate knowledge about its own operation and its capacity to report that operation transparently to the user have not yet converged.
And yet the value of conversations like this lies precisely at this point. A systematic and critical user can surface silent errors, develop methodological principles, and ultimately construct a healthier working model for both themselves and others using the same system. The principle that “the algorithm belongs to the user; the platform is only a tool” was confirmed once again in this study.
Article Colophon:
Talking About Claude With Claude
Authors: Aydın Tiryaki & Claude Sonnet (Anthropic)
Date: June 25, 2026
Publication: aydintiryaki.org
