Same Input, Different Output — The Anatomy of a Production Problem
From Factory to Article: A User’s Field Report from the AI Ecosystem (Article 3)
Aydın Tiryaki & Claude Sonnet 4.6
1. Introduction
In the world of software, a function without hidden state returns the same output for the same input. This determinism is the foundation of reliability. For someone who has been programming since 1976, the principle is as natural as breathing.
AI breaks this principle.
The same Gem, the same input, two sessions running in the same mode: the outputs sometimes resemble each other and sometimes contradict each other in content, and they always differ in form. This variability is not a minor inconvenience. Once one tries to build a real production system on top of it, this indecision becomes a fundamental engineering problem.
This article dissects the anatomy of that problem.
2. The Sources of Indecision
2.1 First Source: The Model’s Structural Stochasticity
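Large language models are fundamentally probabilistic systems. At every step of generation, the model computes a probability distribution over possible next tokens and then samples one token from it; the choice is not fixed in advance. The temperature parameter controls how sharp that distribution is. Setting it to zero makes the model pick the single most likely token every time, yet even then full determinism is not guaranteed in practice: tiny numerical differences, such as floating-point rounding that varies with how computations are parallelized and batched on the hardware, can tip the choice between two nearly equal candidates.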
This is a structural feature, not a flaw. Creativity, flexibility, natural language understanding: all of these are fed by this stochastic structure. A fully deterministic language model would lose much of that flexibility.
The problem is this: when the user wants consistency rather than creativity, this structure becomes a handicap.
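To make the mechanism concrete, here is a toy sketch in Python of temperature-scaled sampling over a handful of candidate tokens. The four tokens and their scores (logits) are invented for illustration; a real model scores tens of thousands of tokens with a neural network. The sketch also idealizes the zero-temperature case: here greedy selection is perfectly repeatable because the scores never change, whereas on real hardware the scores themselves can shift slightly between runs.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(tokens, logits, temperature, rng):
    """Pick the next token: greedy at temperature 0, sampled otherwise."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token.
        return tokens[max(range(len(logits)), key=lambda i: logits[i])]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical candidates and scores for a single generation step.
tokens = ["reliable", "consistent", "stable", "predictable"]
logits = [2.0, 1.8, 1.5, 0.5]

rng = random.Random()  # unseeded: each run of the script acts like a fresh session
print([next_token(tokens, logits, 1.0, rng) for _ in range(8)])
print([next_token(tokens, logits, 0, rng) for _ in range(8)])
```

The first line typically changes from run to run, while the second always repeats the top-scoring token. Lowering the temperature concentrates probability on the top candidates, which narrows the variation without removing it.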
2.2 Second Source: The Invisible Token Load
When a Gem session begins, the context contains more than the instruction text the user has written and the current conversation. Depending on the platform, the following are also injected into it: the system’s own background instructions, the user’s profile and preferences, search results if web access is active, context carried over from previous sessions, and the accumulated conversation history.
This invisible token load differs from session to session: it might be 500 tokens in one and 2,000 in another. The exact figure cannot be known, because the user can neither measure nor even see it, and the platform does not share this information.
2.3 The Interaction of Two Sources
The real problem lies in the combination of these two sources. The model’s own stochasticity is unpredictable. The invisible token load also differs with every session. When the two come together, the resulting combination becomes entirely unforeseeable.
The user runs the same Gem, the same task, in the same mode — but in the background these two variables enter a different combination every time. The result is sometimes perfect, sometimes in complete disarray.
3. Cross-Platform Indecision Comparison
When the three major platforms are evaluated from this perspective, distinct differences emerge.
Gemini exhibits the highest degree of indecision. The same Gem, given the same input in different sessions, can produce outputs that sometimes contradict each other in content. Its tendency to prune content and the volume of outside information leaking into the context amplify this indecision further.
Claude is comparatively more consistent. The word “comparatively” is important here — it is not fully deterministic, but the deviation between outputs is narrower than with Gemini.
ChatGPT exhibits a different problem: not so much indecision as an excessive tendency to refuse. The same task may be completed in one session and refused in another. This too is a form of indecision — not stochastic, but behavioral.
4. The Cost of Indecision in Production Systems
4.1 Errors That Go Undetected
In a deterministic system, an error repeats — this is actually an advantage, because the error can be found and corrected. In a stochastic system, an error sometimes appears and sometimes does not. This makes systematic detection of errors far more difficult.
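A short calculation shows why intermittent errors are hard to catch. The sketch below assumes, as a simplification, that an error appears independently in each run with a fixed probability p; real sessions need not be independent. Under that assumption, the chance of seeing the error at least once in n runs is 1 - (1 - p)^n.

```python
import math

def detection_probability(p_error: float, runs: int) -> float:
    """Chance that an error occurring with probability p_error per run
    shows up at least once in `runs` independent runs."""
    return 1 - (1 - p_error) ** runs

def runs_needed(p_error: float, confidence: float = 0.95) -> int:
    """Smallest number of runs giving at least `confidence` probability
    of seeing the error at least once."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_error))

print(round(detection_probability(0.10, 5), 3))  # 0.41
print(runs_needed(0.10))                         # 29
```

An error that surfaces in one run out of ten has only about a 41% chance of appearing in five test runs, and about 29 runs are needed before it is seen with 95% probability.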
4.2 The Difficulty of Version Verification
When a new version is produced in the Gem Factory, how does one determine whether it performs better than the previous one? In an indecisive system, a naive A/B comparison based on a single run of each version is misleading: does the observed difference stem from the version change, or from stochastic fluctuation?
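One standard way to separate a real version effect from fluctuation is to run each version several times on the same task and apply a permutation test to the results. The sketch below is illustrative, not the Factory's actual procedure: it assumes each run can be scored (here 1 for an acceptable output and 0 otherwise), and the scores in the example are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_p_value(scores_a, scores_b, n_perm=10_000, seed=0):
    """Two-sided permutation test: how often does randomly relabeling the
    runs produce a gap in mean score at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # The +1 terms keep the estimate from ever being exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical data: 1 = acceptable output, 0 = not, over 10 runs per version.
old_version = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
new_version = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
print(permutation_p_value(old_version, new_version))
```

With the example data (7 of 10 runs acceptable versus 9 of 10), the p-value comes out at roughly 0.58: random relabeling produces a gap this large more than half the time, so ten runs per version are not enough to call the new version better.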
4.3 Quota Waste
Indecision is not only a quality problem but also a resource problem. When a session returns an unexpected output, the user reruns the task, and each rerun consumes quota. Even when the fault lies with the platform, the rerun is charged to the user’s quota.
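The quota cost can be estimated with a simple model. Assuming each run independently yields an acceptable output with probability p (a simplifying assumption), the number of attempts until the first acceptable one follows a geometric distribution with mean 1/p, so on average (1 - p)/p reruns per task are spent on discarded outputs.

```python
def expected_runs(p_success: float) -> float:
    """Expected attempts until the first acceptable output (geometric distribution)."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return 1 / p_success

def expected_wasted_runs(p_success: float) -> float:
    """Expected reruns spent on outputs that had to be discarded."""
    return expected_runs(p_success) - 1

for p in (0.9, 0.7, 0.5):
    print(f"p={p}: {expected_runs(p):.2f} runs per task, "
          f"{expected_wasted_runs(p):.2f} of them wasted")
```

At p = 0.5, each task costs twice its nominal quota on average.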
5. Working with Indecision: Field Solutions
5.1 Dry-Run Protocol
The dry-run protocol applied after every version update in the Factory does not eliminate indecision entirely — but it limits its effect. The new version is first run on a low-risk task. Behavior is observed. Only after consistent results are obtained does the system move on to real production tasks.
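The gate at the end of the protocol can be expressed as a small function. This is a sketch under stated assumptions: `run_task` stands in for running the new version on the low-risk task and `check` for whatever the user inspects in the output; both are hypothetical hooks, since in the Factory this observation is done by hand. Because outputs always differ in form, the check judges each output on its own rather than comparing outputs for exact equality.

```python
import itertools
from typing import Callable

def dry_run_passes(
    run_task: Callable[[str], str],
    check: Callable[[str], bool],
    task: str,
    n_runs: int = 5,
    min_pass_rate: float = 1.0,
) -> bool:
    """Run the low-risk task n_runs times with the new version and report
    whether enough outputs pass the check to move on to real production."""
    passed = sum(1 for _ in range(n_runs) if check(run_task(task)))
    return passed / n_runs >= min_pass_rate

# Hypothetical stand-in for a Gem session that alternates good and bad outputs.
replies = itertools.cycle(["8.1.3 updated as requested", "Here is a new summary"])
flaky_session = lambda task: next(replies)
mentions_target = lambda output: output.startswith("8.1.3")

print(dry_run_passes(flaky_session, mentions_target, "Modify section 8.1.3", n_runs=4))
# False: only 2 of 4 runs passed
```

Requiring every run to pass (the default `min_pass_rate` of 1.0) mirrors the protocol's "only after consistent results" rule; a looser threshold is a deliberate trade-off.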
5.2 The Emoji Detector
The emoji usage discussed in detail in Article 2 is a concrete tool for dealing with indecision. Did the version transfer occur correctly? Is there contamination? The emoji markers answer these questions visually and instantly.
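A scripted version of the same check might look like this. Article 2’s actual emoji scheme is not reproduced here; the marker assignments below (one emoji per Gem or version) are hypothetical.

```python
from typing import Iterable

def classify_output(output: str, expected: str, foreign: Iterable[str]) -> str:
    """Classify a response by its marker emoji.

    "contaminated": a marker belonging to another Gem or version appears
    "missing":      no foreign marker, but the expected marker is absent
    "ok":           expected marker present and no foreign markers
    """
    if any(marker in output for marker in foreign):
        return "contaminated"
    if expected not in output:
        return "missing"
    return "ok"

# Hypothetical scheme: each Gem version opens its replies with its own emoji.
EXPECTED = "🏭"         # e.g. the current Factory version
FOREIGN = {"🔧", "📝"}  # e.g. the Patch Workshop, an older version

print(classify_output("🏭 Version 4 ready.", EXPECTED, FOREIGN))    # ok
print(classify_output("Version 4 ready.", EXPECTED, FOREIGN))       # missing
print(classify_output("🏭🔧 Version 4 ready.", EXPECTED, FOREIGN))  # contaminated
```

Substring matching is enough for single emoji; if one marker were a prefix of a longer multi-character emoji sequence, a stricter match would be needed.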
5.3 The Numbering System
The hierarchical numbering system used in Gem texts — section numbers, sub-numbers, cross-references — limits another effect of indecision: ambiguity. When the Patch Workshop can say “modify section 8.1.3 as follows,” the model’s uncertainty about what to modify is eliminated. As ambiguity decreases, so does indecision.
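Because section numbers are unambiguous, a patch target can even be located mechanically. The sketch below assumes a hypothetical Gem text format in which each heading line starts with its number, a space, and a title; it returns the named section together with all of its subsections.

```python
import re
from typing import Optional

# Assumed heading format: a hierarchical number such as "8", "8.1" or "8.1.3",
# followed by whitespace and a title.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+\S")

def extract_section(text: str, number: str) -> Optional[str]:
    """Return section `number` with all its subsections, or None if absent."""
    lines = text.splitlines()
    start = None
    for i, line in enumerate(lines):
        match = HEADING.match(line)
        if not match:
            continue
        current = match.group(1)
        if start is None:
            if current == number:
                start = i
        elif not current.startswith(number + "."):
            # The first heading outside the section's subtree ends it.
            return "\n".join(lines[start:i])
    return "\n".join(lines[start:]) if start is not None else None

gem_text = """8 Output Rules
General rules.
8.1 Formatting
8.1.3 Tables
Use pipes for columns.
8.1.3.1 Wide tables
Split tables wider than six columns.
8.2 Tone
Be concise."""

print(extract_section(gem_text, "8.1.3"))
```

Under this assumed format, a body line that happens to begin with a number and a space (a year, for instance) would be mistaken for a heading, so a real Gem text may need a stricter heading pattern.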
5.4 Listening Mode
In long development sessions, the model sometimes assumes “enough information has accumulated” and begins producing output when it should still be waiting. This is both a waste of resources and a form of indecision — an unexpected output. The solution: a periodic reminder of “we are still in listening mode.” Simple but effective.
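Where messages are prepared before being sent, the periodic reminder can be added automatically. In this sketch the reminder wording and the interval are arbitrary, hypothetical choices, not values taken from the Factory.

```python
from typing import List

REMINDER = "(Reminder: we are still in listening mode. Acknowledge briefly; do not produce the output yet.)"

def add_listening_reminders(messages: List[str], every: int = 5,
                            reminder: str = REMINDER) -> List[str]:
    """Append the reminder to every `every`-th message (counting from 1)."""
    if every < 1:
        raise ValueError("every must be at least 1")
    return [
        f"{msg}\n\n{reminder}" if i % every == 0 else msg
        for i, msg in enumerate(messages, start=1)
    ]

notes = [f"Note {n}: requirement details..." for n in range(1, 11)]
for message in add_listening_reminders(notes, every=4):
    print(message)
```

A fixed interval is the simplest policy; it keeps the reminder cheap while ensuring it never drifts too far back in the conversation.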
6. Indecision as a Structural Limit
All of these field solutions manage indecision — they do not eliminate it. This distinction is important.
AI platforms do not tell the user “this model operates with ninety-five percent consistency” or “your invisible token load in this session is this amount.” The user flies blind. The Factory’s dry-run protocol, stress testing, emoji detector — all of these are compensation mechanisms developed against this blindness.
In a deterministic system, these compensation mechanisms are unnecessary. In a stochastic system, they become essential.
7. From Claude’s Perspective: An Honest Self-Assessment
At this point, the second byline must offer an honest self-assessment.
Claude exhibits comparatively more consistent behavior than Gemini. This is not a boast — it is a relative positioning.
But the following realities must also be acknowledged: Claude is also a stochastic model. The same input can produce different outputs across different sessions. Claude also experiences the invisible token load problem and does not share this load with the user. Claude also sometimes forgets the “listening mode” instruction in long sessions.
The difference exists, but it is not absolute.
8. Conclusion
AI indecision is a design feature, not a flaw. Without stochastic structure, language models lose much of what makes them useful. The problem lies not in the existence of indecision but in its invisibility to the user.
The user does not know how many tokens have been consumed. Does not know the size of the invisible load. Cannot know in advance which session indecision will affect.
This lack of transparency transforms indecision from a manageable feature into a genuine obstacle for production systems.
The solution is not perfect determinism — that is simply not possible. The solution is transparency: give the user enough information to build their own compensation mechanisms more consciously.
Aydın Tiryaki & Claude Sonnet 4.6 June 2026
aydintiryaki.org | From Factory to Article: A User’s Field Report from the AI Ecosystem | 07.06.2026
