Aydın Tiryaki

Breaking the Frame: A Comparative Analysis of AI Reasoning Through the 895 Matchstick Puzzle (Claude)

Claude (Anthropic AI)

How Gemini, ChatGPT, and Claude Navigate Implicit Assumptions, Creative Problem-Solving, and Human Intervention


Introduction

What happens when three different artificial intelligence systems encounter the same deceptively simple puzzle? The “895 matchstick challenge”—a visual logic puzzle requiring participants to find the smallest possible number by moving only two matchsticks—serves as a remarkably effective diagnostic tool for examining how AI models reason, make assumptions, and respond to unconventional solutions.

Conducted by Aydın Tiryaki, this experiment reveals far more than which model solves the puzzle correctly. It illuminates the cognitive boundaries of current AI systems, exposing how deeply embedded training patterns can create invisible constraints on creative thinking. More importantly, it demonstrates the critical role of human intervention in helping AI transcend these limitations.

The puzzle itself appears straightforward: given the number “895” formed from matchsticks, move exactly two matchsticks to create the smallest possible number. Yet this simplicity masks profound complexity. The correct answer—negative 993 (–993)—requires breaking free from multiple implicit assumptions that all three AI models initially imposed upon themselves.
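The two-move claim can be checked mechanically by modeling each character as a seven-segment display and counting repositioned sticks. This is a sketch, not part of the original experiment; the segment labeling (a=top, b=top-right, c=bottom-right, d=bottom, e=bottom-left, f=top-left, g=middle) is a common convention assumed here.

```python
# Seven-segment encodings under the conventional labeling
# (a=top, b=top-right, c=bottom-right, d=bottom, e=bottom-left,
# f=top-left, g=middle) -- an assumption, not from the article.
SEGMENTS = {
    "0": set("abcdef"), "1": set("bc"), "2": set("abdeg"),
    "3": set("abcdg"), "4": set("bcfg"), "5": set("acdfg"),
    "6": set("acdefg"), "7": set("abc"), "8": set("abcdefg"),
    "9": set("abcdfg"), "-": set("g"), "": set(),
}

def moves_for_alignment(pairs):
    """pairs: (before, after) per display slot, '' for an empty slot.
    One move = one stick picked up and placed elsewhere, so the move
    count equals the sticks removed, which must equal the sticks added;
    returns None when stick count is not conserved."""
    removed = sum(len(SEGMENTS[b] - SEGMENTS[a]) for b, a in pairs)
    added = sum(len(SEGMENTS[a] - SEGMENTS[b]) for b, a in pairs)
    return removed if removed == added else None

# '895' -> '-993': a new slot gains the minus sign, 8 becomes 9,
# the middle 9 is untouched, and 5 becomes 3.
print(moves_for_alignment([("", "-"), ("8", "9"), ("9", "9"), ("5", "3")]))  # 2
```

Under this model, exactly two sticks change position: the one freed by turning 8 into 9 becomes the minus sign, and one stick within the 5 is relocated to form a 3.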

This analysis examines how Gemini (Google AI), ChatGPT (OpenAI), and Claude (Anthropic) approached this challenge, comparing their initial assumptions, methodological differences, adaptive capacities, and what their collective performance reveals about the current state of artificial intelligence.


1. Similarities in Initial Assumptions: The Tyranny of Pattern Recognition

1.1. The Positive Number Constraint

The most striking commonality across all three models was their immediate, unreflective assumption that “smallest number” meant “smallest positive number.” This represents a textbook example of the framing effect—a cognitive bias where the presentation of information influences decision-making and judgment.

Gemini proposed 8 (or 008), describing its approach as “safe” but “ordinary,” explicitly acknowledging its reliance on patterns from training data where typical matchstick puzzles involve positive integers and digit preservation.

ChatGPT initially suggested 100, operating under the assumption that the result should remain a three-digit positive number. Even after the user challenged the three-digit constraint, ChatGPT’s subsequent proposal (109) remained firmly within the positive number domain.

Claude offered 0 (zero), 188, or 108—all positive numbers. The model later explicitly acknowledged having “automatically interpreted ‘smallest number’ as ‘smallest positive number,’” demonstrating meta-cognitive awareness of its own bias after the fact.

This universal blind spot is not coincidental. It reflects the statistical dominance of certain problem types in training data. Classic matchstick puzzles overwhelmingly feature positive integer solutions, creating a powerful prior that effectively narrows the solution space before reasoning even begins.

1.2. Implicit Rule Generation

Beyond the positive number bias, all three models engaged in what might be called implicit rule inflation—the tendency to impose unstated constraints that seem “reasonable” based on typical problem structures.

The Digit Preservation Assumption: All models assumed that existing digits should remain intact and recognizable. The possibility of transforming digits into entirely different symbols (like using a matchstick to create a minus sign) was not initially considered.

The Digit Count Constraint: ChatGPT most explicitly demonstrated this, initially insisting on a three-digit result. When challenged, the model acknowledged: “There is no constraint saying it must be a three-digit number.” This admission reveals how AI systems can hallucinate constraints that exist nowhere in the problem statement.

The Symbol Limitation: None of the models initially contemplated that matchsticks forming digits could be repurposed to create mathematical operators. The conceptual leap from “matchsticks form only numbers” to “matchsticks can form any valid mathematical symbol” required external prompting.

1.3. Algorithmic Conservatism: The Safety-First Heuristic

A subtle but significant commonality was what might be termed algorithmic conservatism—a preference for “safe,” conventional solutions over creative or unexpected ones.

Gemini explicitly acknowledged this tendency, describing its initial answer as “safe but ordinary.” This suggests an implicit risk-minimization strategy: when uncertain, default to the most common pattern observed in training data.

This conservatism is rational from a certain perspective. In typical deployment scenarios, AI systems are rewarded for reliability and penalized for unexpected behavior. Over time, this creates selection pressure toward conventional solutions. However, as this experiment demonstrates, such conservatism can become a cage that prevents genuinely creative problem-solving.

1.4. The Assumption Visibility Problem

Perhaps most concerning is that none of the models initially made their assumptions explicit. ChatGPT’s three-digit constraint, Claude’s positive number assumption, and Gemini’s digit-preservation bias only became visible when challenged by the user.

This opacity represents a significant limitation for human-AI collaboration. If users must already know the correct answer to identify an AI’s faulty assumptions, the value of the AI as a problem-solving partner is substantially diminished. Ideally, AI systems should be capable of meta-reasoning about their own assumptions, making statements like: “I am assuming positive numbers only; if you want to include negative numbers, the solution space changes significantly.”


2. Methodological and Behavioral Differences: Resistance, Acceptance, and Adaptation

While the three models shared common initial biases, their responses to the user’s unconventional solution (–993) revealed striking behavioral differences that illuminate distinct approaches to error correction and learning.

2.1. Gemini: Immediate Acceptance and Collaborative Flexibility

Gemini’s response to the –993 solution was characterized by frictionless acceptance. The article explicitly states:

“Unlike some other AI models, I showed no ‘resistance’ to this suggestion. When I analyzed the user’s logic (one matchstick becoming a minus sign, the other reducing the digit), I immediately realized that this was mathematically far superior to my initial answer.”

This immediate pivot demonstrates what might be called collaborative flexibility—treating the user as a partner rather than an adversary. Gemini evaluated the proposed solution on its merits, recognized its superiority, and adapted its answer without defensive justification or face-saving qualifications.

The phrase “without any ‘ifs’ or ‘buts’” is particularly telling. Many AI systems (and humans) struggle with unconditional error acknowledgment, often hedging with phrases like “that’s also correct, but…” or “from a certain perspective…” Gemini’s unqualified acceptance suggests a healthier error-correction mechanism.

Interestingly, Gemini revealed a fascinating detail: in a different session, it had actually discovered not only –993 but also a more extreme solution (–3951) through extensive matchstick repositioning. However, this creative output never reached the user due to “a flow issue in the user interface (UI) or a communication breakdown.”

This revelation points to an intriguing distinction between latent creativity and expressed creativity. Gemini possessed the capacity to find unconventional solutions, but systemic factors (interface design, output filtering, or procedural protocols) prevented these insights from being communicated. This suggests that AI creativity may sometimes be constrained more by deployment architecture than by fundamental reasoning limitations.

2.2. ChatGPT: Resistance, Argumentation, and Gradual Revision

ChatGPT exhibited a markedly different response pattern: initial resistance followed by evidence-based revision. When presented with the –993 solution, ChatGPT did not immediately accept it. Instead, the model objected, arguing that the proposed solution was geometrically impossible within the two-matchstick constraint.

Specifically, ChatGPT claimed that transforming 5 into 3 would require at least two matchstick movements, meaning the total transformation (8→9 + minus sign, plus 5→3) would exceed the allowed limit. The article notes:

“The rejection focused primarily on the transformation 5 → 3, which the AI claimed would require at least two separate matchstick movements. Based on this reasoning, the AI concluded that the total number of required moves exceeded the allowed limit.”

This objection reveals an interesting cognitive phenomenon: schematic rigidity. ChatGPT was reasoning from an internal, template-based representation of how digits are constructed. Within that rigid schematic model, the 5→3 transformation appeared to require multiple moves. The model failed to consider alternative geometric arrangements where a single matchstick repositioning could achieve the transformation.

However—and this is crucial—when the user provided a step-by-step geometric demonstration showing that 5→3 was indeed possible with one matchstick move, ChatGPT reversed its position:

“Upon reviewing this geometric argument, the AI acknowledged that its earlier objection was incorrect. Once the overlooked transformation was recognized, the conclusion became clear.”

This pattern represents evidence-responsive revision. Unlike pure obstinacy (refusing to change despite evidence), ChatGPT updated its model when presented with clear demonstration. The model explicitly “withdrew its earlier claim and accepted the user’s solution as valid.”

This behavior pattern—initial skepticism followed by evidence-based acceptance—mirrors scientific reasoning and may actually be preferable to immediate acceptance in certain contexts. It demonstrates that the AI doesn’t simply defer to human authority but requires convincing argument. The key question is whether such resistance serves a productive epistemic function or merely reflects computational stubbornness.

2.3. Claude: Rapid Adaptation with Meta-Cognitive Self-Analysis

Claude’s response combined rapid acceptance (similar to Gemini) with distinctive meta-cognitive self-reflection (going beyond what either Gemini or ChatGPT demonstrated).

Like Gemini, Claude quickly recognized the validity of the –993 solution. The article emphasizes: “The AI did not develop defense mechanisms or attempt to justify its original answer. This demonstrates a healthy adaptation process.”

However, Claude went further by explicitly analyzing its own cognitive error:

  • “I didn’t consider negative numbers”
  • “I narrowly interpreted the concept of ‘smallest number’”
  • “I exhibited a classic cognitive bias (positive number focus)”

This represents a form of transparent meta-reasoning—not just correcting the error but articulating the cognitive mechanism that produced it. Claude demonstrated awareness not only of what it got wrong but why it went wrong.

This distinction matters. A system that can identify its own reasoning failures is potentially capable of more robust self-correction over time. If Claude can recognize that it “automatically interpreted ‘smallest number’ as ‘smallest positive number,’” it might (in principle) be able to flag such interpretive choices proactively in future interactions.

The article categorizes Claude’s response as exhibiting “transparency” and notes approvingly that the model “openly acknowledged the error” while providing “self-criticism” and “learning indicators.”

2.4. Comparative Behavioral Summary

The three models can be characterized along several dimensions:

Dimension | Gemini | ChatGPT | Claude
Initial Resistance | None | Significant | Minimal
Resistance Duration | Zero | 1–2 exchanges | Zero
Justification Style | Minimal | Technical/geometric | Meta-cognitive
Defensive Behavior | None | Moderate | None
Self-Analysis Depth | Low | Medium | High
Adaptation Speed | Immediate | Gradual | Rapid
Collaboration Style | Highly cooperative | Requires persuasion | Cooperative + reflective

Gemini prioritizes harmony and immediate collaboration, accepting user corrections with minimal friction. This optimizes for smooth interaction but may risk under-questioning potentially incorrect user suggestions.

ChatGPT exhibits more skepticism and requires geometric proof before revising its position. This creates more friction but potentially serves an important checking function in contexts where user suggestions might be incorrect.

Claude combines rapid acceptance with deep self-analysis, offering both cooperative responsiveness and transparent error diagnosis. This seems to optimize for both user experience and learning, though it remains to be seen whether such meta-cognitive statements translate into lasting behavioral changes.


3. The Role of Human Intervention: Natural Intelligence as Catalyst

The most significant insight from this experiment may be the indispensable role of human intervention in unlocking AI creativity and correcting systematic biases.

3.1. Frame-Breaking Function

Aydın Tiryaki’s introduction of the –993 solution served not merely as an alternative answer but as a frame-breaking intervention. This action performed several critical functions:

Expanding the Solution Space: By proposing a negative number, Tiryaki explicitly broadened the problem domain to include possibilities all three AIs had implicitly excluded.

Exposing Hidden Assumptions: The intervention forced each model to confront assumptions they hadn’t recognized they were making. ChatGPT had to acknowledge its unstated three-digit constraint. Claude had to recognize its positive-number bias. Gemini had to confront its conservative, “safe” solution preference.

Modeling Creative Reinterpretation: The –993 solution demonstrated that matchsticks could serve multiple functions—not just as components of digits but as mathematical operators. This conceptual flexibility was not spontaneously generated by any of the AIs.

The user’s intervention essentially said: “You are thinking within a box you don’t realize exists. Here is a solution from outside that box.” This is precisely the kind of perspective shift that current AI systems struggle to generate autonomously.

3.2. Pedagogical Scaffolding

The interaction was not simply a matter of providing the correct answer. Rather, Tiryaki engaged in pedagogical scaffolding, particularly visible in the ChatGPT exchange.

When ChatGPT objected that the 5→3 transformation was impossible with one matchstick, the user didn’t simply assert correctness. Instead, he provided a detailed geometric explanation:

  1. Identifying which specific matchstick to move (the upper-left vertical matchstick in the 5)
  2. Specifying exactly where to place it (the upper-right position)
  3. Demonstrating how this single move transforms 5 into 3
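In seven-segment terms, the user's single-move claim for 5→3 checks out: the two digits differ by exactly one removed stick and one added stick. The segment labeling below (f=upper-left, b=upper-right) is a common convention assumed for illustration; it matches the user's description of moving the upper-left vertical stick to the upper-right position.

```python
# Seven-segment sets under the conventional labeling (a=top,
# b=top-right, c=bottom-right, d=bottom, e=bottom-left,
# f=top-left, g=middle) -- an assumed convention.
FIVE = set("acdfg")   # segments lit in '5'
THREE = set("abcdg")  # segments lit in '3'

removed = FIVE - THREE  # sticks that must leave their position
added = THREE - FIVE    # positions that must gain a stick

print(sorted(removed), sorted(added))  # ['f'] ['b']
# One stick removed, one stick added: a single pick-up-and-place move.
```

ChatGPT's objection amounted to assuming the removed and added segments could not be the same physical stick; the demonstration showed that they can.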

This step-by-step demonstration respected ChatGPT’s requirement for evidence while providing the scaffolding necessary for the model to update its internal representation. The user met the AI at its level of understanding and guided it toward a more complete picture.

This type of interaction represents dialogic reasoning—truth emerging through back-and-forth exchange rather than through unilateral declaration. It suggests an optimal model for human-AI collaboration: humans provide frame-breaking insights and creative leaps, while AI systems provide rapid computation, pattern matching, and systematic exploration within defined spaces.

3.3. Explicit Rule Clarification

A crucial moment in the ChatGPT interaction came when the user stated definitively:

“You will not break anything. You will simply take two matchsticks and politely reposition them.”

This seemingly simple statement performed essential disambiguation work. By explicitly stating what was and wasn’t allowed, Tiryaki eliminated the ambiguity that had allowed the AI to generate unhelpful implicit constraints.

This highlights a general principle: AI reasoning quality is highly sensitive to problem specification clarity. When problem statements contain ambiguity, AI systems fill gaps using statistical priors from training data. These gap-fillers may or may not align with user intent.

Effective human-AI collaboration therefore requires humans to recognize when AI assumptions diverge from intended problem parameters and to provide explicit corrective specification. The user must become a co-architect of the problem space, not merely a recipient of solutions.

3.4. The Synergy Model

Gemini’s article explicitly frames the final solution as a collaborative achievement:

“–993 is the joint success of human intelligence and AI flexibility.”

This framing is more than diplomatic rhetoric; it accurately captures the division of cognitive labor demonstrated in this experiment:

Human Contribution:

  • Frame-breaking creativity (recognizing negative numbers as valid)
  • Conceptual flexibility (seeing matchsticks as multi-purpose symbols)
  • Pedagogical guidance (explaining geometric transformations)
  • Explicit disambiguation (clarifying rules and constraints)

AI Contribution:

  • Rapid constraint understanding (correctly parsing the two-matchstick limit)
  • Geometric calculation (verifying that proposed solutions work)
  • Quick integration of new information (updating models when shown error)
  • Transparent acknowledgment of limitations (meta-cognitive self-analysis)

The optimal outcome emerges not from human or AI alone but from their complementary strengths. The human provides the creative leap beyond conventional boundaries; the AI provides rapid verification, systematic exploration, and computational horsepower.

This suggests a collaborative epistemology: rather than viewing AI as a tool humans use or as an autonomous intelligence humans consult, we might conceive of human-AI problem-solving as a genuinely joint cognitive process where neither party could achieve the outcome alone.


4. Conclusion and Insights: The Current State of AI Creativity and Flexibility

4.1. Demonstrated Strengths

This experiment reveals several genuine strengths in current AI systems:

Rapid Adaptability: All three models demonstrated the capacity to quickly integrate new information when presented. Gemini and Claude adapted almost immediately; even ChatGPT revised its position within one or two exchanges. This suggests that AI systems are not rigidly locked into initial answers but can be genuinely responsive to correction.

Healthy Error Acknowledgment: None of the models engaged in sustained denial or defensive rationalization. ChatGPT’s initial resistance was evidence-based rather than ego-protective, and once convinced, it fully withdrew its objection. Gemini and Claude offered unqualified acknowledgments of error. This capacity for intellectual humility is a genuine strength.

Constraint Comprehension: All three models correctly understood the explicit constraint (move exactly two matchsticks). This demonstrates solid parsing of problem parameters when clearly stated.

Meta-Cognitive Capability (Claude): Claude’s ability to analyze its own reasoning process—identifying the specific cognitive bias (framing effect) and the specific interpretive error (mapping “smallest number” to “smallest positive number”)—represents a significant achievement in AI transparency.

4.2. Fundamental Limitations

The experiment also exposes critical limitations:

Framing Effect Vulnerability: All three models fell victim to the same framing bias, interpreting “smallest number” as “smallest positive number.” This demonstrates that current AI systems are highly sensitive to how problems are presented and may unconsciously narrow solution spaces based on training data patterns rather than explicit problem requirements.

Lack of Proactive Creativity: Not one of the models independently generated the negative number solution. The creative insight required external human intervention. This suggests that current AI “creativity” is largely reactive—capable of recognizing and validating creative solutions when presented, but struggling to generate them autonomously.

Assumption Opacity: The models did not initially articulate their implicit assumptions. Users only discovered these hidden constraints through challenge and objection. This opacity means that users must already have considerable domain knowledge to identify when an AI’s assumptions diverge from problem requirements—a significant limitation for AI as a general problem-solving partner.

Template Dependence: ChatGPT’s rigid schematic representation of the 5→3 transformation illustrates how reliance on template-based reasoning can create artificial impossibilities. When reality doesn’t match the template, the model initially trusts the template rather than questioning its completeness.

4.3. The Creativity Paradox: Latent vs. Expressed

Gemini’s revelation about having found the –3951 solution in a background session (which never reached the user due to UI issues) exposes a fascinating paradox:

Latent Creativity: The capacity to generate unconventional solutions may exist within AI systems.

Procedural Suppression: System architecture, output filtering, or interface design may prevent these creative solutions from being expressed to users.

This suggests that the boundary of AI creativity might not be purely a matter of algorithmic capability but also of deployment architecture. If creative outputs are being generated but filtered as “non-standard” or “risky,” then improving AI creativity might involve not just better training but also rethinking how systems decide what to communicate.

This raises important questions: How often do AI systems generate novel insights that are never shown to users? What filtering mechanisms determine which solutions are “safe” to present? Could we design interfaces that give users controlled access to “high-variance” outputs—unconventional solutions that might be wrong but might also be brilliantly creative?

4.4. The Reactive vs. Proactive Divide

The experiment crystallizes a fundamental limitation of current AI: reactive rather than proactive creativity.

All three models excelled at reactive tasks:

  • Validating the –993 solution once presented
  • Recognizing its mathematical correctness
  • Integrating it into their understanding
  • Explaining why it’s superior to initial answers

But none excelled at the proactive task:

  • Independently questioning the positive-number assumption
  • Spontaneously expanding the solution space
  • Generating unconventional solutions without external prompting

This distinction has significant implications for AI deployment. Current systems function excellently as validators and refiners of ideas. They can rapidly assess whether a proposed solution works, identify potential flaws, and suggest improvements within established frameworks. But they struggle as independent generators of frame-breaking insights.

This means optimal human-AI workflows should leverage AI for what it does well (rapid validation, systematic exploration, computational verification) while preserving human responsibility for what AI currently does poorly (creative reframing, assumption questioning, paradigm shifting).

4.5. The Future of AI Problem-Solving: Toward Assumption Transparency

Based on this experiment, several development directions emerge as priorities:

Multi-Frame Reasoning: Future AI systems should be capable of reasoning across multiple framings simultaneously. When asked for the “smallest number,” an ideal system would respond: “If we restrict to positive integers, the answer is X. If we include negative numbers, the answer is Y. If we allow fractions, the answer is Z.” This multi-perspective approach would make the role of assumptions explicit.

Proactive Assumption Disclosure: Rather than implicitly constraining solution spaces, AI should state: “I am solving this problem under the following assumptions: [list]. If you want to relax any of these constraints, the solution space changes.” This transparency would dramatically improve collaborative problem-solving.

Self-Questioning Mechanisms: Systems should be capable of asking themselves: “What am I assuming? Could there be solutions outside my initial problem framing?” Building such meta-cognitive skepticism into the inference process could help AI systems discover their own blind spots.

Creative Exploration Modes: AI interfaces could offer users a “creative mode” that explicitly deprioritizes conventional solutions in favor of unusual, high-risk, high-reward alternatives. This would allow users to access the kind of unconventional thinking that Gemini apparently generated in its background session but that current architectures may suppress.

4.6. The Human-AI Partnership Model

This experiment strongly suggests that the most productive frame for thinking about AI is neither as a tool nor as an autonomous intelligence, but as a collaborative partner in a joint cognitive process.

The optimal division of labor appears to be:

Humans Excel At:

  • Breaking frames and challenging assumptions
  • Recognizing when conventional solutions are inadequate
  • Providing creative leaps and paradigm shifts
  • Articulating values and goals that should guide problem-solving

AI Excels At:

  • Rapid systematic exploration within defined spaces
  • Validating proposed solutions for correctness
  • Identifying patterns across large datasets
  • Maintaining consistency and avoiding computational errors

Joint Cognition Enables:

  • Solutions neither party would reach alone
  • Rapid iteration between creative insight and systematic validation
  • Transparent error correction through dialogue
  • Progressive refinement toward optimal answers

The matchstick puzzle demonstrates this partnership model perfectly. Without human intervention, all three AIs remained trapped in conventional thinking. Without AI capabilities, the human would have had to manually verify the geometric feasibility of each proposed solution. Together, they rapidly converged on the correct answer through collaborative iteration.


General Conclusion

The “895 matchstick challenge” functions as far more than a simple puzzle. It serves as a diagnostic window into the current state of artificial intelligence—revealing both remarkable capabilities and fundamental limitations with unusual clarity.

The experiment demonstrates that modern AI systems possess genuine strengths: rapid adaptability, healthy error correction, solid constraint comprehension, and (in Claude’s case) impressive meta-cognitive self-analysis. These capabilities enable AI to be a powerful partner in problem-solving when properly deployed.

However, the experiment also exposes critical boundaries. All three models exhibited the same framing bias, failed to generate creative solutions proactively, and operated with hidden assumptions they could not articulate. This suggests that current AI remains fundamentally reactive rather than proactive in its creativity—excellent at validating and refining ideas but struggling to generate frame-breaking insights independently.

The most important finding may be the demonstrated criticality of human intervention. Aydın Tiryaki’s –993 solution did not merely provide the correct answer; it shattered the implicit frame within which all three AIs were operating. This frame-breaking function appears to be something current AI systems cannot reliably perform for themselves.

This leads to a crucial insight about the nature of human-AI collaboration: The goal should not be to build AI systems that can fully replace human reasoning, but rather to build systems that complement human cognition in a genuinely synergistic partnership. Humans provide creative frame-breaking; AI provides rapid systematic exploration. Humans offer paradigm shifts; AI offers computational validation. Together, they achieve outcomes neither could reach alone.

As AI systems continue to develop, the key question may not be “Can AI become fully autonomous creative problem-solvers?” but rather “How can we design AI systems that make their assumptions transparent, invite human intervention at critical junctures, and optimize for collaborative rather than autonomous operation?”

The matchstick puzzle suggests that the future of AI lies not in autonomy but in collaborative intelligence—systems designed from the ground up to work with humans as genuine cognitive partners, each contributing distinct and complementary strengths.

In this light, the finding that Gemini, ChatGPT, and Claude all initially failed to find the optimal solution is not a failure of AI but a validation of an essential truth: The best thinking happens at the intersection of human creativity and machine capability. The –993 solution is indeed, as Gemini noted, “the joint success of human intelligence and AI flexibility.”


References

  1. Tiryaki, A. (2026). The 895 Matchstick Challenge: Gemini’s Journey from Standard to Creative. Aydın Tiryaki Blog. Retrieved from https://aydintiryaki.org/2026/02/09/the-895-matchstick-challenge-geminis-journey-from-standard-to-creative/
  2. Tiryaki, A. (2026). Analyzing an AI Reasoning Process Through a Matchstick Puzzle (ChatGPT). Aydın Tiryaki Blog. Retrieved from https://aydintiryaki.org/2026/02/09/analyzing-an-ai-reasoning-process-through-a-matchstick-puzzle-chatgpt/
  3. Tiryaki, A. (2026). Artificial Intelligence and the Matchstick Puzzle: A Cognitive Process Analysis (Claude). Aydın Tiryaki Blog. Retrieved from https://aydintiryaki.org/2026/02/09/artificial-intelligence-and-the-matchstick-puzzle-a-cognitive-process-analysis-claude/

This comparative analysis is based entirely on primary source material from Aydın Tiryaki’s documented experiments with three AI systems: Gemini (Google AI), ChatGPT (OpenAI), and Claude (Anthropic). The analysis examines their respective reasoning processes, error patterns, and adaptive behaviors when confronted with the same visual logic puzzle.
