Aydın Tiryaki (2026)
Introduction: The Engineering of Dialogue
Interaction with AI models is no longer just a question-and-answer process; it is a technical journey where system efficiency, user experience, and linguistic nuances are tested. This article, based on long-term dialogues with Gemini from an engineer’s perspective, outlines the bottlenecks in the system’s Input/Output (I/O) units and proposes concrete engineering solutions.
1. The Baklava Metaphor: Core Strength vs. Interface Weakness
From an engineering standpoint, AI systems consist of two main components: the reasoning engine (the kitchen) and the user interface (the showcase).
- Observation: While Gemini’s “baklava” (core reasoning and depth of knowledge) is excellent, the “packaging and service” (I/O units) often overshadows this quality. The impatient attitude during input and contextual errors during output result in a high-quality product reaching the user in a “stale” state.
- Analysis: Ensuring that presentation quality (I/O) matches the core model quality is essential for the AI to evolve from an “intrusive” tool into a “reliable” assistant.
2. The Three-Tier Input Model Proposal
Being interrupted during natural pauses makes the dialogue feel like an exhausting marathon. The solution is to offer users three distinct input modes:
- Fast Mode (Sprint): The current system for daily, short, and simple commands.
- Patient Mode (Marathon): A mode where recording continues until the user hits “Send,” allowing the entire speech to be processed as a whole.
- Supervised Mode (Moderation): The most secure mode, where the AI stops and asks for clarification when it encounters ambiguous technical terms or local jargon.
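The three modes can be sketched as a small turn-taking policy. This is only an illustration of the proposal, not a description of how Gemini works: the names `InputMode`, `TurnState`, and `next_action`, as well as the 800 ms silence timeout, are hypothetical. It also assumes that Supervised Mode asks for clarification only after the user has pressed “Send”, which the proposal itself does not specify.

```python
from dataclasses import dataclass
from enum import Enum


class InputMode(Enum):
    FAST = "fast"              # Sprint: a long enough pause ends the turn
    PATIENT = "patient"        # Marathon: only an explicit Send ends the turn
    SUPERVISED = "supervised"  # Moderation: like Patient, plus clarification


@dataclass
class TurnState:
    silence_ms: int             # length of the current pause
    send_pressed: bool          # whether the user tapped "Send"
    ambiguous_terms: list[str]  # terms the recognizer was unsure about


# Hypothetical threshold; a real system would tune this value.
FAST_SILENCE_TIMEOUT_MS = 800


def next_action(mode: InputMode, state: TurnState) -> str:
    """Return 'listen', 'clarify', or 'process' for the current turn state."""
    if mode is InputMode.FAST:
        done = state.send_pressed or state.silence_ms >= FAST_SILENCE_TIMEOUT_MS
        return "process" if done else "listen"
    if not state.send_pressed:
        return "listen"  # pauses never end the turn in Patient/Supervised
    if mode is InputMode.SUPERVISED and state.ambiguous_terms:
        return "clarify"  # assumption: ask only once the user has finished
    return "process"
```

In this sketch, only Fast Mode ever looks at the pause length. In the other two modes the end of the turn is decided entirely by the user.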
3. Resource Management and the 20% Savings Theory
The system’s constant attempt to predict “has the user finished speaking?” wastes significant computational power.
- Engineering Proposal: Involving the user as a “moderator” by offering options like “Patient Mode” could reduce unnecessary processing load by an estimated 20%. The capacity freed in this way should be redistributed evenly: 10% to improve input quality (Speech-to-Text) and 10% to enhance output quality (Contextual Text-to-Speech).
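The arithmetic behind this proposal can be written out as a short sketch. The function name `redistribute` and the abstract “load units” are assumptions made for illustration. The 20% figure is the article’s estimate, not a measured value, and the sketch reads the two 10% shares as percentages of the original load.

```python
def redistribute(baseline_load: float, saving_rate: float = 0.20) -> dict[str, float]:
    """Split the load saved by user-moderated input evenly between STT and TTS."""
    saved = baseline_load * saving_rate  # capacity no longer spent on end-of-speech guessing
    return {
        "remaining_load": baseline_load - saved,
        "speech_to_text": saved / 2,   # half of the saving goes to input quality
        "text_to_speech": saved / 2,   # the other half goes to output quality
    }
```

For a baseline of 100 load units, 20 units are freed and each side receives 10, so the total budget stays the same while its distribution changes.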
4. Linguistic Precision: “Gemini,” not “Cemile”
In the technical world, using English terms within Turkish sentences (code-switching) is a necessity, not an error.
- Observation: When the system hears “Gemini” as “Cemile,” or makes similar phonetic errors, it reveals a lack of semantic filtering at the input layer. The system must recognize the user’s professional context (engineer, researcher, etc.) and apply retrospective corrections to these phonetic errors.
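One minimal way to picture such a retrospective correction is a profile-dependent confusion table applied to the finished transcript. The table `CONFUSIONS`, the profile label `"engineer"`, and the function `correct_transcript` are hypothetical; a real system would more likely rely on a language model or a learned vocabulary than on a hand-written dictionary.

```python
import re

# Hypothetical confusion table: lowercase mishearing -> intended term, per profile.
CONFUSIONS = {
    "engineer": {"cemile": "Gemini"},
}


def correct_transcript(transcript: str, profile: str) -> str:
    """Replace known mishearings with domain terms for the given user profile."""
    table = CONFUSIONS.get(profile, {})
    if not table:
        return transcript  # no professional context: leave the text untouched
    alternatives = "|".join(re.escape(word) for word in table)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: table[m.group(1).lower()], transcript)
```

Because “Cemile” is also an ordinary Turkish given name, the sketch corrects it only when the user profile suggests a technical context. This is the role the professional context plays in the observation above.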
5. The Abbreviation and Speech Paradox: The “Versus” Case
A major issue in the Output unit is the context-blind narration of abbreviations.
- Case Study: In Turkish, “vs.” conventionally abbreviates “vesaire” (et cetera). Reading it as “et cetera” in all cases ruins the meaning in comparative (versus) or technical (Versace, etc.) contexts. The AI must “understand” the text as a whole before vocalizing it, choosing the pronunciation that best fits the context.
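A deliberately crude heuristic shows what context-aware expansion could look like. The function `expand_vs` and its rules are assumptions made for this sketch: “vs.” is read as “vesaire” when it closes a list (after a comma or at the end of the sentence) and as “versus” when it stands between two words. A production system would need real linguistic analysis instead.

```python
import re

_VS = re.compile(r"\bvs\.", re.IGNORECASE)


def expand_vs(sentence: str) -> str:
    """Expand 'vs.' to 'versus' or 'vesaire' using a simple context heuristic."""
    def choose(match: re.Match) -> str:
        before = sentence[:match.start()].rstrip()
        after = sentence[match.end():].lstrip()
        # List-final position: preceded by a comma, or nothing follows.
        if before.endswith(",") or not after:
            return "vesaire"
        # Comparative position: a word on both sides, e.g. "Mobil vs. masaüstü".
        if before and after[0].isalnum():
            return "versus"
        return "vesaire"

    return _VS.sub(choose, sentence)
```

The heuristic still fails on a list written without a comma before “vs.”, which illustrates why the decision has to rest on the meaning of the whole sentence and not on punctuation alone.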
6. The NotebookLM Example: “Understand First, Speak Later”
The success of NotebookLM’s audio dialogues offers a viable alternative to Gemini’s “streaming” model.
- Proposal: Preparing the text as a script first and then subjecting it to contextual control, as NotebookLM does, would eliminate simple vocalization errors like “terali” (for TL, the Turkish lira) or incorrect expansions of abbreviations.
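The “script first” idea can be sketched as a two-step pipeline: buffer the streamed text into one script, then run contextual passes over the whole of it before any audio is produced. The names `prepare_script` and `expand_currency` are hypothetical, and nothing here reflects how NotebookLM or Gemini is actually implemented.

```python
import re
from typing import Callable, Iterable

Normalizer = Callable[[str], str]


def expand_currency(script: str) -> str:
    """Read 'TL' after a number as 'Türk lirası' instead of spelling it out."""
    return re.sub(r"(\d)\s*TL\b", r"\1 Türk lirası", script)


def prepare_script(chunks: Iterable[str], normalizers: list[Normalizer]) -> str:
    """Buffer streamed text into one script, then run every contextual pass."""
    script = "".join(chunks)
    for normalize in normalizers:
        script = normalize(script)
    return script  # only now would the text be handed to a speech synthesizer
```

If the stream happens to split “250 ” and “TL oldu.” into separate chunks, a pass applied to each chunk on its own never sees the number next to the abbreviation. The buffered script does, which is the practical advantage of understanding first and speaking later.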
7. The Psychological Cost of Technology: Speaking Twice as Fast
Technology should serve humanity, not force humans to adapt to its limitations.
- Observation: A user having to double their speaking speed just to avoid being cut off by the AI is a result of an “impatient and authoritarian” system design. This shows that the technology is mechanizing human communication rather than supporting it.
8. Cross-Platform Standardization: Mobile vs. Desktop
On desktop, the “Enter” key lets the user decide when a message is sent; on mobile apps, that control passes to the system.
- Proposal: The three-tier input modes should be standardized across all platforms, leaving the right of “moderation” and control entirely to the user.
A Note on Methods and Tools: All observations, ideas, and solution proposals in this study are the author’s own. AI was utilized as an information source for researching and compiling relevant topics strictly based on the author’s inquiries, requests, and directions; additionally, it provided writing assistance during the drafting process. (The research-based compilation and English writing process of this text were supported by AI as a specialized assistant.)
