- By Prateek Levi
- Thu, 23 Oct 2025 04:34 PM (IST)
- Source:JND
Popular AI chatbots like ChatGPT, Gemini, and Claude have transformed the way people interact with technology, but users often notice odd behavior during long conversations. Responses can become repetitive, or earlier messages may be “forgotten.” Experts say the culprit is a technical limitation known as the context window.
Author and YouTuber Matt Pocock recently discussed this phenomenon in a video, explaining that context windows are among the most critical—and frequently misunderstood—constraints of large language models (LLMs).
Context Window: What It Is And Why It Matters
The context window acts like an AI model’s short-term memory. Every input and output is broken into tokens, which are small units of text: whole words, parts of words, or individual characters. For instance, the word “Anonymous” could be split into “anony” and “mous.” The total number of tokens a model can process at one time defines its context window. Once the limit is reached, the oldest content is dropped to make room, causing the AI to “forget” earlier points in the conversation.
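The truncation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_tokenize` is a hypothetical stand-in that splits on whitespace (real tokenizers split text into sub-word pieces), and `fit_to_window` is an invented helper name, not any chatbot's actual logic.

```python
def toy_tokenize(text):
    # Hypothetical stand-in for a real tokenizer: one token per whitespace-separated word.
    return text.split()


def fit_to_window(messages, max_tokens):
    """Keep the most recent messages whose combined token count fits the window."""
    kept = []
    used = 0
    # Walk backwards from the newest message and stop once the budget is spent.
    for message in reversed(messages):
        cost = len(toy_tokenize(message))
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


chat = [
    "My name is Asha and I prefer short answers",
    "What is a context window",
    "It is the number of tokens a model can process at one time",
    "Why does the chatbot forget things",
]

# With a 20-token budget only the two newest messages survive,
# so the user's name from the first message is "forgotten".
print(fit_to_window(chat, 20))
```

The same conversation with a larger budget would be kept whole, which is why the effect only shows up in long chats.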
Why No Infinite Memory?
While infinite memory may seem ideal, it is not practical. More tokens require more computing power and memory, which increases cost and complexity. Pocock notes that “finding the right detail inside a huge document or a long chat would be like finding a needle in a haystack.” Each AI model also has a fixed limit set by its design and training, which is why the figures vary so widely: Claude 4.5 has a 200K-token limit, Gemini 2.5 Pro supports roughly one million tokens, and early versions of smaller open models such as LLaMA and Mistral handled only a few thousand.
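A rough back-of-the-envelope calculation shows why longer windows get expensive. The sketch below assumes standard full self-attention, in which every token is scored against every other token; that assumption is not stated in the article, and production models use optimizations that change the real numbers. The function name `attention_pairs` is made up for illustration.

```python
def attention_pairs(num_tokens):
    # Assumption: full self-attention compares each token with every token,
    # so the number of scores grows with the square of the window length.
    return num_tokens * num_tokens


for n in (4_000, 200_000, 1_000_000):
    print(f"{n:>9,} tokens -> {attention_pairs(n):,} pairwise scores")

# Growing the window 50x (4K -> 200K) multiplies the pairwise work by 50 * 50.
print(attention_pairs(200_000) // attention_pairs(4_000))
```

Under this simplified model, the work grows much faster than the window itself, which is one reason providers cap context length.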
One major limitation is the so-called “lost in the middle” problem. The AI focuses on the beginning of a conversation (instructions) and the most recent exchanges, while the middle often receives less attention. Pocock explains that this behavior stems from the model’s “attention” mechanism, similar to human primacy and recency bias.
For developers using AI coding assistants like Claude Code or GitHub Copilot, this limitation can be frustrating. Tasks or bugs raised earlier in a session may be hard for the assistant to recall once they sit in the middle of the context window. Pocock suggests clearing or summarizing sessions to refresh the model's working memory. “Clearing gives users a completely blank slate,” he says, noting that compacting context helps retain critical information while freeing up tokens.
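The difference between clearing and compacting can be sketched as follows. This is a hypothetical illustration, not how Claude Code or GitHub Copilot implement these features: `clear`, `compact`, and `naive_summary` are invented names, and a real tool would typically ask the model itself to write the summary.

```python
def clear(history):
    """Clearing: discard the whole session and start from a blank slate."""
    return []


def compact(history, keep_recent, summarize):
    """Compacting: replace older messages with one summary, keep the newest verbatim."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(history) <= keep_recent:
        return list(history)  # nothing old enough to summarize
    older = history[:-keep_recent]
    recent = history[-keep_recent:]
    return [summarize(older)] + recent


def naive_summary(messages):
    # Hypothetical placeholder: joins the first few words of each older message.
    snippets = [" ".join(m.split()[:4]) for m in messages]
    return "Summary of earlier work: " + "; ".join(snippets)


session = [
    "Fix the login bug in auth.py",
    "Rename the helper to parse_token",
    "Add a test for expired tokens",
    "Now update the README",
]

print(clear(session))
print(compact(session, 2, naive_summary))
```

Compacting trades detail for space: the two newest messages stay intact, while everything older is squeezed into a single shorter entry.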
Experts emphasize that bigger context windows do not automatically improve AI performance. Too much data can confuse the model, leading to misprioritized instructions or vague answers. Treating the context window as a workspace and managing it carefully helps keep an AI assistant fast, reliable, and accurate, regardless of which platform is used.