Case Study: How Cursor Solves the Context Window Problem
Cursor is one of the most architecturally interesting LLM applications in production today. On the surface, it looks like a simple AI-powered IDE. Under the hood, it’s solving one of the hardest problems in LLM system design: how do you give a model meaningful understanding of a codebase that’s orders of magnitude larger than its context window?
This article is a reconstruction of Cursor’s architecture based on public information: blog posts, engineering talks, and the product’s observable behaviour. I find it one of the clearest examples of what thoughtful LLM system design looks like at the production level.
The Context Window Problem
A context window is the amount of text a model can see at once when generating a response. Even the largest current context windows (200K tokens for Claude 3.5 Sonnet, 1M for Gemini 1.5 Pro) are dwarfed by real codebases.
A mid-sized production codebase of 500 files and 100K lines of code contains roughly 3–5 million tokens. Even a 200K-token context window holds only 4–7% of that.
When a developer asks, ‘Why is the checkout flow failing when a guest user applies a coupon?’, the answer likely requires understanding the checkout controller, the coupon validation service, the guest session middleware, and possibly a shared utility used by all three. These files are spread across the codebase, and the model can’t see all of them at once. The challenge is figuring out which files are relevant, accurately and quickly, without asking the developer to specify them.
The Indexing Architecture
Cursor’s core insight is that code has a structure that natural language doesn’t. You can parse it into an Abstract Syntax Tree (AST) and extract semantic units (functions, classes, methods, imports) rather than chunking it arbitrarily by token count.
The indexing pipeline works like this:
File Watching: Cursor monitors the file system for changes using OS-level events. When a file changes, it’s queued for re-indexing. This keeps the index fresh as the developer codes, without requiring full re-indexing on every change.
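The key detail in this step is debouncing: rapid successive saves to the same file should trigger one re-index, not many. Cursor’s actual watcher sits on OS-level events (inotify, FSEvents), which aren’t reproduced here; the sketch below shows only the debounced re-index queue, with hypothetical names.

```python
import time
from collections import OrderedDict

class ReindexQueue:
    """Debounce file-change events so a burst of saves triggers one
    re-index. Hypothetical sketch; Cursor's internals are not public."""

    def __init__(self, debounce_seconds=0.5):
        self.debounce = debounce_seconds
        self._pending = OrderedDict()  # path -> time of last change event

    def on_file_changed(self, path, now=None):
        now = time.monotonic() if now is None else now
        # Re-queueing a path moves it to the back and resets its timer.
        self._pending.pop(path, None)
        self._pending[path] = now

    def drain_ready(self, now=None):
        """Return paths whose debounce window has elapsed; they are
        removed from the queue and handed to the indexing pipeline."""
        now = time.monotonic() if now is None else now
        ready = [p for p, t in self._pending.items() if now - t >= self.debounce]
        for p in ready:
            del self._pending[p]
        return ready
```

A caller would poll `drain_ready()` on a timer and feed the returned paths into the AST-parsing stage described next.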
AST Parsing: Changed files are parsed into ASTs using tree-sitter, a fast incremental parser that supports 40+ languages. The AST is traversed to extract semantic units, not 256-token chunks, but function definitions, class bodies, and module-level declarations. These are natural semantic boundaries.
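Cursor uses tree-sitter for multi-language support; to keep the sketch self-contained, the version below uses Python’s stdlib `ast` module for a single language, but the idea is the same: walk the tree and emit one unit per function or class rather than slicing by token count.

```python
import ast

def extract_semantic_units(source: str):
    """Split Python source into function- and class-level units.
    A single-language stand-in for the tree-sitter pass described above."""
    tree = ast.parse(source)
    units = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            units.append({
                "name": node.name,
                "kind": type(node).__name__,
                "lines": (node.lineno, node.end_lineno),
                # The exact source text of the unit, with natural boundaries.
                "code": ast.get_source_segment(source, node),
            })
    return units
```

Each dictionary here corresponds to one embeddable unit, carrying the line range needed for the symbol-level attribution discussed next.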
Embedding and Storage: Each semantic unit is embedded and stored in a local vector index. The metadata alongside each embedding includes file path, line range, symbol name, and symbol type. This is what allows precise, symbol-level attribution rather than ‘here’s a 300-token chunk’.
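The record shape is the important part here. A minimal sketch of one index entry, with a toy hashed bag-of-words function standing in for the real embedding model (all field names are illustrative; Cursor’s actual schema is not public):

```python
from dataclasses import dataclass

@dataclass
class IndexedUnit:
    """One entry in the local vector index."""
    file_path: str
    line_range: tuple   # (start_line, end_line), for symbol-level attribution
    symbol_name: str
    symbol_type: str    # "function", "class", "method", ...
    embedding: list     # vector produced by the embedding model

def embed(text: str, dim: int = 64) -> list:
    """Toy hashed bag-of-words vector, standing in for a real model."""
    vec = [0.0] * dim
    for token in text.split():
        vec[hash(token) % dim] += 1.0
    return vec

index = [
    IndexedUnit("checkout/controller.py", (10, 42), "apply_coupon",
                "function", embed("def apply_coupon(cart, code): ...")),
]
```

Because every hit carries a file path, symbol name, and line range, retrieval results can be presented as ‘`apply_coupon` in `checkout/controller.py`, lines 10–42’ rather than an anonymous text fragment.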
Dependency Graph: Alongside the vector index, Cursor maintains a dependency graph: which files import which other files, and which functions call which other functions. This is the knowledge graph layer, enabling the retrieval of related code that isn’t semantically similar to the query.
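The graph only needs two directions per edge to support the traversal in Pass 2 below: given a symbol, who does it depend on, and who depends on it. A minimal sketch under that assumption (the class and method names are hypothetical):

```python
from collections import defaultdict

class DependencyGraph:
    """File/symbol-level edges: who imports or calls whom.
    A sketch of the knowledge graph layer described above."""

    def __init__(self):
        self.callees = defaultdict(set)  # symbol -> things it calls/imports
        self.callers = defaultdict(set)  # symbol -> things that call/import it

    def add_edge(self, src, dst):
        """Record that `src` calls or imports `dst`."""
        self.callees[src].add(dst)
        self.callers[dst].add(src)

    def neighbours(self, symbol):
        """Related code one hop away, even if it shares no vocabulary
        with the query and so would never surface via similarity search."""
        return self.callers[symbol] | self.callees[symbol]
```

The checkout example from earlier illustrates why this matters: the guest session middleware may mention neither ‘checkout’ nor ‘coupon’, but it sits one graph hop away from the checkout controller.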
Retrieval at Query Time
When a developer asks a question, Cursor runs a multi-pass retrieval:
Pass 1: Embed the query, run ANN search against the code index, find the k most semantically similar semantic units.
Pass 2: For each retrieved unit, traverse the dependency graph to find callers, callees, and imports.
Pass 3: Apply re-ranking based on recency (recently edited files are more likely relevant) and call depth (closer dependencies score higher).
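The three passes can be sketched end to end. Here brute-force cosine similarity stands in for the ANN search, the graph is a plain dict of one-hop neighbours, and the re-ranking weights are illustrative, not Cursor’s:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, graph, recency, k=2, alpha=0.3, beta=0.2):
    """Three-pass retrieval sketch. `index` maps symbol -> vector,
    `graph` maps symbol -> directly related symbols, and `recency`
    maps symbol -> freshness score in [0, 1]."""
    # Pass 1: similarity search (brute force standing in for ANN).
    sims = {s: cosine(query_vec, v) for s, v in index.items()}
    seeds = sorted(sims, key=sims.get, reverse=True)[:k]
    # Pass 2: expand one hop through the dependency graph.
    candidates = {s: 0 for s in seeds}          # seed units sit at depth 0
    for s in seeds:
        for n in graph.get(s, ()):
            candidates.setdefault(n, 1)         # graph neighbours at depth 1
    # Pass 3: re-rank. Recently edited files get a boost; deeper
    # dependencies are penalised.
    def score(s):
        return sims.get(s, 0.0) + alpha * recency.get(s, 0.0) - beta * candidates[s]
    return sorted(candidates, key=score, reverse=True)
```

The output order is then used to pack units into the context window until the token budget runs out.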
The result is a ranked list of semantic units packed into the context window. Because the units are function-level rather than chunk-level, the model receives complete, syntactically valid code, not arbitrary text fragments cut off mid-function.
The Incremental Update Problem
The hardest part of this architecture isn’t the initial index build; it’s keeping the index consistent as the developer edits in real time.
Cursor addresses this with a two-tier index: a persisted index updated on file save, and an in-memory delta updated for the currently-open file. Retrieval queries both tiers and merges results, effectively giving real-time index freshness without the overhead of re-embedding on every keystroke.
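The merge rule is the whole trick: when both tiers know a symbol, the in-memory delta wins, because it reflects unsaved edits. A minimal sketch of that rule, with hypothetical names:

```python
class TwoTierIndex:
    """Persisted tier updated on save, in-memory delta for the open
    file. A sketch of the two-tier merge described above."""

    def __init__(self):
        self.persisted = {}  # symbol -> unit, written on file save
        self.delta = {}      # symbol -> unit, updated as the buffer changes

    def update_on_save(self, symbol, unit):
        self.persisted[symbol] = unit
        # The delta is no longer ahead of disk for this symbol.
        self.delta.pop(symbol, None)

    def update_live(self, symbol, unit):
        self.delta[symbol] = unit

    def lookup(self, symbol):
        # Delta wins: it reflects unsaved edits in the open file.
        return self.delta.get(symbol, self.persisted.get(symbol))
```

A real implementation would merge ranked result lists from both tiers rather than single lookups, but the precedence rule is the same.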
What This Architecture Teaches Us
The Cursor design generalises to any LLM application where the knowledge base has structure:
Parse to semantic units, not arbitrary chunks. Legal documents, medical records, product catalogs: every domain has natural semantic boundaries, and indexing at those boundaries produces better retrieval than token-count chunking.
Build a relationship graph alongside the vector index. The vector index finds similar things. The graph finds related things. Complex queries need both.
Design for incremental updates from day one. Full re-indexing on every change doesn’t scale. A two-tier or event-driven update architecture is worth the upfront investment.
The context window isn’t a limitation to work around; it’s a constraint to design for. The best LLM applications treat retrieval as a first-class engineering problem.
This case study is from Chapter 3 of System Design for the LLM Era, which includes a full architectural deep dive. Available on Leanpub, Gumroad and Topmate.

