From RAG to Knowledge Compilation

RAG re-retrieves, re-assembles, and re-reasons on every query. Ask something that requires synthesizing five documents and the model has to find all five, stitch them together, and derive the answer from scratch. Ask ten times, retrieve ten times. Nothing accumulates.

Karpathy recently posted a gist called LLM Wiki proposing a different approach: instead of retrieving at query time, have the LLM pre-compile knowledge into a structured wiki and query the compiled result.

Link: LLM Wiki (Karpathy)

Where RAG Falls Short

Start with RAG's operating model: it has a fundamental efficiency problem.

You load 200 papers into a RAG system. Ask "what are the research trends in this field over the past three years?" The model hits the vector database, pulls back the top-10 chunks, stuffs them into the prompt, and generates an answer. Sounds reasonable, but think about it: the information needed spans dozens of papers. Ten chunks won't cover it. Even at k=50, the relationships between chunks — contradictions, evolution of ideas, converging threads — all have to be figured out in a single inference pass.
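The retrieve-then-generate loop in that paragraph can be sketched in miniature. The bag-of-words "embedding" and cosine scoring below are toy stand-ins for a real encoder and vector index, not any particular framework's API:

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a real system uses a neural encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Score every chunk against the query and keep the top-k.
    q = embed(query)
    scored = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return scored[:k]

chunks = [
    "transformer scaling laws for language models",
    "retrieval augmented generation stuffs top-k chunks into the prompt",
    "gardening tips for tomatoes",
]
top = retrieve("how does retrieval augmented generation work", chunks, k=1)
```

Whatever the top-k chunks happen to be, they are all the model sees; any relationship that spans chunks outside the window has to be reconstructed, or is simply lost.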

The next day you rephrase the question slightly. The model starts from zero again. Yesterday's synthesis is gone.

NotebookLM, ChatGPT file uploads, most RAG frameworks — they all follow the same pattern: raw documents → retrieval → ad-hoc assembly → generation. Knowledge stays in the raw documents and never gets structurally organized. Interestingly, some products are already sidestepping this path: Claude Code's retrieval uses no vector database or embedding index, relying on keyword search and file structure reasoning to pinpoint code in million-line codebases. At minimum, this shows vector retrieval isn't the only game in town.

Add to that the theoretical ceiling of vector retrieval: single-vector models have a hard combinatorial limit on representable top-k combinations. At 100k documents with k=100, the theoretical lower bound already approaches the maximum dimension of 4096 used by current models. As queries get more complex, retrieval gets less reliable, and that's not a model-quality problem.
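A back-of-the-envelope way to feel the combinatorics (a crude information-theoretic proxy, not the formal sign-rank bound from the embedding-capacity literature, and the numbers are only meant to show the blow-up):

```python
import math

n, k = 100_000, 100      # corpus size and top-k
# Number of distinct top-k result sets a retriever might need to realize.
subsets = math.comb(n, k)
# Bits needed merely to name one such subset: a rough proxy for how much
# structure a fixed-size embedding geometry has to encode.
bits = math.log2(subsets)
print(round(bits))       # on the order of a thousand bits
```

The point is not the exact number but the growth rate: the subset count explodes combinatorially while embedding dimensions grow slowly.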

Compile, Don't Interpret

Karpathy's approach flips the paradigm: instead of interpreting at query time, compile at ingest time.

The programming language analogy works well. RAG is interpreted execution — re-parsing source code on every run. LLM Wiki is compiled execution — source code gets processed once, then you run the compiled artifact.

Three layers. The bottom layer is raw documents: papers, articles, meeting notes, podcast transcripts. Read-only. The middle layer is an LLM-generated wiki: summaries, entity pages, concept pages, comparisons, syntheses. The LLM owns this layer entirely. The top layer is a schema file (like CLAUDE.md) that defines the wiki's structure, conventions, and workflows.
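One way the three layers might land on disk. The file names here are illustrative, not from the gist (only CLAUDE.md is):

```
knowledge-base/
├── CLAUDE.md              # schema: structure, conventions, workflows
├── wiki/                  # compiled layer, owned entirely by the LLM
│   ├── index.md
│   ├── entities/
│   ├── concepts/
│   └── syntheses/
└── sources/               # raw documents, read-only
    ├── paper-001.pdf
    └── meeting-2025-01-10.md
```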

When you add a new document, the LLM doesn't just store it for later retrieval. It reads the document, writes a summary page, then updates every relevant entity and concept page across the wiki — noting contradictions with existing claims, adding cross-references, revising summaries. A single document might trigger updates to 10-15 wiki pages.
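The ingest step can be sketched as a toy loop, with a dict standing in for the wiki, an 80-character truncation standing in for the LLM's summary, and naive keyword matching standing in for its entity detection. A sketch of the pattern, not an implementation:

```python
# Toy compile-on-ingest: wiki maps page name -> page text.
wiki = {"index": "", "entities/RAG": "", "entities/LLM Wiki": ""}

def ingest(doc_name, doc_text, wiki):
    summary_page = f"summaries/{doc_name}"
    wiki[summary_page] = doc_text[:80]          # stand-in for an LLM summary
    touched = [summary_page]
    for page in list(wiki):
        entity = page.split("/")[-1]
        if page.startswith("entities/") and entity.lower() in doc_text.lower():
            # Cross-reference: the entity page gains a link to the new summary.
            wiki[page] += f"\n[[{summary_page}]]"
            touched.append(page)
    # Rebuild the index so the query side always has a current table of contents.
    wiki["index"] = "\n".join(sorted(p for p in wiki if p != "index"))
    return touched

touched = ingest("karpathy-gist", "LLM Wiki compiles knowledge; RAG retrieves.", wiki)
```

Even this toy shows the shape of the cost: one document in, several pages touched, and the fan-out grows with the number of entities the document mentions.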

Knowledge compiled once, continuously updated.

Querying the Compiled Wiki

After compilation, querying changes too. The model no longer digs through raw documents for chunks. It reads the wiki's index file, finds relevant pages, and synthesizes from already-organized content.

The critical difference: cross-references are already built, contradictions already flagged, synthesis already done. The model doesn't need to accomplish all of that in a single inference pass.
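The query path can be sketched the same way. The `[[page]]` link syntax and the one-hop link-following below are assumptions for illustration, not the gist's mechanics:

```python
# Querying a compiled wiki: consult the index, pull whole pages (with their
# cross-references already in place), and hand them to the model in one pass.
wiki = {
    "index": "concepts/retrieval\nconcepts/compilation\nentities/GraphRAG",
    "concepts/retrieval": "Query-time lookup. Contrast: [[concepts/compilation]].",
    "concepts/compilation": "Ingest-time structuring. See [[entities/GraphRAG]].",
    "entities/GraphRAG": "Microsoft's graph-building RAG variant.",
}

def query(question, wiki):
    terms = question.lower().split()
    # Index lookup stands in for the LLM reading index.md and choosing pages.
    hits = [p for p in wiki["index"].splitlines()
            if any(t in p.lower() for t in terms)]
    # Follow cross-references one hop: these links were built at ingest time.
    for page in list(hits):
        for other in wiki:
            if f"[[{other}]]" in wiki[page] and other not in hits:
                hits.append(other)
    return {p: wiki[p] for p in hits}

ctx = query("compilation", wiki)
```

Notice what the query step no longer does: no similarity search over raw chunks, no on-the-fly synthesis. It walks structure that already exists.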

Karpathy's own setup: an LLM agent on one side, Obsidian on the other, watching wiki pages update in real time. The LLM writes files, Obsidian renders them, graph view shows which concepts connect and which pages are orphans. His analogy: Obsidian is the IDE, the LLM is the programmer, the wiki is the codebase.

Query results can feed back into the wiki. Ask a comparison question, and the answer itself becomes a new wiki page. Every question you ask makes the knowledge base richer instead of vanishing into chat history.

The Maintenance Problem, Solved

Everyone knows knowledge bases are useful. Few people maintain them. Whether it's a team wiki or personal notes, the decay follows the same pattern: maintenance cost grows faster than usage value. After 50 pages of notes, every new page means checking for contradictions, updating cross-references, and verifying that old conclusions still hold against new data. Nobody wants to do that work.

LLMs happen to be good at exactly this: they won't forget to update a cross-reference, won't mind touching 15 files in one pass, won't abandon the project because maintenance is boring. Karpathy also suggests periodic linting — having the LLM audit the wiki for contradictions, orphan pages, missing concept entries, and data gaps that could be filled with a web search.
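One of those lint checks, orphan detection, is simple enough to sketch; the `[[page]]` link convention is again an assumption for illustration:

```python
import re

# Minimal lint pass over a compiled wiki: flag orphan pages that no other
# page links to (the index itself doesn't count as a link source).
wiki = {
    "index": "entities/Memex\nconcepts/compilation\nconcepts/stale-note",
    "entities/Memex": "Bush's 1945 device. See [[concepts/compilation]].",
    "concepts/compilation": "Linked from [[entities/Memex]].",
    "concepts/stale-note": "Nothing links here.",
}

def lint_orphans(wiki):
    linked = set()
    for page, text in wiki.items():
        if page != "index":
            linked.update(re.findall(r"\[\[(.+?)\]\]", text))
    return [p for p in wiki if p not in linked and p != "index"]

orphans = lint_orphans(wiki)
```

Contradiction and staleness checks need an LLM pass rather than a regex, but they follow the same pattern: walk the wiki, flag problems, queue fixes.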

This is the key insight. Maintenance is the bottleneck in knowledge management, and LLMs reduce that cost to near zero.

Where It Fits

Karpathy lists several applications: personal knowledge management (journals, articles, podcast notes), research deep-dives (weeks or months of paper reading), book reading (chapter-by-chapter wiki with characters, themes, and plot threads), team knowledge bases (Slack threads, meeting transcripts, customer calls auto-organized).

The most compelling case is long-term research. Spend three months reading papers in a field, and you'll forget the details of what you read in week one. RAG can help you locate a specific passage, but it can't tell you how that passage relates to something from three weeks ago. The wiki can, because those relationships were built at ingest time.

The team use case is interesting too. Every team has the "we discussed this before, the conclusion is somewhere in a Slack channel" problem. An LLM continuously compiling those fragments into a structured wiki would prevent a lot of that loss.

Is RAG Dead Then?

Back to the opening question. Does LLM Wiki make vector-retrieval RAG pointless?

No. They solve different problems.

RAG solves "quickly locate relevant fragments in a large document collection." It works for one-off queries against large corpora that don't need deep synthesis. You have 100k customer support conversations, a user asks about a specific product issue, RAG finds the relevant ones in milliseconds. You don't need and can't afford to pre-compile a wiki for that.

LLM Wiki solves "continuously accumulate and synthesize knowledge from a manageable document collection." Document count is moderate (tens to hundreds), but inter-document relationships are complex and need long-term maintenance.

Put differently: RAG is a search engine, LLM Wiki is an encyclopedia. You wouldn't organize 100k support tickets like an encyclopedia, and you wouldn't do a three-month literature review with a search engine.

The RAG community is already moving in this direction. Microsoft's GraphRAG builds a knowledge graph before retrieval — essentially a form of knowledge compilation. LLM Wiki goes further: the compiled artifact isn't a graph but human-readable documents. Both share the same judgment: query-time retrieval alone isn't enough; you need structural processing at ingest time.

The real takeaway may be that RAG shouldn't be the end of the knowledge management pipeline. Many people dump documents into a vector database and call it done, but retrieval is just step one. For scenarios requiring deep understanding, knowledge compilation after retrieval is what matters.

The Rough Edges

Karpathy's gist is honest about being an "idea file" — it describes the pattern, not an implementation. Running this in practice has some obvious rough edges.

Hallucination risk gets amplified in a wiki context. In RAG, a hallucination affects one answer; the next query re-retrieves and has a chance to self-correct. In a wiki, if an entity page contains an incorrect fact, every subsequent analysis referencing that page builds on the error. Mistakes compile into the knowledge base and compound over time. This is probably the strongest argument for the lint mechanism: not just finding orphan pages, but catching errors that have been baked in.

Scale is a concern. Karpathy says the index file works at moderate scale (~100 sources, hundreds of pages), but beyond that you need a search engine. Still, searching structured wiki pages is qualitatively different from searching raw document chunks: wiki pages have titles, categories, and cross-references, so precision and recall should be much better. The real risk is elsewhere: as wiki pages multiply, each LLM update needs to check more related pages, so per-ingest maintenance cost gradually climbs.

Cost isn't trivial either. Each document ingestion might trigger updates across a dozen pages, each update an LLM call. The compilation cost for 100 documents is far more than vectorizing them for a RAG pipeline. But flip the perspective: that cost is paid once at ingest, saving repeated reasoning on every query. If your query frequency is much higher than your ingest frequency, compilation pays for itself.
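The amortization argument is simple arithmetic. Every cost figure below is made up for illustration; only the shape of the comparison matters:

```python
# Back-of-the-envelope amortization: compilation pays a one-time price per
# document so that each query can skip repeated synthesis work.
compile_cost_per_doc = 15   # hypothetical LLM calls to ingest one document
rag_extra_per_query = 3     # hypothetical extra calls RAG synthesis spends per query
docs, queries = 100, 2_000  # corpus size and lifetime query volume

wiki_total = docs * compile_cost_per_doc + queries * 1  # 1 cheap call per query
rag_total = queries * (1 + rag_extra_per_query)

# Queries needed before the up-front compilation cost is recouped.
break_even = (docs * compile_cost_per_doc) / rag_extra_per_query
```

With these invented numbers the wiki breaks even after 500 queries; the general point is just that the ratio of queries to ingests decides which approach is cheaper.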

An Old Idea, Newly Feasible

Karpathy closes with a reference to Vannevar Bush's 1945 Memex: a private, curated knowledge store where links between documents are as valuable as the documents themselves. The problem Bush couldn't solve was who does the maintenance. Eighty years later, LLMs fill that gap.

From Memex to wikis to Notion to RAG to LLM Wiki, the history of knowledge management is a repeated struggle with the same tension: storing is easy, organizing is hard. LLM Wiki isn't the final answer, but it's the first time "automatic organization" has been genuinely feasible. Whether the organization quality is good enough — that probably takes three months of use to find out.