Finisky Garden

NLP, Software Engineering, Product Design

After OpenClaw crossed 350K stars, a narrative started forming in the community: since both run on Opus 4.6 under the hood, the open-source option should be on par with Claude Code. Anyone who has actually used both probably shares the same observation — in long sessions, OpenClaw starts losing context, forgetting files it already read, redoing work it already did. Claude Code does too, but noticeably later, and it recovers much better.

Same model, different experience. Why?

Cursor’s parent company Anysphere has about 150 employees. In November 2025, its ARR crossed $1 billion. OpenAI, as of early 2026, has 4,500 employees. Its 2025 revenue was $13.1 billion, but according to Fortune, it lost roughly $9 billion and doesn’t expect to turn profitable until 2028.

An application company that trains zero models is outproducing, per capita, the company that trains them. This is the most telling signal in AI for 2025.

In late November 2025, an open-source project called OpenClaw went live on GitHub. Four and a half months later, it had 350K stars, 70K forks, 81 releases, and sponsorships from OpenAI, NVIDIA, and Vercel. For comparison: Open WebUI took two and a half years to reach 130K stars; NextChat took three years to hit 88K. Growth like OpenClaw’s is rare in GitHub’s history.

It isn’t a new model, a training framework, or even a “technical breakthrough” in the traditional sense. It’s a personal AI assistant that runs on your own machine and talks to you through the chat apps you already use: WhatsApp, Telegram, Slack, Discord, WeChat, Feishu, iMessage, Matrix, and others, more than 25 platforms in total, all connected to a single backend.

Why did it break out of the developer bubble?

When you use Claude Code, there’s something you probably never notice: it has over 40 registered tools, but when you ask it to read a file or edit a few lines of code, it only uses three or four. The definitions for the remaining 30-plus tools, each around 500 tokens, add up to over 10,000 tokens of fixed overhead per request. You just want to change one line of CSS, but you’re paying for WebSearch, NotebookEdit, CronCreate, and a bunch of tools you’ll never touch.
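As a sanity check on that arithmetic, using the post's own rough estimates (about 40 registered tools, three or four actually used, roughly 500 tokens per definition):

```python
# Rough arithmetic behind the fixed-overhead claim above. The numbers are
# the post's estimates, not measured values.
n_unused_tools = 36          # ~40 registered minus the 3-4 actually used
tokens_per_definition = 500  # approximate size of one tool schema
overhead = n_unused_tools * tokens_per_definition
print(overhead)              # 18000 tokens of dead weight per request
```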

Claude Code’s Edit tool has a deceptively simple interface: give it an old_string, give it a new_string, and it finds the former in a file and replaces it with the latter. Sounds like nothing more than a str.replace(). But in the context of an LLM Agent, this seemingly trivial operation is backed by an entire engineering pipeline spanning everything from string sanitization to concurrency safety. The model stuffs line numbers into its replacement strings. It conjures curly quotes out of thin air. External tools modify the target file while the user is still reviewing the permission dialog. The Edit tool has to stay correct through all of this — far more than find-and-replace can handle.

From observing its behavior, the Edit tool’s execution breaks down into three phases: API-layer preprocessing (before the tool even receives input), input validation (before the permission dialog is shown), and the actual write (after the user approves). Each phase handles a distinct class of problems and maintains deliberate sync/async boundaries.
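As an illustration of the early phases, here is a minimal sketch of the kind of input normalization such a tool might run before attempting the replacement. The function names and rules are invented for this sketch, not Claude Code's actual code: strip line-number prefixes the model stuffs into strings, map curly quotes back to ASCII, and refuse any replacement whose target is not unique in the file.

```python
import re

def strip_line_numbers(s: str) -> str:
    """Remove 'N→' or 'N:' style line-number prefixes a model may emit."""
    return re.sub(r"(?m)^\s*\d+[→:]\s?", "", s)

def normalize_quotes(s: str) -> str:
    """Map curly quotes the model conjures back to their ASCII forms."""
    table = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}
    return s.translate(str.maketrans(table))

def safe_replace(content: str, old: str, new: str) -> str:
    # Try the raw string first, then progressively sanitized variants;
    # only apply a replacement when the match is unambiguous.
    for fix in (lambda x: x, strip_line_numbers, normalize_quotes):
        candidate = fix(old)
        if content.count(candidate) == 1:
            return content.replace(candidate, fix(new))
    raise ValueError("old_string not found or not unique")
```

The uniqueness check is the important design choice: replacing an ambiguous match silently would corrupt the file, so a real tool would rather fail loudly and ask the model to provide more context.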

Claude Code has a mode that appears in no documentation whatsoever. When active, it systematically erases every trace of AI involvement. No Co-Authored-By trailer, no “Generated with Claude Code” footer, and the system prompt itself doesn’t even tell the model what it is. This mode is called Undercover Mode. It exists only in Anthropic’s internal builds; external users will never see it, because dead code elimination strips the entire feature out during public builds.

The behavioral implications are telling: this mechanism exists because Anthropic employees routinely use Claude Code to commit to public repositories. Without some form of protection, commit messages might contain unreleased model codenames, PR descriptions might expose internal project names, and model identifiers in the system prompt could leak through one vector or another. Undercover Mode is designed to plug all of these holes.

When tackling complex tasks, Claude Code spawns multiple sub-agents in parallel, each needing the full parent conversation context. If the parent has accumulated 100K tokens and three sub-agents are spawned, a naive implementation charges 300K tokens of input.

Anyone familiar with LLM inference optimization will recognize this immediately: it’s a KV Cache sharing problem. When multiple requests share the same prefix, the Attention layer’s Key/Value tensors can be reused, skipping redundant computation. Anthropic exposes this capability to API users as Prompt Cache, offering a 90% discount on cached prefix portions — but only if the prefix bytes are exactly identical across requests. Claude Code’s fork sub-agents are deliberately constructed so that over 99% of the bytes are identical, compressing the effective input cost of three sub-agents to roughly 120K token-equivalent (100K full price + 2 × 100K × 10%).
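The cost math above can be made concrete with a small model. The assumption here, matching the parenthetical, is that the first sub-agent request pays full price to write the cache, each later request reads the shared prefix at the 90% discount, and any cache-write surcharge is ignored:

```python
# Back-of-envelope model of the prompt-cache arithmetic described above.
def effective_input_tokens(prefix: int, n_agents: int,
                           cache_discount: float = 0.9) -> float:
    first = prefix                                  # cache write, full price
    rest = (n_agents - 1) * prefix * (1 - cache_discount)  # cached reads
    return first + rest

# Three sub-agents forked from a 100K-token parent context:
print(round(effective_input_tokens(100_000, 3)))    # 120000, not 300000
```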

A 200K context window sounds generous — until you’re in a moderately complex coding session. Read a few dozen files, run several rounds of grep, execute some bash commands, and you’ve already burned through most of it. Compaction is inevitable, but compaction itself costs money: you need an LLM call to generate a summary, and the input to that call is the very context you’re trying to compress. This creates a fascinating engineering trade-off: compact too early and you lose useful information; compact too late and the window overflows; and the cost of compaction itself can’t be ignored. Claude Code’s answer is a multi-layer cascade: avoid compaction if you can, do it cheaply if you must, and only call the LLM as a last resort.
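A minimal sketch of such a cascade follows. The thresholds and strategy names are invented for this sketch; the post only establishes the ordering (avoid compaction, then do it cheaply, then call the LLM as a last resort):

```python
# Illustrative decision function for a multi-layer compaction cascade.
# The 70%/90% cut-offs are assumptions, not Claude Code's real values.
def compaction_strategy(used_tokens: int, window: int = 200_000) -> str:
    ratio = used_tokens / window
    if ratio < 0.70:
        return "none"            # plenty of headroom: avoid compaction
    if ratio < 0.90:
        return "cheap"           # drop stale tool outputs, no LLM call
    return "llm_summarize"       # last resort: pay for a summary call

print(compaction_strategy(50_000))    # none
print(compaction_strategy(150_000))   # cheap
print(compaction_strategy(190_000))   # llm_summarize
```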

The security challenge of AI-executed Bash commands isn’t “should we trust the model” — it’s “how do we make sure a command actually means what it looks like.”

Claude Code lets AI execute Bash commands directly. Not through a structured interface like MCP; it literally gives the model a shell. MCP’s approach wraps tools into JSON schemas, which is safe enough, but you can’t realistically write adapters for thousands of CLI tools. The capability ceiling is obvious. A raw shell can do anything; the tradeoff is that the security problem shifts from “controlling interface permissions” to “figuring out what a command actually does.” In the previous post, I covered how the YOLO Classifier uses AI to review AI, but the Classifier works with the full command string and makes a semantic judgment: is this operation dangerous? Before that judgment even happens, there’s a deeper question that needs answering: does this command actually mean what it appears to mean?
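To make the “does it mean what it looks like” problem concrete, here is a hedged sketch of a pre-check that refuses to vouch for commands whose meaning can change at runtime. The marker list is illustrative, not exhaustive, and certainly not Claude Code's actual logic:

```python
import shlex

# Shell features like command substitution mean the text a reviewer sees
# is not necessarily the command that runs: `echo $(rm -rf /)` looks like
# an echo but executes a delete. This checker only trusts commands whose
# surface form is static.
DYNAMIC_MARKERS = ("$(", "`", "${", "eval", "|", ";", "&&")

def is_statically_analyzable(cmd: str) -> bool:
    """True only if the command has no constructs that rewrite its meaning at runtime."""
    if any(m in cmd for m in DYNAMIC_MARKERS):
        return False
    try:
        shlex.split(cmd)            # must also tokenize cleanly
        return True
    except ValueError:
        return False

print(is_statically_analyzable("ls -la src/"))        # True
print(is_statically_analyzable("echo $(rm -rf /)"))   # False
```

Anything flagged here would need deeper handling (parsing, expansion, or an outright refusal) before a semantic classifier can meaningfully judge it.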

Claude Code has an auto mode that executes operations without confirmation. But “auto” doesn’t mean “unreviewed” — there’s a classifier watching every action.

The Auto Mode Paradox

One of the most annoying things about using Claude Code is the permission popups. Every Bash command, every file write requires a confirmation click. Power users turn on auto mode, letting Claude execute everything autonomously without asking.

This creates an obvious problem: what if the model decides to rm -rf /, push to the production branch, or write a backdoor into .bashrc?
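One plausible shape for this gate, sketched under my own assumptions (the deny patterns below are examples, not Claude Code's real rules): a fast pattern pre-filter escalates obviously destructive commands back to a confirmation dialog, and everything else is handed to the semantic classifier.

```python
import re

# Hypothetical fast pre-filter for auto mode. Real systems would pair
# this with the model-based classifier the post describes.
DENY_PATTERNS = [
    r"\brm\s+-rf\s+/(\s|$)",                  # wipe the filesystem root
    r"git\s+push\s+.*\b(main|master|prod)\b", # pushing to protected branches
    r">>?\s*~?/?\.bashrc",                    # writing to shell startup files
]

def auto_mode_verdict(cmd: str) -> str:
    if any(re.search(p, cmd) for p in DENY_PATTERNS):
        return "ask_user"             # escalate to a confirmation dialog
    return "classify"                 # hand off to the semantic classifier

print(auto_mode_verdict("rm -rf / "))             # ask_user
print(auto_mode_verdict("echo evil >> ~/.bashrc")) # ask_user
print(auto_mode_verdict("ls -la"))                 # classify
```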

Claude Code has no vector database and no embedding index, yet it can pinpoint the exact file you need in a million-line codebase. Behind this is a retrieval architecture completely different from traditional RAG.

This Isn’t the RAG You Know

If you’ve used RAG before, the pipeline should be familiar: build an offline index, user asks a question, vector-search for Top-K chunks, inject into prompt, generate an answer. A straight line, one pass, done.

Claude Code doesn’t work like that at all. It has no offline index. The model itself drives the retrieval process.
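A toy version of the idea: instead of querying a prebuilt vector index, the agent calls grep/glob-style tools and lets the model decide the next search. The helper below is a stand-in for such a Grep tool; the name and signature are mine, not Claude Code's:

```python
import re
from pathlib import Path

def grep_repo(root: str, pattern: str, ext: str = ".py") -> list[str]:
    """A tiny Grep tool: files under root whose text matches pattern."""
    hits = []
    for path in Path(root).rglob(f"*{ext}"):
        try:
            if re.search(pattern, path.read_text(errors="ignore")):
                hits.append(str(path))
        except OSError:
            continue                  # unreadable file: skip, don't crash
    return hits
```

In a real agent loop, the model inspects the hits, refines the pattern, and repeats until it has the files it needs; no offline index ever exists.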

Nassim Taleb recounts a classic thought experiment in Fooled by Randomness: given infinite monkeys typing on infinite typewriters, one of them will eventually produce the complete text of the Iliad.

The more I think about it, the more I believe this story’s endgame is today’s large language models.

Developers who’ve used Claude Code probably share this experience: even in an ultra-long conversation where dozens of files have been modified, it seems to always “remember” what it did before. Even more remarkably, if you told it “I prefer bun over npm” in a previous session, it automatically follows that preference next time.

Behind this is a sophisticated memory management system. Let’s tear apart Claude Code’s memory mechanism layer by layer.
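Before that teardown, the core idea fits in a few lines: preferences persist as plain text in a memory file and are prepended to the next session's context. CLAUDE.md is Claude Code's documented memory file name; the helper functions here are my own sketch, not its implementation:

```python
from pathlib import Path

def remember(memory_file: Path, note: str) -> None:
    """Append a preference like 'I prefer bun over npm' to the memory file."""
    with memory_file.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

def build_system_context(memory_file: Path) -> str:
    """Prepend stored preferences to the next session's system context."""
    if not memory_file.exists():
        return ""
    return "User preferences:\n" + memory_file.read_text(encoding="utf-8")
```

Because the memory is just a file, it survives across sessions for free, and the user can read or edit it directly.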

I haven’t updated my blog in over a year. Not out of laziness, nor because I’ve fallen behind on technology. It’s more of a conviction: once AI became powerful enough, the value of technical blogs dropped significantly. People shifted from searching and reading to learning through direct conversations with AI. On top of that, AI-generated content floods social media the moment anything happens, making me feel there’s little point in writing after the fact. Blog traffic has plummeted over the past year, which further killed my motivation to spend hours crafting a post. I miss the days when every article was painstakingly typed out, word by word, over hours or even days.

AI tools have made remarkable progress in the past six months. As a heavy user, I want to talk about this: if agents are supposed to be our helpers, why do we feel more exhausted than ever?

When compiling a LaTeX document that uses a custom font in TeX Live on Windows, you might encounter the following error:

kpathsea: Running mktextfm Fontin

The command name is F:\texlive\2025\bin\windows\mktextfm
name = Fontin, rootname = Fontin, pointsize =
mktexmf: empty or non-existent rootfile!

kpathsea: Running mktexmf Fontin.mf

The command name is F:\texlive\2025\bin\windows\mktexmf
Cannot find Fontin.mf.
kpathsea: Appending font creation commands to missfont.log.

kpathsea: Running mktextfm Fontin

The command name is F:\texlive\2025\bin\windows\mktextfm
name = Fontin, rootname = Fontin, pointsize =
mktexmf: empty or non-existent rootfile!

kpathsea: Running mktexmf Fontin.mf

The command name is F:\texlive\2025\bin\windows\mktexmf
Cannot find Fontin.mf.

This error occurs because the TeX system cannot find the necessary font files (specifically .mf or .tfm files) for Fontin, and fails to generate them. The puzzling part is that the font is already installed on the system.

MongoDB’s Aggregation Pipeline is a powerful tool for processing and analyzing data, suitable for both real-time queries and offline data analysis. It allows developers to use multiple stages to transform, filter, group, and sort data, enabling efficient execution of complex computations. This article will explore the basic concepts, application examples, performance analysis, and optimization strategies of the Aggregation Pipeline.
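To make the stage semantics concrete without a running MongoDB instance, here is the same computation in plain Python. It mirrors a two-stage pipeline, [{"$match": {"status": "paid"}}, {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}}], on invented sample data:

```python
from collections import defaultdict

# Invented sample documents standing in for a MongoDB collection.
orders = [
    {"region": "EU", "status": "paid", "amount": 10},
    {"region": "EU", "status": "void", "amount": 99},
    {"region": "US", "status": "paid", "amount": 25},
    {"region": "EU", "status": "paid", "amount": 5},
]

# $match: keep only documents satisfying the predicate.
matched = [o for o in orders if o["status"] == "paid"]

# $group with $sum: accumulate amounts per region key.
totals: dict[str, int] = defaultdict(int)
for o in matched:
    totals[o["region"]] += o["amount"]

print(dict(totals))  # {'EU': 15, 'US': 25}
```

The real pipeline runs the same dataflow server-side, stage by stage, which is what makes stage ordering the main lever for performance.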

Today, when I opened Chrome, I suddenly received a prompt saying that SwitchyOmega “may soon no longer be supported because it doesn’t follow best practices for Chrome extensions.” It seems the extension was disabled after Chrome automatically updated. More bad news: the Stylish extension has also been rendered unusable for the same reason.

Moreover, the Chrome Web Store is inaccessible, preventing the installation of other extensions—deadlock.

The double-spending problem is a critical challenge in transaction systems, especially when managing account balances or funds. It occurs when a system allows the same funds to be spent multiple times due to concurrent operations or race conditions. In this article, we explore two approaches to resolving this issue using MongoDB: transaction-based handling and versioning-based handling.

This post is an in-depth discussion of the double-spending problem from the Building a Transaction System with MongoDB blog.
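A minimal sketch of the versioning approach, with an in-memory dict standing in for a MongoDB document; in practice the version check would live in the filter of a find_one_and_update call. The field names are illustrative:

```python
# Optimistic concurrency control: a debit succeeds only if the document's
# version is unchanged since it was read, so two concurrent debits based
# on the same read cannot both apply.
def debit(account: dict, amount: int, expected_version: int) -> bool:
    if account["version"] != expected_version:
        return False                 # someone else spent first: caller retries
    if account["balance"] < amount:
        return False                 # insufficient funds
    account["balance"] -= amount
    account["version"] += 1
    return True

acct = {"balance": 100, "version": 0}
v = acct["version"]                  # both "requests" read version 0
print(debit(acct, 60, v))            # True: first spend lands
print(debit(acct, 60, v))            # False: stale version, double-spend blocked
```

The transaction-based alternative reaches the same guarantee by wrapping read and write in a single multi-document transaction instead of retrying on version conflicts.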

After a Windows Update, RDP on Windows 11 might stop working correctly. Symptoms include a black screen upon connection, no visible mouse or interface, and an automatic disconnection after about a minute. This forces users to return to the local machine to investigate the issue.

If you’ve ever struggled to set the correct timezone for your cron jobs on Ubuntu 22.04, you’re not alone. In this blog, we’ll walk you through a troubleshooting journey that highlights common pitfalls and the ultimate solution.