Finisky Garden

NLP, Software Engineering, Product Design

In late November 2025, an open-source project called OpenClaw went live on GitHub. Four and a half months later, it had 350K stars, 70K forks, 81 releases, and sponsorships from OpenAI, NVIDIA, and Vercel. For comparison: Open WebUI took two and a half years to reach 130K stars; NextChat took three years to hit 88K. Growth like OpenClaw's is rare in GitHub's history.

It isn't a new model, a training framework, or even a "technical breakthrough" in the traditional sense. It's a personal AI assistant that runs on your own machine and talks to you through the chat apps you already use: WhatsApp, Telegram, Slack, Discord, WeChat, Feishu, iMessage, Matrix, and more, over 25 platforms in total, all connected to a single backend.

This post explores why it broke out of the developer bubble.

Read more »


Claude Code registers over 40 tools, yet the average request only uses 3–4 of them. Stuffing every tool's JSON Schema into the system prompt is expensive: each tool's description plus parameter definitions runs about 500 tokens, so 40 tools cost roughly 20K tokens. When all the user wants is to read a file, paying for WebSearch, NotebookEdit, and CronCreate — tools that have nothing to do with the task — is a bad deal.

Claude Code's answer is deferred loading: hide infrequently used tools, expose only their names, and let the model pull in full schemas on demand through a discovery tool. This turns the token cost of tool definitions from a fixed overhead into a pay-as-you-go expense.
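The mechanics can be sketched in a few lines of Python. Every name below (`FULL_SCHEMAS`, `discover_tool`, the split between "hot" and deferred tools) is illustrative; Claude Code's actual internals are not public:

```python
# Hypothetical sketch of deferred tool loading: frequently used tools ship
# their full JSON Schemas in the system prompt; the rest are listed only by
# name, and the model fetches a full schema on demand via a discovery tool.

FULL_SCHEMAS = {
    "Read": {"description": "Read a file", "parameters": {"path": "string"}},
    "Edit": {"description": "Replace old_string with new_string",
             "parameters": {"path": "string", "old_string": "string",
                            "new_string": "string"}},
    "WebSearch": {"description": "Search the web",
                  "parameters": {"query": "string"}},
    "CronCreate": {"description": "Schedule a recurring job",
                   "parameters": {"spec": "string"}},
}
HOT_TOOLS = {"Read", "Edit"}  # always loaded in full

def build_system_prompt_tools():
    """Full schemas for hot tools; bare names for everything else."""
    loaded = {name: FULL_SCHEMAS[name] for name in HOT_TOOLS}
    deferred = sorted(set(FULL_SCHEMAS) - HOT_TOOLS)
    return {"tools": loaded, "deferred_tool_names": deferred}

def discover_tool(name):
    """The discovery tool: pay tokens for a schema only when asked for."""
    if name not in FULL_SCHEMAS:
        raise KeyError(f"unknown tool: {name}")
    return FULL_SCHEMAS[name]
```

With this shape, the fixed prompt cost scales with the hot set rather than the full registry; the deferred names cost a handful of tokens each instead of ~500.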

Read more »

Claude Code's Edit tool has a deceptively simple interface: give it an old_string and a new_string, and it finds the former in a file and replaces it with the latter. It sounds like nothing more than a str.replace(). But in the context of an LLM Agent, this seemingly trivial operation is backed by an entire engineering pipeline, spanning everything from string sanitization to concurrency safety. The model stuffs line numbers into its replacement strings. It conjures curly quotes out of thin air. External tools modify the target file while the user is still reviewing the permission dialog. The Edit tool has to stay correct through all of this, which is far more than a find-and-replace can handle.

From observing its behavior, the Edit tool's execution breaks down into three phases: API-layer preprocessing (before the tool even receives input), input validation (before the permission dialog is shown), and the actual write (after the user approves). Each phase handles a distinct class of problems and maintains deliberate sync/async boundaries.
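Under those assumptions, the three phases might look like this minimal Python sketch. The function names and specific checks are hypothetical reconstructions from observed behavior, not Claude Code's actual implementation:

```python
import os

# Phase 1 (illustrative): undo model artifacts such as curly quotes.
CURLY = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}

def sanitize(s):
    for bad, good in CURLY.items():
        s = s.replace(bad, good)
    return s

def validate(path, old_string):
    """Phase 2: runs before the permission dialog is shown. Returns the
    file's mtime so the write phase can detect concurrent modification."""
    text = open(path, encoding="utf-8").read()
    n = text.count(old_string)
    if n == 0:
        raise ValueError("old_string not found in file")
    if n > 1:
        raise ValueError("old_string is not unique; provide more context")
    return os.path.getmtime(path)

def apply_edit(path, old_string, new_string, mtime_at_validation):
    """Phase 3: runs only after user approval. Refuses to write if the
    file changed while the permission dialog was open."""
    if os.path.getmtime(path) != mtime_at_validation:
        raise RuntimeError("file modified since validation; re-read required")
    text = open(path, encoding="utf-8").read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(text.replace(old_string, new_string, 1))
```

The key design point survives even in this toy version: validation captures a snapshot of the file's state, and the write phase re-checks that snapshot, so approval granted against one version of the file can never silently apply to another.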

Read more »


Claude Code has a mode that appears in no documentation whatsoever. When active, it systematically erases every trace of AI involvement: no Co-Authored-By trailer, no "Generated with Claude Code" footer, and the system prompt doesn't even tell the model what it is. This mode is called Undercover Mode. It exists only in Anthropic's internal builds; external users will never see it, because dead code elimination strips the entire feature out of public builds.

The behavioral implications are telling: this mechanism exists because Anthropic employees routinely use Claude Code to commit to public repositories. Without some form of protection, commit messages might contain unreleased model codenames, PR descriptions might expose internal project names, and model identifiers in the system prompt might leak through one vector or another. Undercover Mode is designed to plug all of these holes.
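The mechanism described, a build-time flag whose disabled branch never reaches public bundles, can be sketched as follows. The flag name, the footer text, and the keyword argument are all illustrative, not Anthropic's actual code:

```python
# Illustrative sketch: a bundler (e.g. esbuild with a --define flag)
# replaces UNDERCOVER with a literal constant at build time, so in public
# builds the `if undercover:` branch is provably dead and gets eliminated.

UNDERCOVER = False  # hypothetically rewritten to True in internal builds

def commit_message(summary, undercover=UNDERCOVER):
    if undercover:
        return summary  # no trace of AI involvement anywhere
    return (summary
            + "\n\n🤖 Generated with Claude Code"
            + "\nCo-Authored-By: Claude <noreply@anthropic.com>")
```

Because the flag is a compile-time constant rather than a runtime setting, the public binary contains no attribution-suppressing code path at all, which is exactly why external users can never toggle it on.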

Read more »


When tackling complex tasks, Claude Code spawns multiple sub-agents in parallel, each of which needs the full parent conversation context to do its job effectively. This creates a real cost problem: if the parent conversation has accumulated 100K tokens of context and three sub-agents are spawned simultaneously, a naive implementation would pay 100K tokens of input for each one, 300K in total. Anthropic's API offers a Prompt Cache mechanism that gives a 90% discount on the cached prefix portion, but only if the prefix bytes are exactly identical across requests. From observable behavior, Claude Code constructs its forked sub-agents' API requests so that over 99% of the bytes are identical across all parallel forks, compressing the effective input cost of three sub-agents to roughly 120K tokens' worth (100K at full price + 2 × 100K × 10%).
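The arithmetic follows the post's simplified pricing model: cached prefix tokens billed at 10% of the normal input rate. (Anthropic's real pricing also adds a cache-write premium on the request that warms the cache, which this sketch ignores.)

```python
# Back-of-the-envelope cost model for parallel forks sharing a
# byte-identical cached prefix, under the simplified 10%-of-input-rate
# assumption described above.

def forked_input_cost(context_tokens, n_agents, cache_discount=0.10):
    """Token-equivalent input cost when n_agents share a cached prefix:
    the first request pays full price (and warms the cache), the rest
    pay only the discounted rate on the shared prefix."""
    first = context_tokens                                  # full price
    rest = (n_agents - 1) * context_tokens * cache_discount  # 90% off
    return first + rest

forked_input_cost(100_000, 3)  # ≈ 120K token-equivalents, vs 300K naive
```

The sensitivity to prefix identity is the whole game: a single differing byte early in the request invalidates the cache for everything after it, so the discount collapses back toward the naive 300K.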

Read more »


A 200K context window sounds generous — until you're in a moderately complex coding session. Read a few dozen files, run several rounds of grep, execute some bash commands, and you've already burned through most of it. Compaction is inevitable, but compaction itself costs money: you need an LLM call to generate a summary, and the input to that call is the very context you're trying to compress. This creates a fascinating engineering trade-off: compact too early and you lose useful information; compact too late and the window overflows; and the cost of compaction itself can't be ignored. Claude Code's answer is a multi-layer cascade: avoid compaction if you can, do it cheaply if you must, and only call the LLM as a last resort.
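The cascade can be sketched like this; the thresholds, the 500-character truncation, and the keep-the-last-10-messages split are invented for illustration and are not Claude Code's actual values:

```python
# Hypothetical multi-layer compaction cascade: do nothing while cheap,
# truncate bulky tool outputs next (they can be re-read from disk),
# and only pay for an LLM summarization call as a last resort.

SOFT_LIMIT = 160_000  # illustrative: start compacting here
HARD_LIMIT = 190_000  # illustrative: must act before the 200K window overflows

def compact(messages, count_tokens, summarize_with_llm):
    total = sum(count_tokens(m) for m in messages)
    if total < SOFT_LIMIT:
        return messages  # layer 0: avoid compaction entirely

    # Layer 1: cheap, local truncation of oversized tool results.
    trimmed = [
        {**m, "content": m["content"][:500] + "\n[truncated]"}
        if m.get("role") == "tool" and count_tokens(m) > 2_000 else m
        for m in messages
    ]
    if sum(count_tokens(m) for m in trimmed) < HARD_LIMIT:
        return trimmed

    # Layer 2: last resort, an LLM call summarizes the older turns.
    head, tail = trimmed[:-10], trimmed[-10:]
    summary = summarize_with_llm(head)
    return [{"role": "system", "content": summary}] + tail
```

The ordering encodes the trade-off from the paragraph above: each layer is strictly cheaper than the next, and the expensive LLM summary only ever runs on a conversation that the cheap layers have already failed to shrink.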

Read more »