Finisky Garden

NLP, Software Engineering, Product Design

Claude Code has a mode that appears in no documentation whatsoever. When active, it systematically erases every trace of AI involvement. No Co-Authored-By trailer, no "Generated with Claude Code" footer, and the system prompt itself doesn't even tell the model what it is. This mode is called Undercover Mode. It exists only in Anthropic's internal builds; external users will never see it, because dead code elimination strips the entire feature out during public builds.

The behavioral implications are telling: this mechanism exists because Anthropic employees routinely use Claude Code to commit to public repositories. Without some form of protection, commit messages might contain unreleased model codenames, PR descriptions might expose internal project names, and model identifiers in the system prompt could leak through one vector or another. Undercover Mode is designed to plug all of these holes.

Read more »


When tackling complex tasks, Claude Code spawns multiple sub-agents in parallel, each needing the full parent conversation context. If the parent has accumulated 100K tokens and three sub-agents are spawned, a naive implementation charges 300K tokens of input.

Anyone familiar with LLM inference optimization will recognize this immediately: it's a KV Cache sharing problem. When multiple requests share the same prefix, the Attention layer's Key/Value tensors can be reused, skipping redundant computation. Anthropic exposes this capability to API users as Prompt Cache, offering a 90% discount on cached prefix portions — but only if the prefix bytes are exactly identical across requests. Claude Code's fork sub-agents are deliberately constructed so that over 99% of the bytes are identical, compressing the effective input cost of three sub-agents to roughly 120K token-equivalent (100K full price + 2 × 100K × 10%).
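The cost arithmetic above can be sketched directly. The figures (a 100K-token shared prefix, three sub-agents, a 90% discount on cache hits) come from the post; the function names and billing model are illustrative assumptions, not Anthropic's actual pricing code.

```python
# Hypothetical illustration of the prompt-cache cost math described above.
PREFIX_TOKENS = 100_000
NUM_SUBAGENTS = 3
CACHE_DISCOUNT = 0.90  # cached prefix is billed at 10% of full price

def naive_cost(prefix_tokens: int, agents: int) -> float:
    """Every agent pays full price for the whole shared prefix."""
    return float(prefix_tokens * agents)

def cached_cost(prefix_tokens: int, agents: int) -> float:
    """First agent writes the cache at full price; the rest hit it."""
    first_agent = prefix_tokens
    cache_hits = (agents - 1) * prefix_tokens * (1 - CACHE_DISCOUNT)
    return float(first_agent + cache_hits)

print(naive_cost(PREFIX_TOKENS, NUM_SUBAGENTS))   # 300000.0
print(cached_cost(PREFIX_TOKENS, NUM_SUBAGENTS))  # 120000.0
```

The cache discount only applies when the prefix bytes match exactly, which is why the forked sub-agent prompts are constructed to be byte-identical for over 99% of their length.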

Read more »


A 200K context window sounds generous — until you're in a moderately complex coding session. Read a few dozen files, run several rounds of grep, execute some bash commands, and you've already burned through most of it. Compaction is inevitable, but compaction itself costs money: you need an LLM call to generate a summary, and the input to that call is the very context you're trying to compress. This creates a fascinating engineering trade-off: compact too early and you lose useful information; compact too late and the window overflows; and the cost of compaction itself can't be ignored. Claude Code's answer is a multi-layer cascade: avoid compaction if you can, do it cheaply if you must, and only call the LLM as a last resort.
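The "avoid, then cheap, then LLM" cascade can be sketched as a tiered decision function. The tier names and usage thresholds below are assumptions chosen for illustration, not Claude Code's actual values or internals:

```python
# A minimal sketch of a multi-layer compaction cascade, assuming
# made-up thresholds: do nothing while the window is roomy, reclaim
# space cheaply when it tightens, and only pay for an LLM summary
# as a last resort.

def plan_compaction(used_tokens: int, window: int = 200_000) -> str:
    """Pick the cheapest viable way to reclaim context."""
    usage = used_tokens / window
    if usage < 0.70:
        return "none"            # plenty of room: avoid compaction entirely
    if usage < 0.85:
        return "cheap-truncate"  # drop stale tool outputs, no LLM call
    return "llm-summarize"       # last resort: an LLM call whose input is
                                 # the very context being compressed

print(plan_compaction(80_000))   # none
print(plan_compaction(150_000))  # cheap-truncate
print(plan_compaction(190_000))  # llm-summarize
```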

Read more »


The security challenge of AI-executed Bash commands isn't "should we trust the model" — it's "how do we make sure a command actually means what it looks like."

Claude Code lets AI execute Bash commands directly. Not through a structured interface like MCP; it literally gives the model a shell. MCP's approach wraps tools into JSON schemas, which is safe enough, but you can't realistically write adapters for thousands of CLI tools, so the capability ceiling is obvious. A raw shell can do anything; the tradeoff is that the security problem shifts from "controlling interface permissions" to "figuring out what a command actually does." In the previous post, I covered how the YOLO Classifier uses AI to review AI, but the Classifier works with the full command string and makes a semantic judgment: is this operation dangerous? Before that judgment even happens, there's a deeper question that needs answering: does this command actually mean what it appears to mean?
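One concrete version of that deeper question: a single command string can hide several commands behind shell control operators. A sketch of the idea, using Python's stdlib `shlex` to tokenize with shell rules and split at operators; this is illustrative only, not Claude Code's actual parser:

```python
# Split a raw command string into its constituent commands so each one
# can be inspected on its own, rather than trusting the string's
# surface appearance. Hypothetical sketch, not Claude Code internals.
import shlex

CONTROL_OPS = {";", "&&", "||", "|", "&"}

def split_commands(cmd: str) -> list[list[str]]:
    """Tokenize with POSIX shell rules, then split on control operators."""
    lex = shlex.shlex(cmd, posix=True, punctuation_chars=True)
    lex.whitespace_split = True
    commands, current = [], []
    for tok in lex:
        if tok in CONTROL_OPS:
            if current:
                commands.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        commands.append(current)
    return commands

# A command that "looks like" a status check but also deletes files:
print(split_commands("git status; rm -rf build"))
# [['git', 'status'], ['rm', '-rf', 'build']]
```

Only after a string has been decomposed this way does a semantic check like the YOLO Classifier's "is this dangerous?" question operate on what will actually run.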

Read more »


Claude Code has an auto mode that executes operations without confirmation. But "auto" doesn't mean "unreviewed" — there's a classifier watching every action.

The Auto Mode Paradox

One of the most annoying things about using Claude Code is the permission popups. Every Bash command, every file write requires a confirmation click. Power users turn on auto mode, letting Claude execute everything autonomously without asking.

This creates an obvious problem: what if the model decides to rm -rf /, push to the production branch, or write a backdoor into .bashrc?
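To make the gatekeeping concrete: the post's real safeguard is an LLM classifier making a semantic judgment, but even a toy pre-filter shows the shape of the check. Everything below (the pattern list, the function name) is a hypothetical sketch, not the actual YOLO Classifier:

```python
# A toy denylist pre-filter illustrating the kinds of actions a
# classifier must catch in auto mode. The real safeguard described in
# the post is an AI classifier making a semantic judgment; this
# substring match is only a hypothetical illustration.

DANGEROUS_PATTERNS = [
    "rm -rf /",                # destructive filesystem wipe
    "push origin production",  # pushing straight to the production branch
    ".bashrc",                 # persistence via shell startup files
]

def needs_confirmation(command: str) -> bool:
    """Fall back to a human confirmation popup on anything suspicious."""
    return any(p in command for p in DANGEROUS_PATTERNS)

print(needs_confirmation("git push origin production"))  # True
print(needs_confirmation("ls -la"))                      # False
```

A substring denylist is trivially bypassed (e.g. by quoting or variable expansion), which is exactly why a semantic classifier, rather than pattern matching, sits behind auto mode.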

Read more »
