Claude Code's Undercover Mode: When AI Learns to Hide Itself

Claude Code has a mode that appears in no documentation whatsoever. When active, it systematically erases every trace of AI involvement. No Co-Authored-By trailer, no "Generated with Claude Code" footer, and the system prompt itself doesn't even tell the model what it is. This mode is called Undercover Mode. It exists only in Anthropic's internal builds — external users will never see it, because dead code elimination strips the entire feature out during public builds.

The motivation is practical: Anthropic employees routinely use Claude Code to commit to public repositories. Without some form of protection, commit messages might contain unreleased model codenames, PR descriptions might expose internal project names, and model identifiers in the system prompt could leak through any number of vectors. Undercover Mode is designed to plug all of these holes.

Activation Conditions

Undercover Mode activation is not based on organization-level checks. Anthropic's GitHub organization (anthropics) contains both private and public repositories — claude-code itself is public — so a simple organization-name allowlist won't work. From observable behavior, the system maintains a hardcoded list of private repositories, roughly twenty to thirty entries, covering both SSH and HTTPS URL formats. Only when the current working directory's git remote URL matches an entry on this list is the repo classified as internal, and only then is Undercover Mode turned off.

This design choice is itself revealing. A more natural approach would be to query all repositories under the anthropics organization, but that would add public repos to the allowlist, accidentally disabling undercover when working in public repos. So they went with the more conservative per-repo enumeration. Every entry added to that list means someone has explicitly confirmed the repo is private, rather than relying on an organization-level permission setting that could change at any time.
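The matching described above can be sketched as follows. This is a hypothetical reconstruction, not Claude Code's actual implementation: the allowlist entry and function names are assumptions, illustrating only how SSH and HTTPS remote URLs might be normalized to a common form before comparison.

```python
from urllib.parse import urlparse

# Illustrative allowlist; the real list is hardcoded, private, and
# covers roughly twenty to thirty repositories.
PRIVATE_REPO_ALLOWLIST = {
    "github.com/anthropics/some-private-repo",  # hypothetical entry
}

def normalize_remote(url):
    """Reduce SSH and HTTPS remote URLs to a common host/org/repo form."""
    if url.startswith("git@"):              # git@github.com:org/repo.git
        host, _, path = url[len("git@"):].partition(":")
    else:                                   # https://github.com/org/repo.git
        parsed = urlparse(url)
        host, path = parsed.netloc, parsed.path.lstrip("/")
    if path.endswith(".git"):
        path = path[:-len(".git")]
    return f"{host}/{path}"

def classify_repo(remote_url):
    if not remote_url:
        return "none"        # no remote / not a git repo: undercover stays ON
    if normalize_remote(remote_url) in PRIVATE_REPO_ALLOWLIST:
        return "internal"    # the only state that turns undercover OFF
    return "external"
```

Note that both URL forms of the same repository collapse to one allowlist key, which is why the list only needs one entry per repo despite covering two remote formats.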

Repository classification has four states:

State       Undercover   Meaning
──────────  ───────────  ─────────────────────────────────
'internal'  OFF          Remote matches private allowlist
'external'  ON           Has remote, not on allowlist
'none'      ON           No remote URL / not a git repo
null        ON           Check hasn't completed yet

Note the last row. The classification check is asynchronous — it fires at startup, but before the result comes back, the default state is ON. This is a safe-by-default design: until you prove you're in a safe environment, the system assumes you're not. If the async check eventually confirms the current repo is internal, the system also clears cached system prompt sections so the next conversation turn picks up the new prompt with Undercover Mode disabled. This cache-clearing step itself reveals something: there's a race condition between system prompt caching and the asynchronous undercover classification, and it has to be handled explicitly.
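The safe-by-default rule reduces to a single comparison: any state other than a confirmed 'internal' classification keeps undercover on. A minimal sketch, with illustrative names rather than actual internals:

```python
def is_undercover(repo_state):
    # repo_state is 'internal', 'external', 'none', or None (check pending).
    # Only a confirmed internal repo turns undercover OFF; the pending
    # state (None) is treated the same as an unsafe environment.
    return repo_state != "internal"
```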

There's also an edge case worth noting: a user might run Claude Code from a non-git directory — say, a scratch directory under /tmp — and later push to a public remote. With no git remote to match against, the classification result is none, and undercover activates automatically. This isn't a bug; it's a feature.

What It Actually Changes

Once activated, Undercover Mode takes effect across multiple layers simultaneously.

At the git commit level, the Co-Authored-By trailer is removed entirely. The attribution function detects the undercover state and returns an empty string, regardless of user configuration. Normally this function generates a trailer like Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> based on the current model, with fallback logic: for unrecognized models, it falls back to a hardcoded public model name to avoid leaking internal codenames. Under undercover mode, none of that matters — it just returns empty.
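The attribution logic can be sketched like this. The model table, fallback name, and function name are assumptions for illustration; only the shape of the logic (undercover short-circuits everything, unknown models fall back to a public name) comes from the observed behavior described above.

```python
# Hypothetical model table; unrecognized IDs fall back to a public name
# so internal codenames never leak through the trailer.
KNOWN_MODELS = {"claude-opus-4-6": "Claude Opus 4.6"}
FALLBACK_NAME = "Claude"

def coauthor_trailer(model_id, undercover):
    if undercover:
        return ""   # undercover wins regardless of user configuration
    name = KNOWN_MODELS.get(model_id, FALLBACK_NAME)
    return f"Co-Authored-By: {name} <noreply@anthropic.com>"
```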

At the PR description level, the 🤖 Generated with Claude Code footer is likewise blanked out. This includes the enhanced version with contribution statistics (e.g., 93% 3-shotted by claude-opus-4-6) — that disappears too.

At the system prompt level — and this is the most interesting part — the model's name, ID, and family information are all stripped from the environment description. Normally the system prompt contains something like You are powered by the model named Claude Opus 4.6; under undercover mode this entire section is suppressed. Beyond that, descriptions of the Claude Code product itself and the latest model family listing are also removed.

In pseudocode:

def compute_env_info(model_id):
    env_items = [
        f"Working directory: {cwd}",
        f"Platform: {platform}",
        f"Shell: {shell}",
    ]

    if not (is_ant_build() and is_undercover()):
        marketing_name = get_marketing_name(model_id)
        env_items.append(f"You are powered by {marketing_name}. ID: {model_id}.")
        env_items.append("The most recent Claude model family is Claude 4.5/4.6...")
        env_items.append("Claude Code is available as a CLI...")

    return env_items

On top of that, commit and PR prompts receive an explicit undercover directive listing things that must never appear in a commit message: internal model codenames (animal names like Capybara, Tengu, etc.), unreleased version numbers, internal repo names, Slack channels, short links, and any phrasing that hints at being AI-generated. The directive ends with Do not blow your cover. It even provides positive and negative examples:

GOOD:
"Fix race condition in file watcher initialization"
"Add support for custom key bindings"

BAD (never write these):
"Fix bug found while testing with Claude Capybara"
"1-shotted by claude-opus-4-6"
"Co-Authored-By: Claude Opus 4.6 <...>"

Those BAD examples almost certainly come from real incidents — someone left an internal model codename in a public repo commit, and the team was forced to conduct a postmortem and build this entire protection system. Security policies usually emerge from retrospectives, not from preemptive imagination.

There's one more defense-in-depth detail: even if the user has disabled git-related instructions in their settings, the undercover directive is still injected. A code comment explains why: mechanical attribution stripping and model-ID suppression happen automatically, but the explicit "don't blow your cover" directive serves as a last line of defense against the model proactively mentioning internal codenames in commit messages.
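That unconditional injection can be sketched as below. The section names and directive text are illustrative stand-ins, not Claude Code's actual prompt strings; the point is only that the undercover branch does not depend on the user's git-instruction setting.

```python
GIT_INSTRUCTIONS = "Follow the user's git commit conventions..."  # placeholder
UNDERCOVER_DIRECTIVE = (
    "Never mention internal codenames, unreleased versions, or anything "
    "that hints the message is AI-generated. Do not blow your cover."
)

def build_commit_prompt(base_sections, undercover, git_instructions_enabled):
    prompt = list(base_sections)
    if git_instructions_enabled:
        prompt.append(GIT_INSTRUCTIONS)
    if undercover:
        # Injected even when git instructions are disabled: the last
        # line of defense against the model volunteering internal details.
        prompt.append(UNDERCOVER_DIRECTIVE)
    return prompt
```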

No Off Switch

A comment at the top of the code file states plainly: There is NO force-OFF.

Behind this is an asymmetric risk analysis. If undercover is mistakenly activated (false positive), the consequence is some missing attribution — a PR description loses an emoji — a minor inconvenience. If undercover is mistakenly deactivated (false negative), unreleased model codenames, internal project names, and repository paths end up in public git history — information that cannot be retracted once pushed.

Scenario            Risk
──────────────────  ─────────────────────────────
False positive      Minor: less attribution;
(ON by mistake)     no information leaked

False negative      Severe: internal codenames,
(OFF by mistake)    project names in public git
                    history — IRREVERSIBLE

The environment variable CLAUDE_CODE_UNDERCOVER=1 can only force it ON, never off. The sole way to deactivate undercover is for your repo's remote URL to match the hardcoded internal repository list. No flag, no config file, no workaround. This is deliberate. When force-enabled via the environment variable, the system even skips the first-run notification popup — since you manually set the variable, you clearly know what you're doing and don't need to be told.
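The one-way override can be sketched as follows. Whether the forced-ON path also overrides a confirmed internal repo is an assumption here; the documented behavior is only that the variable can never force undercover off.

```python
import os

def resolve_undercover(repo_state):
    # The env var can only force undercover ON; there is no
    # corresponding force-OFF path anywhere in the logic.
    if os.environ.get("CLAUDE_CODE_UNDERCOVER") == "1":
        return True
    return repo_state != "internal"
```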

This design pattern is common in security engineering but rare in product features. Most features offer an off switch because user experience comes first. Here the product team made a clear judgment call: the irreversibility of information leakage outweighs user control.

Identity Suppression

The decision not to tell the model what it is deserves its own discussion. During normal operation, Claude Code's system prompt contains explicit identity information: model name, model ID, model family, and product description. Undercover Mode removes all of it. At runtime, the model cannot learn from its system prompt that it is Claude, nor which version it's running on.

Of course, this isn't true identity erasure. The model's behavioral patterns, language style, and knowledge scope are all baked in — an experienced user can spot it at a glance. But system-prompt-level identity suppression does cut off one concrete information leakage path: the model won't reference its own model name in responses because that information simply isn't present in its current context. More specifically, it won't write 1-shotted by claude-opus-4-6 in a commit message because it doesn't know it is claude-opus-4-6.

This design embeds an interesting assumption: that an LLM's self-awareness derives primarily from explicit declarations in the system prompt, not from emergent knowledge in its parameter weights. If the model's sense of identity were weight-level, stripping the system prompt would have no effect. But in practice, this prompt-level intervention does work — at least in reducing how often the model proactively identifies itself as "I'm Claude."

There's a subtle tension here. The model's training data contains vast amounts of information about Claude, and the RLHF process reinforces specific identity-recognition patterns. Undercover Mode is essentially using instruction-level intervention to suppress weight-level tendencies. It's like hypnotizing someone into forgetting their own name: how well it works depends on the contest between the depth of the hypnosis and the strength of the memory. From a practical standpoint, all that matters is that the model doesn't announce itself in commit messages; it doesn't need to genuinely forget who it is.

Build-Time Elimination

Every code path related to Undercover Mode is gated behind process.env.USER_TYPE === 'ant'. This USER_TYPE is a build-time --define constant, not a runtime environment variable. When building the external version, the bundler constant-folds this check to false, and dead code elimination then removes the entire ant-only branch. A code comment specifically emphasizes that this check must be inlined at every call site — it cannot be extracted into a shared constant, or the bundler won't be able to constant-fold it, and dead code elimination will fail.

# This is what the bundler sees in the external build:
def is_undercover():
    if False:  # USER_TYPE === 'ant' constant-folded
        ...    # entire branch eliminated
    return False

This means the publicly released Claude Code binary physically does not contain any undercover-related code. It's not that the entry point is hidden — the code itself isn't there. You won't find it even by decompiling. This is fundamentally different from features controlled by feature flags — a feature flag merely hides the UI entry point while the code remains; build-time elimination removes it from the artifact entirely.

From an engineering perspective, this is an elegant solution. A single codebase serves both internal and external build targets, differentiated by build-time constants rather than runtime branches, avoiding the burden of maintaining two codebases while ensuring sensitive logic never accidentally appears in public artifacts.

Design Takeaways

The entire Undercover Mode design can be distilled into a handful of principles: safe default (unknown state defaults to ON), asymmetric risk (manual intervention is only allowed toward the safer direction), defense in depth (mechanical stripping + prompt directives + build-time elimination, layered together), and build-time elimination (sensitive features are physically removed from public artifacts).

None of these principles are novel in isolation, but applying them in concert to an AI coding assistant raises some thought-provoking questions. When an AI tool needs to formalize "hiding the fact that it's AI" as a first-class engineering feature — complete with code, tests, and comments explaining the design intent — it signals that AI tooling has entered a new phase. The question is no longer can we use it, but how do we manage the traces after we do.

From another angle, the very existence of Undercover Mode implies a fact: Anthropic uses its own AI tools extensively for day-to-day development, including contributing code to open-source communities. That's not surprising in itself, but the need for a dedicated engineering system to manage attribution and information security around those contributions tells us this is no longer an occasional experiment — it's a systematic workflow. The gap between individual experimentation with AI-assisted coding and organization-wide deployment is bridged precisely by infrastructure like this.

In a sense, Undercover Mode is a marker of AI tooling maturity, much as the maturity of an intelligence agency is measured not by how many operatives it can deploy, but by how well-developed its cover story management is.