Claude Code's Undercover Mode: When AI Learns to Hide Itself
Claude Code has a mode that appears in no documentation whatsoever. When active, it systematically erases every trace of AI involvement. No Co-Authored-By trailer, no "Generated with Claude Code" footer, and the system prompt itself doesn't even tell the model what it is. This mode is called Undercover Mode. It exists only in Anthropic's internal builds — external users will never see it, because dead code elimination strips the entire feature out during public builds.
The motivation is telling: this mechanism exists because Anthropic employees routinely use Claude Code to commit to public repositories. Without some form of protection, commit messages might contain unreleased model codenames, PR descriptions might expose internal project names, and model identifiers in the system prompt could leak through one vector or another. Undercover Mode is designed to plug all of these holes.
Activation Conditions
Undercover Mode activation is not based on organization-level checks. Anthropic's GitHub organization (anthropics) contains both private and public repositories — claude-code itself is public — so a simple organization-name allowlist won't work. From observable behavior, the system maintains a hardcoded list of private repositories, roughly twenty to thirty entries, covering both SSH and HTTPS URL formats. Only when the current working directory's git remote URL matches an entry on this list is the repo classified as internal, and only then is Undercover Mode turned off.
This design choice is itself revealing. A more natural approach would be to query all repositories under the anthropics organization, but that would add public repos to the allowlist, accidentally disabling undercover when working in public repos. So they went with the more conservative per-repo enumeration. Every entry added to that list means someone has explicitly confirmed the repo is private, rather than relying on an organization-level permission setting that could change at any time.
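The per-repo allowlist match can be sketched as follows. This is a minimal reconstruction of the behavior described above, not the real implementation: the list entry, function names, and URL patterns are illustrative assumptions (the actual list is hardcoded in internal builds and covers both SSH and HTTPS remote formats).

```python
import re

# Hypothetical allowlist entry; the real list has roughly 20-30 entries.
INTERNAL_REPOS = {"anthropics/example-private-repo"}

def normalize_remote(url):
    """Reduce an SSH or HTTPS GitHub remote URL to 'org/repo'."""
    for pattern in (r"^git@github\.com:(.+?)(?:\.git)?$",
                    r"^https://github\.com/(.+?)(?:\.git)?$"):
        m = re.match(pattern, url)
        if m:
            return m.group(1)
    return None

def is_internal(remote_url):
    # Only an exact match against the allowlist classifies a repo internal.
    return normalize_remote(remote_url) in INTERNAL_REPOS
```

Note that a repo is internal only on a positive match; anything else, including unparseable URLs, falls through to "not internal."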
Repository classification has four states:
| State    | Undercover | Meaning                                              |
|----------|------------|------------------------------------------------------|
| internal | OFF        | Remote URL matches the hardcoded private-repo list   |
| external | ON         | Remote URL exists but matches no list entry          |
| none     | ON         | No git remote to classify                            |
| pending  | ON         | Async classification has not resolved yet            |
Note the last row. The classification check is asynchronous — it fires at startup, but before the result comes back, the default state is ON. This is a safe-by-default design: until you prove you're in a safe environment, the system assumes you're not. If the async check eventually confirms the current repo is internal, the system also clears cached system prompt sections so the next conversation turn picks up the new prompt with Undercover Mode disabled. This cache-clearing step itself reveals something: there's a race condition between system prompt caching and the asynchronous undercover classification, and it has to be handled explicitly.
There's also an edge case worth noting: a user might run Claude Code from a non-git directory — say, a scratch directory under /tmp — and later push to a public remote. With no git remote to match against, the classification result is none, and undercover activates automatically. This isn't a bug; it's a feature.
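The safe-by-default activation flow can be sketched as below. State names, helper names, and the cache-clearing stub are assumptions made for illustration; the key properties are from the text: undercover is ON until a positive internal classification arrives, and confirming internal also invalidates cached system prompt sections.

```python
import asyncio

def clear_cached_system_prompt():
    pass  # placeholder: invalidate cached system prompt sections

class UndercoverState:
    def __init__(self):
        # Startup default is "pending": not yet proven safe.
        self.classification = "pending"  # pending | internal | external | none

    @property
    def undercover(self):
        # ON unless the repo is positively confirmed internal.
        return self.classification != "internal"

async def classify(state, remote_url, allowlist):
    # Fired asynchronously at startup; until it resolves, undercover stays ON.
    if remote_url is None:
        state.classification = "none"       # e.g. a /tmp scratch directory
    elif remote_url in allowlist:
        state.classification = "internal"
        # Handle the cache race explicitly: the next turn must pick up
        # the system prompt with Undercover Mode disabled.
        clear_cached_system_prompt()
    else:
        state.classification = "external"
```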
What It Actually Changes
Once activated, Undercover Mode takes effect across multiple layers simultaneously.
At the git commit level, the Co-Authored-By trailer is removed entirely. The attribution function detects the undercover state and returns an empty string, regardless of user configuration. Normally this function generates a trailer like `Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>` based on the current model, with fallback logic: for unrecognized models, it falls back to a hardcoded public model name to avoid leaking internal codenames. Under undercover mode, none of that matters — it just returns empty.
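A sketch of that attribution logic, assuming a hypothetical model-name table and fallback string (the real mapping is not shown in the source):

```python
# Illustrative mapping of model IDs to public display names.
PUBLIC_MODEL_NAMES = {"claude-opus-4-6": "Claude Opus 4.6"}
FALLBACK_NAME = "Claude"  # hardcoded public name for unrecognized models

def coauthor_trailer(model_id, undercover):
    if undercover:
        return ""  # strip attribution entirely, regardless of user config
    # Unrecognized (possibly internal) IDs fall back to a public name
    # so that internal codenames never reach the trailer.
    name = PUBLIC_MODEL_NAMES.get(model_id, FALLBACK_NAME)
    return f"Co-Authored-By: {name} <noreply@anthropic.com>"
```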
At the PR description level, the 🤖 Generated with Claude Code footer is likewise blanked out. This includes the enhanced version with contribution statistics (e.g., `93% 3-shotted by claude-opus-4-6`) — that disappears too.
At the system prompt level — and this is the most interesting part — the model's name, ID, and family information are all stripped from the environment description. Normally the system prompt contains something like "You are powered by the model named Claude Opus 4.6"; under undercover mode this entire section is suppressed. Beyond that, descriptions of the Claude Code product itself and the latest model family listing are also removed.
In pseudocode:

```python
def compute_env_info(model_id):
    # Reconstructed sketch; helper names are illustrative.
    if undercover_active():
        # Suppress identity entirely: no model name, ID, or family,
        # no Claude Code product description, no latest-model listing.
        return base_environment_info()
    return (
        base_environment_info()
        + product_description()
        + f"You are powered by the model named {display_name(model_id)}. "
        + f"The exact model ID is {model_id}. "
        + model_family_listing()
    )
```
On top of that, commit and PR prompts receive an explicit undercover directive listing things that must never appear in a commit message: internal model codenames (animal names like Capybara, Tengu, etc.), unreleased version numbers, internal repo names, Slack channels, short links, and any phrasing that hints at being AI-generated. The directive ends with "Do not blow your cover." It even provides positive and negative examples, along these lines:

```
GOOD: Fix race condition in the async repo classifier
BAD:  Fix race condition (verified against capybara-v2)
BAD:  🤖 Generated with Claude Code
```
Those BAD examples almost certainly come from real incidents — someone left an internal model codename in a public repo commit, and the team was forced to conduct a postmortem and build this entire protection system. Security policies usually emerge from retrospectives, not from preemptive imagination.
There's one more defense-in-depth detail: even if the user has disabled git-related instructions in their settings, the undercover directive is still injected. A code comment explicitly states that the mechanical stripping of attribution and model ID suppression can work automatically, but the explicit "don't blow your cover" directive serves as a last line of defense against the model proactively mentioning internal codenames in commit messages.
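That bypass can be sketched as a prompt-assembly function. The section names and directive text here are hypothetical stand-ins; the point is structural, from the text above: the undercover directive is appended even when the user has disabled git instructions.

```python
# Hypothetical prompt sections (abbreviated stand-ins).
GIT_INSTRUCTIONS = "When committing, follow the repo's commit conventions."
UNDERCOVER_DIRECTIVE = "Do not blow your cover."

def build_commit_prompt(git_instructions_enabled, undercover):
    sections = []
    if git_instructions_enabled:
        sections.append(GIT_INSTRUCTIONS)
    if undercover:
        # Injected even when git instructions are disabled in settings:
        # the last line of defense against the model naming itself.
        sections.append(UNDERCOVER_DIRECTIVE)
    return "\n\n".join(sections)
```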
No Off Switch
A comment at the top of the code file states plainly:
There is NO force-OFF.
Behind this is an asymmetric risk analysis. If undercover is mistakenly activated (false positive), the consequence is some missing attribution — a PR description loses an emoji — a minor inconvenience. If undercover is mistakenly deactivated (false negative), unreleased model codenames, internal project names, and repository paths end up in public git history — information that cannot be retracted once pushed.
| Scenario                                        | Risk                                                                  |
|-------------------------------------------------|-----------------------------------------------------------------------|
| False positive: undercover ON in an internal repo | Missing attribution; a PR description loses a footer                 |
| False negative: undercover OFF in a public repo   | Internal codenames and repo names land in irreversible public history |
The environment variable `CLAUDE_CODE_UNDERCOVER=1` can only force it ON, never off. The sole way to deactivate undercover is for your repo's remote URL to match the hardcoded internal repository list. No flag, no config file, no workaround. This is deliberate. When force-enabled via the environment variable, the system even skips the first-run notification popup — since you manually set the variable, you clearly know what you're doing and don't need to be told.
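The one-way override reduces to a few lines. This is a sketch under the assumptions stated above (function name invented; only the env-var name and the force-ON-only asymmetry come from the source):

```python
import os

def undercover_enabled(classification):
    if os.environ.get("CLAUDE_CODE_UNDERCOVER") == "1":
        return True  # force-ON; no symmetric force-OFF path exists
    # The only way to disable: the repo was classified internal.
    return classification != "internal"
```

Notice there is no branch that returns False based on user input — deactivation flows solely from the repo classification.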
This design pattern is common in security engineering but rare in product features. Most features offer an off switch because user experience comes first. Here the product team made a clear judgment call: the irreversibility of information leakage outweighs user control.
Identity Suppression
The decision not to tell the model what it is deserves its own discussion. During normal operation, Claude Code's system prompt contains explicit identity information: model name, model ID, model family, and product description. Undercover Mode removes all of it. At runtime, the model cannot learn from its system prompt that it is Claude, nor which version it's running on.
Of course, this isn't true identity erasure. The model's behavioral patterns, language style, and knowledge scope are all baked in — an experienced user can spot it at a glance. But system-prompt-level identity suppression does cut off one concrete information leakage path: the model won't reference its own model name in responses because that information simply isn't present in its current context. More specifically, it won't write `1-shotted by claude-opus-4-6` in a commit message because it doesn't know it is claude-opus-4-6.
This design embeds an interesting assumption: that an LLM's self-awareness derives primarily from explicit declarations in the system prompt, not from emergent knowledge in its parameter weights. If the model's sense of identity were weight-level, stripping the system prompt would have no effect. But in practice, this prompt-level intervention does work — at least in reducing how often the model proactively identifies itself as "I'm Claude."
There's a subtle tension here. The model's training data contains vast amounts of information about Claude, and the RLHF process reinforces specific identity-recognition patterns. Undercover Mode is essentially using instruction-level intervention to suppress weight-level tendencies. It's like hypnotizing someone who remembers their own name and telling them to forget it. The effectiveness depends on the contest between the depth of the hypnosis and the strength of the memory. From a practical standpoint, all that matters is that the model doesn't announce itself in commit messages — it doesn't need to genuinely forget who it is.
Build-Time Elimination
Every code path related to Undercover Mode is gated behind `process.env.USER_TYPE === 'ant'`. This `USER_TYPE` is a build-time `--define` constant, not a runtime environment variable. When building the external version, the bundler constant-folds this check to `false`, and dead code elimination then removes the entire ant-only branch. A code comment specifically emphasizes that this check must be inlined at every call site — it cannot be extracted into a shared constant, or the bundler won't be able to constant-fold it, and dead code elimination will fail.
```javascript
// This is what the bundler sees in the external build.
// Source (check inlined at every call site):
if (process.env.USER_TYPE === 'ant' && isUndercoverRepo(remote)) { /* ... */ }

// After --define substitutes USER_TYPE, the check constant-folds:
if ('external' === 'ant' && isUndercoverRepo(remote)) { /* ... */ }  // => false

// After dead code elimination: the branch is gone from the artifact entirely.
```
This means the publicly released Claude Code binary physically does not contain any undercover-related code. It's not that the entry point is hidden — the code itself isn't there. You won't find it even by decompiling. This is fundamentally different from features controlled by feature flags — a feature flag merely hides the UI entry point while the code remains; build-time elimination removes it from the artifact entirely.
From an engineering perspective, this is an elegant solution. A single codebase serves both internal and external build targets, differentiated by build-time constants rather than runtime branches, avoiding the burden of maintaining two codebases while ensuring sensitive logic never accidentally appears in public artifacts.
Design Takeaways
The entire Undercover Mode design can be distilled into a handful of principles: safe default (unknown state defaults to ON), asymmetric risk (manual intervention is only allowed toward the safer direction), defense in depth (mechanical stripping + prompt directives + build-time elimination, layered together), and build-time elimination (sensitive features are physically removed from public artifacts).
None of these principles are novel in isolation, but applying them in concert to an AI coding assistant raises some thought-provoking questions. When an AI tool needs to formalize "hiding the fact that it's AI" as a first-class engineering feature — complete with code, tests, and comments explaining the design intent — it signals that AI tooling has entered a new phase. The question is no longer can we use it, but how do we manage the traces after we do.
From another angle, the very existence of Undercover Mode implies a fact: Anthropic uses its own AI tools extensively for day-to-day development, including contributing code to open-source communities. That's not surprising in itself, but the need for a dedicated engineering system to manage attribution and information security around those contributions tells us this is no longer an occasional experiment — it's a systematic workflow. The gap between individual experimentation with AI-assisted coding and organization-wide deployment is bridged precisely by infrastructure like this.
In a sense, Undercover Mode is a marker of AI tooling maturity. Just as the maturity of an intelligence agency isn't measured by how many operatives it can deploy, but by how well-developed its cover story management system is.