Why Claude Code's Edit Tool Doesn't Mangle Your Files

Claude Code's Edit tool has a deceptively simple interface: give it an old_string, give it a new_string, and it finds the former in a file and replaces it with the latter. Sounds like nothing more than a str.replace(). But in the context of an LLM Agent, this seemingly trivial operation is backed by an entire engineering pipeline spanning everything from string sanitization to concurrency safety. The model stuffs line numbers into its replacement strings. It conjures curly quotes out of thin air. External tools modify the target file while the user is still reviewing the permission dialog. The Edit tool has to stay correct through all of this — far more than find-and-replace can handle.

Observed from the outside, the Edit tool's execution breaks down into three phases: API-layer preprocessing (before the tool even receives input), input validation (before the permission dialog is shown), and the actual write (after the user approves). Each phase handles a distinct class of problems and maintains deliberate sync/async boundaries.

Desanitization: The Model Doesn't See the File's True Contents

When the Read tool returns file contents, it prepends line numbers in a cat -n style format. In compact mode this looks like 42\tfunction foo(); in standard mode it's a six-digit right-aligned number with an arrow symbol. When the model constructs its old_string, it frequently copies these line number prefixes along with the content. This isn't a model bug: line numbers are part of the context it sees, so there's no reason it wouldn't copy them.
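The two formats above suggest a per-line regex pass. A minimal sketch (strip_line_number_prefix is the name the article's own pseudocode uses; the exact regexes are assumptions for illustration):

```python
import re

# Assumed formats, based on the description above: compact is
# "<number><TAB><content>"; standard is a right-aligned number
# followed by an arrow character ("\u2192") before the content.
_COMPACT = re.compile(r"^\s*\d+\t")
_STANDARD = re.compile(r"^\s*\d+\u2192")

def strip_line_number_prefix(text: str) -> str:
    stripped = []
    for line in text.split("\n"):
        new = _COMPACT.sub("", line, count=1)
        if new == line:  # compact didn't match; try the standard format
            new = _STANDARD.sub("", line, count=1)
        stripped.append(new)
    return "\n".join(stripped)
```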

The line number issue is handled by regex-matching both prefix formats and stripping them. But beyond line numbers lies a subtler problem: desanitization.

Claude's API sanitizes certain tags before returning tool results to the model, abbreviating some XML tags into short forms and even truncating specific newline-plus-keyword combinations. This prevents these tokens in model output from being misinterpreted as protocol control directives. When the model tries to edit a file that happens to contain these tags, it can only output the abbreviated form, because that's all it ever saw.

Sanitization (API -> Model):         Desanitization (Model -> Tool):

<function_results> --> <fnr>          <fnr> --> <function_results>
<system>           --> <s>            <s>  --> <system>
\n\nHuman:         --> \n\nH:         \n\nH: --> \n\nHuman:
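Expressed as a lookup table — the short and full forms are taken from the table above, though the exact literals are assumptions — the mapping might look like:

```python
# Reverse mapping: short forms the model outputs -> true file contents.
# (Assumed literals, reconstructed from the table above.)
DESANITIZATION_MAP = {
    "<fnr>": "<function_results>",
    "<s>": "<system>",
    "\n\nH:": "\n\nHuman:",
}
```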

The Edit tool's solution is to apply a reverse mapping to old_string and new_string in the API preprocessing pipeline. This happens before the tool ever receives the input:

def normalize_file_edit_input(tool_input):
    old_str = tool_input["old_string"]
    new_str = tool_input["new_string"]

    # Step 1: desanitize XML tags
    for short, full in DESANITIZATION_MAP.items():
        old_str = old_str.replace(short, full)
        new_str = new_str.replace(short, full)

    # Step 2: strip line number prefixes
    old_str = strip_line_number_prefix(old_str)
    new_str = strip_line_number_prefix(new_str)

    return {**tool_input, "old_string": old_str, "new_string": new_str}

The tool layer is completely unaware of this preprocessing; from the tool's perspective, input is always clean.

Curly Quotes: A Character Mismatch You Never Thought Of

The model sometimes produces curly quotes (“ ” ‘ ’) instead of straight quotes (" '). The root cause may lie in the training data or tokenizer mappings, but for the Edit tool the problem is concrete: if the file contains const name = "hello" but the model's old_string says const name = \u201chello\u201d, the exact match fails.

The Edit tool handles this in two steps:

def find_actual_string(file_content, old_string):
    # Step 1: try exact match
    if old_string in file_content:
        return old_string

    # Step 2: normalize curly quotes and retry
    normalized_old = normalize_quotes(old_string)
    normalized_content = normalize_quotes(file_content)
    if normalized_old in normalized_content:
        return find_original_span(file_content, normalized_content, normalized_old)

    return None  # truly not found

When a match succeeds via quote normalization, the new_string also needs reverse treatment: preserveQuoteStyle() converts straight quotes in new_string back to whichever curly quote style the file originally used, keeping the file's quoting convention consistent. This detail means the Edit tool doesn't just get the content right — it preserves code style too.
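One way to sketch these helpers: because each curly quote normalizes to a straight quote of the same length, a match offset in the normalized text is also valid in the original text. (The mapping and the alternating open/close heuristic in preserve_quote_style are assumptions, not the actual implementation.)

```python
# Assumed 1:1 character mapping: lengths and offsets are preserved.
QUOTE_MAP = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}

def normalize_quotes(s: str) -> str:
    return "".join(QUOTE_MAP.get(ch, ch) for ch in s)

def find_original_span(file_content, normalized_content, normalized_old):
    # Normalization preserves length, so the match offset in the
    # normalized text is also the offset in the original text.
    idx = normalized_content.index(normalized_old)
    return file_content[idx:idx + len(normalized_old)]

def preserve_quote_style(new_string: str, actual_old: str) -> str:
    # If the matched span used curly double quotes, convert straight
    # double quotes in the replacement back, alternating open/close
    # (a simplification of whatever the real heuristic is).
    if "\u201c" not in actual_old and "\u201d" not in actual_old:
        return new_string
    out, open_next = [], True
    for ch in new_string:
        if ch == '"':
            out.append("\u201c" if open_next else "\u201d")
            open_next = not open_next
        else:
            out.append(ch)
    return "".join(out)
```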

Race Condition Defense: Two Checks With Different Purposes

The Edit tool faces a classic TOCTOU (Time of Check to Time of Use) problem: the model reads the file at turn 1, issues an Edit command at turn 2, but seconds or even minutes may have elapsed in between. During that window, a linter might have auto-formatted the file, the user might have edited it manually, or another parallel Agent might have modified the same file.

Claude Code uses two checks to handle this, each serving a different purpose:

Timeline:
  Model reads file
       |
       v
  [Time passes: linter runs, user edits, ...]
       |
       v
  Model sends Edit command
       |
       v
  +---CHECK 1: validateInput() (async)-----+
  |  Compare mtime vs last read timestamp  |
  |  Purpose: UX guard                     |
  |  -> Don't show stale permission dialog |
  +----------------------------------------+
       |
       v
  [Permission dialog shown to user]
  [User reviews diff, clicks approve]
  [More time passes...]
       |
       v
  +---CHECK 2: call() (sync, atomic)-------+
  |  Compare mtime + content again         |
  |  Purpose: Data integrity guard         |
  |  -> No async ops between check & write |
  +----------------------------------------+
       |
       v
  Write to disk

Check 1 runs before the permission dialog is shown. Its purpose is user experience: if the file has already changed, don't waste the user's time reviewing a diff that's doomed to fail. This check is asynchronous and compares the file's mtime against the last read timestamp.

Check 2 runs after the user approves but before the actual write. Its purpose is data integrity. A code comment explicitly warns: "Please avoid async operations between here and writing to disk to preserve atomicity." No await is allowed between Check 2 and writeTextContent(), ensuring there's no gap where execution could yield between the check and the write.

The time window between the two checks can be long; the user might walk away for minutes before clicking approve. The file could easily change during that time, which is why the second check is essential.
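The check-then-write discipline can be sketched in fully synchronous Python: nothing between the staleness check and the disk write can yield to another task. (The function name, the mtime-only check, and the single-match requirement are illustrative assumptions.)

```python
import os

def apply_edit_atomically(path: str, read_timestamp: float,
                          old_string: str, new_string: str) -> None:
    # Check 2 sketch: every operation below is synchronous, so no
    # awaited task can slip in between the check and the write.
    if os.path.getmtime(path) > read_timestamp:
        raise RuntimeError("file changed since last read")
    with open(path, encoding="utf-8") as f:
        content = f.read()
    if content.count(old_string) != 1:
        raise ValueError("old_string must match exactly once")
    with open(path, "w", encoding="utf-8") as f:
        f.write(content.replace(old_string, new_string))
```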

There's also a Windows-specific edge case: cloud sync services (OneDrive, Dropbox) and antivirus software frequently touch a file's mtime even when its content hasn't changed. Naively comparing mtimes would produce a flood of false positives. So both checks include fallback logic: if the mtime has changed but the file was fully read (no offset/limit), compare the actual content. If the content is identical, it's treated as a false positive and allowed through.

def check_staleness(file_path, read_state):
    current_mtime = get_file_mtime(file_path)
    if current_mtime > read_state.timestamp:
        # mtime changed — but is content actually different?
        if read_state.is_full_read:
            current_content = read_file_sync(file_path)
            if current_content == read_state.content:
                return False  # false positive (cloud sync, antivirus)
        return True  # genuinely stale
    return False

Read Before You Write: A Deliberate Constraint

The Edit tool enforces a hard precondition: you must have read a file before you can edit it. If readFileState has no record of the target file, the tool refuses outright and returns an error telling the model to use the Read tool first.

This constraint might seem redundant; in practice the model almost always reads before editing. But it guards against a subtle failure mode: the model sometimes "remembers" file contents from earlier in the conversation history and skips the Read, jumping straight to an Edit. If the file has been modified since then, the model's memory is stale, and the edit may be based on wrong assumptions.

The rule is even stricter than that: if the Read used offset/limit (reading only part of the file), the Edit tool also refuses. A partial read means the model hasn't seen the full file context: its old_string might not be unique, or it may lack awareness of the surrounding code at the edit location.
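Both refusals can be sketched as one precondition check against the read state (readFileState is the article's name for it; the record fields and error messages here are assumptions):

```python
from dataclasses import dataclass

@dataclass
class ReadRecord:
    timestamp: float
    is_full_read: bool  # False when the Read used offset/limit

def validate_read_precondition(read_file_state, path):
    # Returns an error message for the model, or None if editing is allowed.
    record = read_file_state.get(path)
    if record is None:
        return "File has not been read yet. Use the Read tool first."
    if not record.is_full_read:
        return "Only part of this file was read. Read the full file before editing."
    return None  # precondition satisfied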

This constraint works in concert with the race condition checks to form a closed loop: Read establishes a timestamp baseline, race condition checks verify the baseline is still valid, and the write executes immediately after validation passes. The contract between these three steps is strict.

Encoding and Line Ending Preservation

A file's encoding and line ending style are transparently preserved throughout the edit process. On read, the original encoding (UTF-8 or UTF-16LE) and line endings (LF or CRLF) are detected. Internally, everything is normalized to LF for matching; on write, the original format is restored.

The model only ever sees LF-terminated content and never needs to know whether the target file uses Windows-style or Unix-style line endings. This eliminates a common class of editing errors: the model inserting LF lines into a CRLF file, producing mixed line endings.
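A minimal sketch of the normalize-on-read, restore-on-write round trip (encoding detection omitted for brevity; function names are illustrative):

```python
def read_normalized(raw: str):
    # Detect CRLF vs LF; the model only ever sees the LF-normalized form.
    uses_crlf = "\r\n" in raw
    return raw.replace("\r\n", "\n"), uses_crlf

def restore_line_endings(edited: str, uses_crlf: bool) -> str:
    # On write, re-apply the file's original line ending convention.
    return edited.replace("\n", "\r\n") if uses_crlf else edited
```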

There's one more small but thoughtful detail: when new_string is empty (a deletion) and old_string doesn't end with a newline, but a newline immediately follows old_string in the file, the system deletes that trailing newline as well. This prevents a blank line from being left behind after removing a line of content.
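That deletion rule might look like this (a sketch assuming a single occurrence of old_string; the helper name is hypothetical):

```python
def delete_span(content: str, old_string: str) -> str:
    # When deleting, if old_string doesn't end with a newline but one
    # immediately follows it in the file, remove that newline too so
    # no blank line is left behind.
    idx = content.index(old_string)
    end = idx + len(old_string)
    if not old_string.endswith("\n") and end < len(content) and content[end] == "\n":
        end += 1
    return content[:idx] + content[end:]
```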

The Counterintuitive Aspects of the Edit Tool

Looking back, the most interesting thing about the Edit tool isn't what it does — it's which layer it chooses to do it in.

Desanitization lives in the API layer, not the tool layer. This means the tool never needs to know sanitization exists; concerns are strictly separated. Race condition checks happen at two different stages with entirely different goals: one optimizes user experience, the other safeguards data integrity. Curly quote handling normalizes then de-normalizes, ensuring correct match semantics while preserving the file's style.

The common thread across these decisions is that they all work from the same premise: the model is right about its editing intent, but its literal expression may be slightly off. The system's job is to bridge that gap without disturbing any existing file state.

For a tool invoked millions of times a day, these edge cases aren't theoretical possibilities; they happen daily. What a str.replace() can't handle is precisely the distance between "it works" and "it's reliable" in code editing.