Context Management in Claude Code vs OpenClaw
After OpenClaw crossed 350K stars, a narrative started forming in the community: since both run on Opus 4.6 under the hood, the open-source option should be on par with Claude Code. Anyone who has actually used both probably shares the same observation — in long sessions, OpenClaw starts losing context, forgetting files it already read, redoing work it already did. Claude Code does too, but noticeably later, and it recovers much better.
Same model, different experience. Why?
After reading through both codebases, I'm convinced the gap isn't in model capability but in how each agent framework manages the 200K context window. Three core differences:
- Claude Code has a four-layer compression cascade where the first three layers are free; OpenClaw has one layer that always calls the LLM
- Claude Code continuously maintains session notes in the background and uses them as free summaries during compression; OpenClaw only archives sessions on exit
- Claude Code's sub-agents are role-specialized so search results don't pollute the main thread's context; OpenClaw's sub-agents are a generic framework with no scenario-specific isolation
It boils down to fundamentally different engineering choices on "how to compress," "whether compression costs money," and "how to isolate context consumption."
Compression Layers Define the Experience Ceiling
Claude Code's context compression is a four-layer cascade. It starts with the cheapest approach and only escalates when the previous layer can't handle it:
Cost
^
| * Full Compact ($$$, LLM summary)
|
| * Session Memory ($0, background notes)
|
| * Cached Microcompact (~$0, cache edit API)
|
| * Time-Based MC ($0, content clearing)
|
+------------------------------------------>
Compression Depth
OpenClaw has one layer: call the LLM directly. It chunks messages, summarizes each chunk with the model, and if there are multiple chunks, runs another LLM call to merge the partial summaries. Every compression costs at least one LLM call, usually two or three.
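That single-layer path can be sketched in a few lines. This is an illustrative reconstruction, not OpenClaw's actual code; `llm_summarize` stands in for whatever model call the framework makes, and the chunk size is arbitrary:

```python
def chunk(messages, chunk_size=50):
    """Split the transcript into fixed-size chunks of messages."""
    return [messages[i:i + chunk_size]
            for i in range(0, len(messages), chunk_size)]

def compress(messages, llm_summarize, chunk_size=50):
    """Single-layer compression: every invocation costs at least one
    LLM round-trip, plus one merge call if there are multiple chunks."""
    chunks = chunk(messages, chunk_size)
    partials = [llm_summarize("\n".join(c)) for c in chunks]  # 1 call per chunk
    if len(partials) == 1:
        return partials[0]
    return llm_summarize("\n".join(partials))                 # +1 merge call
```

Note the cost structure: a 120-message session with 50-message chunks pays for four LLM calls (three partial summaries plus one merge), and there is no cheaper layer to fall back on.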
Side-by-side comparison of what happens after compression triggers:
Claude Code OpenClaw
----------- --------
tokens > 167K? tokens > threshold?
| |
v v
Try Session Memory ($0) chunk messages
| fail |
v v
Run microcompact ($0) call LLM per chunk ($$$)
| |
v v
Full Compact via fork ($$$) merge summaries ($$$)
| |
v v
3 failures? stop retrying done
Claude Code's first two layers don't call the LLM at all. Layer one is "time-based clearing": when the user has been away for more than 60 minutes, it replaces old tool results with a placeholder. The logic is straightforward — Anthropic's server-side cache TTL is one hour. If you've been gone that long, the cache is already cold, and the entire prefix needs to be rewritten anyway. Might as well clean out old content while you're at it.
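A minimal sketch of that first layer, under the one-hour-TTL assumption described above. Message shapes and field names are illustrative, not Claude Code's internals:

```python
import time

CACHE_TTL_SECONDS = 60 * 60  # Anthropic's server-side cache TTL (1 hour)
PLACEHOLDER = "[old tool result cleared]"

def time_based_clear(messages, last_user_activity, now=None):
    """If the user has been idle past the cache TTL, the cached prefix is
    cold anyway, so replacing old tool results costs nothing extra."""
    if now is None:
        now = time.time()
    if now - last_user_activity < CACHE_TTL_SECONDS:
        return messages  # cache may still be warm; don't touch the prefix
    return [
        {**m, "content": PLACEHOLDER} if m.get("role") == "tool" else m
        for m in messages
    ]
```

The key design point is the guard clause: inside the TTL window the function does nothing, because rewriting the prefix while the cache is warm would trade a free cache read for a full-price reprocessing of the prefix.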
Layer two is more elegant: it uses Anthropic's cache editing API to delete old tool results directly from the server-side cache without modifying local messages at all. You save tokens, but the cache prefix stays intact — no cache miss penalty. This is an optimization only possible with deep Anthropic API integration.
OpenClaw supports over 20 providers, so this kind of vendor-specific optimization isn't feasible. That's the cost of an architectural choice: the trade-off between generality and depth.
Session Memory: The Foundation for Free Summaries
Claude Code's most interesting design is Session Memory. It continuously maintains a structured notes file in the background, periodically extracting key information from the current session into markdown. The notes cover the current task, important files, workflows, errors encountered, key conclusions, and more.
When compression is needed, it uses these notes directly as the summary — no LLM call required. It's like taking notes during a meeting: when the meeting ends, you don't need to recall everything from memory. Just read your notes.
Timeline Claude Code OpenClaw
-------- ---------- --------
Turn 1 [work normally] [work normally]
Turn 5 bg: extract notes <--+
Turn 10 bg: update notes | (nothing)
Turn 15 bg: update notes |
... ... |
Turn 30 context full! |
compress: |
summary = notes ---+ $0 compress:
done! call LLM $$$
done
Turn 31 [keep working] [keep working]
# Claude Code: session memory compaction (simplified)
def session_memory_compact(messages, last_summarized_id):
    notes = read_session_memory_file()
    if is_empty_template(notes):
        return None  # notes not ready, fall back to LLM
    messages_to_keep = messages[calculate_keep_index():]
    return CompactionResult(
        summary=notes,  # FREE! no LLM call
        kept=messages_to_keep,
    )
OpenClaw also has a mechanism called session-memory, but it does something entirely different: it only triggers when the user runs /new or /reset, saving the entire session to a memory file in one shot. This is session archival, not real-time note maintenance. During an active session, it does no background extraction whatsoever.
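In sketch form, archival is a single write at session end. Paths and file format here are assumptions for illustration, not OpenClaw's actual layout:

```python
import json
from pathlib import Path

def archive_session(messages, memory_dir, session_id):
    """One-shot session archival on /new or /reset: dump the whole
    transcript to a memory file. Nothing runs during the live session."""
    path = Path(memory_dir) / f"{session_id}.json"
    path.write_text(json.dumps(messages, indent=2))
    return path
```

Contrast this with the background loop in the timeline above: by the time compression is needed, archival has produced nothing usable, so the summary still has to come from a paid LLM call.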
The result: Claude Code can complete most compressions without any LLM calls, while OpenClaw pays for every single one. The cheaper compression is, the more aggressively you can compress, and the less likely context will balloon to the breaking point before you act. It's a positive feedback loop.
The Recovery Gap After Compression
Compression isn't just about deleting old messages. After compression, the model's context contains only the summary and a few retained messages. File states, loaded tool instructions, plan contents — all gone. Without recovery, the model's first action will almost certainly be re-reading files it was just looking at.
After compaction:
Claude Code OpenClaw
----------- --------
[summary] [summary]
[kept messages] [kept messages]
+ re-inject recent files (top 5) + repair message pairing
+ reload CLAUDE.md & config (done)
+ restore skill content
+ clear stale caches
+ reset prompt cache baseline
Claude Code performs a detailed state recovery after compression: re-injecting content from the 5 most recently accessed files, reloading config files and skill content, and clearing various internal caches to force reinitialization. The code comments specifically note that skill content is intentionally not cleared, because it needs to survive across multiple compressions.
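That recovery sequence can be sketched as follows. The helper names (`read_file`, `load_config`) and message shapes are stand-ins, not Claude Code's actual internals:

```python
def recover_after_compaction(summary, kept, recent_files, read_file, load_config):
    """Rebuild working context after compression: summary first, then
    config, then the five most recently accessed files, then whatever
    messages survived compaction."""
    context = [{"role": "system", "content": summary}]
    context.append({"role": "system", "content": load_config()})  # CLAUDE.md etc.
    for path in recent_files[:5]:  # re-inject only the 5 most recent files
        context.append({"role": "tool", "content": f"{path}:\n{read_file(path)}"})
    context.extend(kept)
    return context
```

The point of the re-injection step is to pre-empt the failure mode described above: without it, the model's first post-compression action is almost always re-reading files it just had open.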
OpenClaw's post-compression processing mainly repairs message pairing relationships to keep API calls from erroring. This is necessary, but it only solves the "correct format" problem, not the "model recovers its working context" problem.
Sub-Agent Role Specialization
Claude Code's sub-agents have clear division of labor:
+------+ search task +-------------------+
| Main |-------------->| Explore Agent |
| Loop | | - Haiku (fast) |
| | | - read-only |
| |<--------------| - returns summary |
| | summary only +-------------------+
| |
| | bg extract +-------------------+
| |-------------->| Session Memory |
| | | - only Edit notes |
| | | - shares cache |
| | +-------------------+
| |
| | compact +-------------------+
| |-------------->| Compact Agent |
| | | - NO tools |
| | | - shares cache |
+------+ +-------------------+
The Explore Agent is deliberately constrained: it can't write files, can't spawn more sub-agents, and uses Haiku (the fastest and cheapest model) for external users. All search results stay in the sub-agent's own context, with only a distilled summary returned to the main thread. This solves a core problem: search processes consume massive amounts of context, and if everything stays in the main thread, a few rounds of searching will fill the window.
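The isolation pattern itself is simple to sketch: raw results accumulate in a context that lives and dies with the sub-agent, and only the distilled summary crosses back. This is a schematic with `search` and `summarize` faked, not the actual agent code:

```python
def explore(query, search, summarize):
    """Run a search in an isolated context; only the summary escapes."""
    sub_context = []                 # lives and dies with the sub-agent
    for result in search(query):
        sub_context.append(result)   # raw results never reach the main loop
    return summarize(sub_context)    # distilled answer only

def main_loop_turn(query, search, summarize, main_context):
    """The main thread's context grows by one summary, no matter how
    many raw results the search produced."""
    main_context.append(explore(query, search, summarize))
    return main_context
```

However large the search output, the main thread pays for exactly one summary per exploration, which is what keeps a few rounds of searching from filling the window.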
OpenClaw has a more feature-complete sub-agent framework: two execution modes, sandbox inheritance, cross-process communication, recursion depth limits, and orphan recovery. But it's a generic framework without specialization for scenarios like "search without polluting the main context."
The problem with generic frameworks: when every sub-agent is general-purpose, no sub-agent is specifically optimized. Claude Code built a few surgical instruments for the coding scenario; OpenClaw provides a universal toolkit.
Prompt Cache: The Overlooked Battlefield
Comparing both codebases, what strikes me most is Claude Code's obsession with prompt cache. Almost every design decision's comments include consideration of "will this break the cache."
API request with prompt cache:
Request tokens: [system + tools + history + new turn]
|_______________|
cached prefix --> cache hit: $0.50/MTok
--> cache miss: $5/MTok (10x)
Claude Code: everything is designed to keep the prefix stable
OpenClaw: no prompt cache management (multi-provider)
For example, the forked agent used for compression inherits the main session's complete tool set — not because compression needs tools, but because the tool list is part of the cache key, and any mismatch causes a cache miss. Cache editing only runs on the main thread because sub-agent tool IDs would pollute the global state. After compression completes, the cache monitoring module is notified to reset its baseline, preventing compression-induced cache hit rate drops from being flagged as anomalies.
The closest thing in OpenClaw is code that discovers and caches each model's context window size, an entirely different concern from prompt cache management. With 20+ providers to support, deep optimization of any single provider's caching mechanism isn't practical.
In a world where you pay per token, prompt cache hit rates directly impact cost. Claude Code's effective per-call cost may be a fraction of OpenClaw's, because most input tokens are read from cache. The savings feed back into more frequent background extraction and earlier compression.
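The arithmetic is easy to check using the illustrative prices from the diagram above ($0.50/MTok for cached reads vs $5/MTok uncached):

```python
def effective_cost_per_mtok(hit_rate, cached=0.50, uncached=5.00):
    """Blended input cost per million tokens at a given cache hit rate,
    using the illustrative prices from the diagram above."""
    return hit_rate * cached + (1 - hit_rate) * uncached
```

At a 90% hit rate the blended cost is $0.95/MTok; at 50% it is $2.75/MTok, roughly three times more. That multiple is the margin that funds frequent background extraction and early compression.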
Context Engine: Architecture Without Implementation
To be fair, OpenClaw's context engine framework is well-designed. The interface defines seven lifecycle methods, supports safe transcript rewriting, and the registry distinguishes between core and third-party engines — third parties can't override the core engine. It's a complete architecture for pluggable context management.
OpenClaw Context Engine interface vs reality:
Interface (well designed) Actual LegacyContextEngine
------------------------- --------------------------
bootstrap() (not implemented)
ingest(message) --> return {ingested: false}
assemble(messages, budget) --> return {messages: messages}
compact(params) --> delegate to old path
afterTurn(messages) --> no-op
maintain() (not implemented)
prepareSubagentSpawn() (not implemented)
The problem is that the engine currently running is almost entirely hollow: message ingestion is a no-op, context assembly is a pass-through, post-turn processing does nothing, and compression delegates to the legacy path. The architectural capability is there, but the implementation is still single-layer LLM summarization. The frame is built; the interior is still bare concrete.
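The mismatch is easy to picture as a stub class. This mirrors the table above, with method names adapted to Python conventions; it is a schematic of the described behavior, not OpenClaw's literal code:

```python
class LegacyContextEngine:
    """The 'hollow' engine: every interface method is a no-op,
    pass-through, or delegation to the old compression path."""

    def ingest(self, message):
        return {"ingested": False}      # no-op: nothing is extracted

    def assemble(self, messages, budget):
        return {"messages": messages}   # pass-through: no assembly logic

    def compact(self, params, legacy_compact):
        return legacy_compact(params)   # delegates to the old LLM path

    def after_turn(self, messages):
        pass                            # no background maintenance
```

A real implementation of Session Memory would live almost entirely in ingest and after_turn, which is precisely where the current engine does nothing.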
The Root Difference
Back to the three differences from the opening: four-layer cascade vs single-layer summarization, real-time notes vs end-of-session archival, specialized sub-agents vs generic framework. These aren't differences in engineering taste — they're driven by product positioning.
Claude Code is optimized for "one developer working on one codebase," free to pursue Anthropic API-specific optimizations, forked agent cache sharing, and fine-grained cleanup strategies by tool type. OpenClaw covers "multiple users, multiple channels, multiple models" — it handles Telegram, Discord, Slack, WhatsApp, voice synthesis, cross-process communication, multi-account rotation, and model degradation. The latter is more complex overall, but the former goes deeper on the specific scenario it targets.
The good news is that OpenClaw's context engine plugin architecture is already in place: the interface is complete and the registry works. If it absorbed Claude Code's layered compression approach, even just Session Memory and time-based clearing, the long-session experience would improve dramatically. The framework is ready. What's missing is the fill.