Claude Code Internals

How the memory system actually works, what the harness controls, and where the gaps are.

The Two Memory Systems

Claude Code has two complementary memory mechanisms. Neither involves a database, vector store, or persistent internal state. Claude re-reads plain markdown files every session.

| Property | CLAUDE.md | Auto Memory (MEMORY.md) |
|---|---|---|
| Author | User | Claude |
| Contains | Instructions and rules | Learnings and patterns |
| Scope | Project, user, or org | Per working tree / git repo |
| Loaded | Every session, in full | First 200 lines or 25KB |

There is no magic: "Claude (and probably all LLM systems) does not 'remember' in a human sense. It re-reads instructions, every time." The entire memory system is file-read-inject-into-context. No embeddings, no retrieval, no learning.

CLAUDE.md Loading Chain

CLAUDE.md files load in a strict hierarchy, from broadest to most specific scope. More specific locations take precedence:

  1. Managed policy: organization-wide, cannot be excluded
  2. User-level: ~/.claude/CLAUDE.md — personal preferences across all projects
  3. User-level rules: ~/.claude/rules/*.md — loaded before project rules
  4. Project-level: ./CLAUDE.md or ./.claude/CLAUDE.md — team-shared via version control
  5. Project rules: ./.claude/rules/*.md — can have paths: frontmatter for conditional loading
  6. Subdirectory CLAUDE.md files: discovered but only loaded on-demand when Claude reads files in those directories (lazy loading)
  7. Local overrides: ./CLAUDE.local.md — personal, not committed to git

The harness walks up the directory tree from cwd, loading CLAUDE.md from each ancestor. Files are loaded in full regardless of length, though Anthropic recommends keeping them under 200 lines for best adherence.
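The ancestor walk described above can be sketched in a few lines. This is a minimal illustration, not the harness's actual code: the function name `ancestor_claude_files` is hypothetical, and the sketch covers only `CLAUDE.md` in each ancestor directory, ignoring the managed-policy, user-level, rules, and local-override locations.

```python
from pathlib import Path


def ancestor_claude_files(cwd: Path) -> list[Path]:
    """Collect CLAUDE.md from cwd and every ancestor, broadest scope first."""
    found = []
    for directory in [cwd, *cwd.parents]:
        candidate = directory / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
    # Walking up yields the most specific file first; reverse the list so
    # broader files come first and more specific ones can take precedence.
    return list(reversed(found))
```

Calling `ancestor_claude_files(Path.cwd())` returns the files in the order they would be concatenated under this assumption.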

Import System

CLAUDE.md files support @path/to/file syntax to import additional files. Imports resolve relative to the importing file, support recursive imports (max depth 5), and expand at launch time. HTML comments are stripped before injection to save tokens.
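As a rough sketch of that expansion, assuming an import is any whitespace-delimited token that starts with `@`: the regexes, the `expand_imports` name, and the choice to leave unresolved references in place are all assumptions made for illustration, not the harness's real parser.

```python
import re
from pathlib import Path

MAX_IMPORT_DEPTH = 5  # recursion cap stated in the text
IMPORT_RE = re.compile(r"(?<!\S)@(\S+)")  # assumed token shape: "@" then a path
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def expand_imports(path: Path, depth: int = 0) -> str:
    """Strip HTML comments, then inline @imports relative to the importing file."""
    text = COMMENT_RE.sub("", path.read_text(encoding="utf-8"))

    def replace(match: re.Match) -> str:
        target = (path.parent / match.group(1)).resolve()
        if depth >= MAX_IMPORT_DEPTH or not target.is_file():
            return match.group(0)  # leave the reference untouched
        return expand_imports(target, depth + 1)

    return IMPORT_RE.sub(replace, text)
```

The depth cap also bounds circular imports, since a cycle simply stops expanding once the limit is reached.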

Compaction Behavior

CLAUDE.md fully survives compaction. After /compact, Claude re-reads CLAUDE.md from disk and re-injects it fresh. Instructions given only in conversation (not written to CLAUDE.md) may be lost.

The Rules System

Rules are modular instruction files in .claude/rules/. They support YAML frontmatter with a paths field for conditional loading:

```markdown
---
paths:
  - "src/api/**/*.ts"
---
# API Development Rules
Always validate input at the handler level...
```

Rules without paths frontmatter load unconditionally at launch. Path-scoped rules trigger when Claude reads files matching the glob pattern. Rules are re-injected as system-reminders every time Claude accesses a matching file — unlike CLAUDE.md which loads once.
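To make the matching step concrete, here is a small sketch of how a glob such as `src/api/**/*.ts` could be tested against a file path. The glob semantics are assumed (`**/` spans zero or more directories, `*` stays within one path segment), and `glob_to_regex` and `rules_for_file` are hypothetical helpers; the harness's actual matcher may behave differently.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into a regex; '**/' spans directories, '*' does not."""
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything except a path separator
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def rules_for_file(file_path: str, rules: dict[str, list[str]]) -> list[str]:
    """Return the names of rules whose `paths` globs match file_path."""
    return [
        name
        for name, globs in rules.items()
        if any(glob_to_regex(g).match(file_path) for g in globs)
    ]
```

With `rules = {"api": ["src/api/**/*.ts"]}`, reading `src/api/v1/users.ts` would select the `api` rule, while `src/web/app.ts` would select nothing.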

Key insight for memory routing: path-scoped rules are the most reliable routing mechanism Claude Code offers. They're mechanically enforced by the harness, not dependent on Claude "remembering" to follow instructions.

Auto Memory (MEMORY.md)

Storage location: ~/.claude/projects/<project>/memory/ where <project> is derived from the git repository root. All worktrees and subdirectories within the same repo share one auto memory directory. Outside git repos, the project root path is used.

~/.claude/projects/<project>/memory/
├── MEMORY.md          # Concise index, loaded every session
├── debugging.md       # Topic file (loaded on demand)
├── api-conventions.md # Topic file (loaded on demand)
└── ...

Critical constraint (source-verified): Only the first 200 lines OR 25KB of MEMORY.md (whichever comes first) loads at session start. When truncated, a WARNING is appended explaining which cap fired. Topic files are never loaded at startup; they're surfaced on demand by the memory selection agent.
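The two caps can be illustrated with a short sketch. It assumes that "25KB" means 25 × 1024 bytes and that the byte cap cuts at a line boundary; the warning wording and the `load_memory_index` name are hypothetical.

```python
MAX_LINES = 200
MAX_BYTES = 25 * 1024  # assumption: "25KB" means 25 * 1024 bytes


def load_memory_index(text: str) -> str:
    """Keep whole lines until either the line cap or the byte cap is hit."""
    kept, size, cap = [], 0, None
    for i, line in enumerate(text.splitlines(keepends=True)):
        if i >= MAX_LINES:
            cap = "line"
            break
        line_bytes = len(line.encode("utf-8"))
        if size + line_bytes > MAX_BYTES:
            cap = "byte"
            break
        kept.append(line)
        size += line_bytes
    result = "".join(kept)
    if cap is not None:
        # Hypothetical wording; the text only says a WARNING names the cap.
        result += f"\nWARNING: MEMORY.md truncated by the {cap} cap\n"
    return result
```

The practical consequence is the same either way: anything past the cap is invisible at session start, so the most important index entries belong at the top of MEMORY.md.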

Per-File Recall Limits (source-verified)

When topic files ARE surfaced, they're also truncated:

| Limit | Value | Scope |
|---|---|---|
| Per-file lines | 200 lines | Each surfaced memory file |
| Per-file bytes | 4,096 bytes (~4KB) | Each surfaced memory file |
| Session total | 60 KB | All surfaced memories combined |
| Max files scanned | 200 files | Memory directory scan limit |

Truncated files get a note pointing to the full path for FileRead.
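How the per-file and session limits might combine is sketched below. The order of operations (line cap, then byte cap, then a running session budget), the reading of "60 KB" as 60 × 1024 bytes, and the note wording are all assumptions; `clip` and `surface` are hypothetical names.

```python
PER_FILE_LINES = 200
PER_FILE_BYTES = 4096
SESSION_BYTES = 60 * 1024  # assumption: "60 KB" means 60 * 1024 bytes


def clip(text: str) -> str:
    """Apply the per-file line cap, then the per-file byte cap."""
    clipped = "".join(text.splitlines(keepends=True)[:PER_FILE_LINES])
    raw = clipped.encode("utf-8")[:PER_FILE_BYTES]
    return raw.decode("utf-8", errors="ignore")  # drop a split multi-byte char


def surface(files: dict[str, str], used_bytes: int = 0) -> tuple[dict[str, str], int]:
    """Clip each file and stop once the session budget would be exceeded."""
    surfaced = {}
    for path, text in files.items():
        body = clip(text)
        size = len(body.encode("utf-8"))
        if used_bytes + size > SESSION_BYTES:
            break
        if body != text:
            # Hypothetical note wording pointing at the full file.
            body += f"\n[truncated; full file at {path}]"
        surfaced[path] = body
        used_bytes += size
    return surfaced, used_bytes
```

Under these numbers, a session can hold roughly fifteen fully clipped 4KB files before the combined budget runs out.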

The Memory Selection Agent

Claude Code uses Claude Sonnet as a dedicated sub-agent for memory selection. Its system prompt reads:

"You are selecting memories that will be useful to Claude Code as it processes a user's query. You will be given the user's query and a list of available memory files with their filenames and descriptions. Return a list of filenames for the memories that will clearly be useful (up to 5). Only include memories that you are certain will be helpful based on their name and description."

How it works (source-verified)

The type field in frontmatter is used for display in the manifest ([feedback] filename) but is NOT used for filtering. Only filenames and descriptions influence selection.
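A manifest entry of that shape could be assembled as follows. Only the `[type] filename` prefix comes from the text above; the `: description` suffix, the simple `key: value` frontmatter parser, and both function names are assumptions for illustration.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Read simple `key: value` pairs between leading `---` fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def manifest_line(filename: str, text: str) -> str:
    """Format one manifest entry; `type` is shown but never used to filter."""
    meta = parse_frontmatter(text)
    label = f"[{meta['type']}] " if "type" in meta else ""
    description = meta.get("description", "")
    if description:
        return f"{label}{filename}: {description}"
    return f"{label}{filename}"
```

Because the selector sees only this line, a vague filename or description makes a memory file effectively unreachable.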

Dream: Memory Consolidation

Claude Code includes a "dream" system (related to the KAIROS feature flag) that performs memory consolidation as a background agent with four phases:

  1. Orient: ls the memory directory, read the index, skim topic files
  2. Gather recent signal: check daily logs, find drifted memories, grep transcripts narrowly
  3. Consolidate: merge new signal into existing topic files, convert relative dates to absolute, delete contradicted facts
  4. Prune and index: keep MEMORY.md under the line/size limit, ensure each entry is one line under ~150 chars

Gate System (source-verified)

Dream runs when ALL gates pass (cheapest checked first):

  1. Time gate: minHours since last consolidation (default: 24h)
  2. Session gate: minSessions transcript files touched since last run (default: 5)
  3. Lock gate: PID-based lock file (.consolidate-lock) with 1-hour stale threshold

Feature-gated via tengu_onyx_plover. User override: autoDreamEnabled in settings.json.
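The gate ordering can be sketched as a chain of early returns, cheapest check first. The `DreamState` fields and the `should_dream` name are hypothetical, and the lock check is reduced to an age comparison; a real PID-based lock would presumably also verify that the owning process is still alive.

```python
from dataclasses import dataclass
from typing import Optional

STALE_LOCK_SECONDS = 3600  # 1-hour stale threshold from the text


@dataclass
class DreamState:
    hours_since_last_run: float
    sessions_since_last_run: int
    lock_age_seconds: Optional[float]  # None means no lock file exists


def should_dream(state: DreamState, min_hours: float = 24, min_sessions: int = 5) -> bool:
    """Return True only when the time, session, and lock gates all pass."""
    if state.hours_since_last_run < min_hours:  # time gate
        return False
    if state.sessions_since_last_run < min_sessions:  # session gate
        return False
    lock_is_live = (
        state.lock_age_seconds is not None
        and state.lock_age_seconds < STALE_LOCK_SECONDS
    )
    return not lock_is_live  # lock gate: a stale lock does not block
```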

The Harness Architecture

Claude Code is the agentic harness around Claude. The harness provides tools, context management, and execution environment that turn a language model into a coding agent. The model reasons; the harness acts.

What the Harness Controls

| Component | Token Cost | Details |
|---|---|---|
| System prompt | ~2,300-3,600 tokens | Core instructions, behavior rules |
| Tool definitions | ~14-17K tokens | 18+ built-in tools, deferred MCP tools |
| CLAUDE.md content | Varies | Injected as system prompt context section (NOT a user message) |
| MEMORY.md index | First 200 lines | Always loaded at session start |
| Selected memory files | Up to 5 files | Chosen by memory selection agent |
| Rules | Varies | Unconditional at start, path-scoped on demand |
| Hook stdout | Varies | SessionStart output added to context |

Session Memory (separate system, source-verified)

Distinct from auto-memory, session memory runs as a periodic background forked sub-agent that extracts key information into ~/.claude/projects/<slug>/.session/memory.md. Triggered by token thresholds (~30K init, ~20K between updates). Does not interrupt the main conversation.
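One plausible reading of those thresholds is sketched below: the first extraction fires once the context reaches about 30K tokens, and later ones fire after about 20K tokens of growth since the previous extraction. That interpretation, the exact constants, and the function name are assumptions.

```python
from typing import Optional

INIT_THRESHOLD = 30_000    # "~30K init" from the text, treated as exact here
UPDATE_THRESHOLD = 20_000  # "~20K between updates", treated as exact here


def should_update_session_memory(
    context_tokens: int, tokens_at_last_update: Optional[int]
) -> bool:
    """Decide whether the background extractor should run again."""
    if tokens_at_last_update is None:
        # No extraction yet: wait for the initial threshold.
        return context_tokens >= INIT_THRESHOLD
    # Afterwards: wait for enough new tokens since the last extraction.
    return context_tokens - tokens_at_last_update >= UPDATE_THRESHOLD
```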

Memory Extraction (source-verified)

Auto-memory saving runs as a forked agent (shares prompt cache with main agent) at end of each query loop. It has restricted tools: Read/Grep/Glob, read-only Bash, and FileEdit/FileWrite only within the memory directory. Skipped if the main agent already wrote to memory that turn.
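The tool restriction amounts to a small permission check, sketched here under assumptions: tool names are taken from the paragraph above, `is_allowed` is a hypothetical helper, and read-only Bash is left out because deciding whether a shell command is read-only needs far more than a path comparison.

```python
from pathlib import Path

READ_ONLY_TOOLS = {"Read", "Grep", "Glob"}
WRITE_TOOLS = {"FileEdit", "FileWrite"}


def is_allowed(tool: str, target: str, memory_dir: str) -> bool:
    """Allow reads anywhere, writes only inside the memory directory."""
    if tool in READ_ONLY_TOOLS:
        return True
    if tool in WRITE_TOOLS:
        root = Path(memory_dir).resolve()
        resolved = Path(target).resolve()
        # Resolving first stops "../" segments from escaping the directory.
        return root in resolved.parents
    return False  # anything else is denied in this sketch
```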

Prompt Cache Optimization

The harness actively optimizes for prompt caching. promptCacheBreakDetection.ts uses a 2-phase system: pre-call state recording (hashing system prompt, tool schemas, model, betas) and post-call cache break detection (>5% drop in cache read tokens). "Sticky latches" prevent mode toggles from invalidating the cache. System prompt content before a dynamic boundary marker gets global cache scope; content after gets org-level or no caching.
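The two phases can be approximated in a few lines. This is a Python illustration of the idea rather than a port of promptCacheBreakDetection.ts: the hashing scheme, the function names, and the comparison against the previous call's cache-read count are assumptions.

```python
import hashlib
import json

DROP_THRESHOLD = 0.05  # ">5% drop in cache read tokens" from the text


def fingerprint(system_prompt: str, tool_schemas: list, model: str, betas: list) -> str:
    """Phase 1: hash everything that would invalidate the cache if it changed."""
    payload = json.dumps(
        [system_prompt, tool_schemas, model, sorted(betas)], sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_broke(previous_read_tokens: int, current_read_tokens: int) -> bool:
    """Phase 2: flag a break when cache reads fall by more than the threshold."""
    if previous_read_tokens <= 0:
        return False  # nothing was cached before, so nothing could break
    drop = (previous_read_tokens - current_read_tokens) / previous_read_tokens
    return drop > DROP_THRESHOLD
```

Comparing the fingerprint recorded before the call with the previous one gives a candidate explanation when `cache_broke` fires, for example a changed tool schema.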

Current Limitations

1. No Intelligent Semantic Routing

The memory selection agent operates only on filenames and one-line descriptions. No semantic search over memory file contents. If a memory file's index entry doesn't clearly match the query by keyword, it won't be selected.

2. All-or-Nothing Loading at Session Start

At session start, the harness loads all ancestor CLAUDE.md files in full, all unconditional rules, and the first 200 lines (or 25KB, whichever comes first) of MEMORY.md. This happens regardless of whether the task is about deployment, debugging, Svelte, or CV writing.

3. No Memory Decay or Relevance Scoring

All memory entries have equal weight. A debugging insight from 6 months ago occupies the same space as one from yesterday.

4. Static Files, Not a Knowledge Graph

No relationships, no tags beyond the filename, no cross-references, no structured metadata.

5. No Cross-Session State

Each session starts with a fresh context window. The only bridges across sessions are CLAUDE.md and auto memory files.

6. Community-Identified Failure Modes

Community Solutions

| Project | Approach |
|---|---|
| claude-code-vector-memory | Semantic memory via vector search of session summaries |
| claude-mem | ChromaDB + MCP integration, automatic compression, ~10x token efficiency |
| claude-code-memory | Knowledge graphs + Tree-sitter + Qdrant vector search |
| claude-context | Code search MCP, ~40% token reduction |