Claude Code Internals

How the memory system actually works, what the harness controls, and where the gaps are.

The Two Memory Systems
CLAUDE.md Loading Chain
The Rules System
Auto Memory (MEMORY.md)
The Memory Selection Agent
Dream: Memory Consolidation
The Harness Architecture
Current Limitations

The Two Memory Systems

Claude Code has two complementary memory mechanisms. Neither involves a database, vector store, or persistent internal state. Claude re-reads plain markdown files every session.

Property	CLAUDE.md	Auto Memory (MEMORY.md)
Author	User	Claude
Contains	Instructions and rules	Learnings and patterns
Scope	Project, user, or org	Per working tree / git repo
Loaded	Every session, in full	First 200 lines or 25KB

There is no magic: "Claude (and probably all LLM systems) does not 'remember' in a human sense. It re-reads instructions, every time." The entire memory system amounts to reading files and injecting them into context. No embeddings, no vector retrieval, no learning.

CLAUDE.md Loading Chain

CLAUDE.md files load in a strict hierarchy, from broadest to most specific scope. More specific locations take precedence:

Managed policy (organization-wide, cannot be excluded):
- Linux/WSL: /etc/claude-code/CLAUDE.md
- macOS: /Library/Application Support/ClaudeCode/CLAUDE.md
User-level: ~/.claude/CLAUDE.md — personal preferences across all projects
User-level rules: ~/.claude/rules/*.md — loaded before project rules
Project-level: ./CLAUDE.md or ./.claude/CLAUDE.md — team-shared via version control
Project rules: ./.claude/rules/*.md — can have paths: frontmatter for conditional loading
Subdirectory CLAUDE.md files: discovered but only loaded on-demand when Claude reads files in those directories (lazy loading)
Local overrides: ./CLAUDE.local.md — personal, not committed to git

The harness walks up the directory tree from cwd, loading CLAUDE.md from each ancestor. Files are loaded in full regardless of length, though Anthropic recommends keeping them under 200 lines for best adherence.
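The ancestor walk described above can be sketched as follows. This is a minimal illustration, not the harness's actual code: the function name ancestor_claude_files is hypothetical, and the sketch only looks for ./CLAUDE.md in each directory, ignoring ./.claude/CLAUDE.md, rules, and managed policy files.

```python
from pathlib import Path


def ancestor_claude_files(cwd: Path) -> list:
    """Collect CLAUDE.md files from the filesystem root down to cwd.

    Broadest scope comes first, so more specific files appear later
    in the list (hypothetical helper, for illustration only).
    """
    cwd = cwd.resolve()
    # Path.parents lists the nearest ancestor first; reverse it for root-first order.
    directories = list(cwd.parents)[::-1] + [cwd]
    found = []
    for directory in directories:
        candidate = directory / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
    return found
```

Calling ancestor_claude_files(Path.cwd()) from a nested project directory would return any CLAUDE.md found in each ancestor, ending with the one closest to the working directory.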

Import System

CLAUDE.md files support @path/to/file syntax to import additional files. Imports resolve relative to the importing file, support recursive imports (max depth 5), and expand at launch time. HTML comments are stripped before injection to save tokens.
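A rough sketch of how such an import expander might behave, under stated assumptions: the function name expand_imports and the token regex are hypothetical, "max depth 5" is read as five levels of nesting below the root file, and tokens that do not point at an existing file are left untouched.

```python
import re
from pathlib import Path

MAX_IMPORT_DEPTH = 5  # recursion cap stated in the text
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Assumption: an import is "@" at line start or after whitespace, followed by a path.
IMPORT_TOKEN = re.compile(r"(?<!\S)@(\S+)")


def expand_imports(path: Path, depth: int = 0) -> str:
    """Return the file's text with HTML comments stripped and @imports inlined."""
    text = HTML_COMMENT.sub("", path.read_text(encoding="utf-8"))
    if depth >= MAX_IMPORT_DEPTH:
        return text  # deeper @references stay unexpanded

    def replace(match):
        # Imports resolve relative to the importing file, not the cwd.
        target = path.parent / match.group(1)
        if not target.is_file():
            return match.group(0)  # not a file: keep the text as written
        return expand_imports(target, depth + 1)

    return IMPORT_TOKEN.sub(replace, text)
```

The depth cap also bounds accidental import cycles, since a file that imports itself stops expanding after five levels.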

Compaction Behavior

CLAUDE.md fully survives compaction. After /compact, Claude re-reads CLAUDE.md from disk and re-injects it fresh. Instructions given only in conversation (not written to CLAUDE.md) may be lost.

The Rules System

Rules are modular instruction files in .claude/rules/. They support YAML frontmatter with a paths field for conditional loading:

---
paths:
  - "src/api/**/*.ts"
---
# API Development Rules
Always validate input at the handler level...

Rules without paths frontmatter load unconditionally at launch. Path-scoped rules trigger when Claude reads files matching the glob pattern. Rules are re-injected as system-reminders every time Claude accesses a matching file — unlike CLAUDE.md which loads once.
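To make the split between unconditional and path-scoped rules concrete, here is a small sketch. The Rule class, the helper names, and the simplified glob dialect (only "**", "*", and "?") are all assumptions for illustration; the harness's real matcher may differ.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    paths: list = field(default_factory=list)  # empty list = no paths frontmatter


def glob_to_regex(pattern: str):
    """Translate a simplified glob into a compiled regex (assumed dialect)."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def unconditional_rules(rules):
    """Rules that would load at launch."""
    return [r.name for r in rules if not r.paths]


def rules_triggered_by(rules, file_path: str):
    """Path-scoped rules that would fire when file_path is read."""
    return [
        r.name
        for r in rules
        if any(glob_to_regex(p).fullmatch(file_path) for p in r.paths)
    ]
```

With a rule scoped to "src/api/**/*.ts", reading src/api/users/handler.ts would trigger it, while reading src/web/page.ts would not.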

Key insight for memory routing: Path-scoped rules are the most reliable routing mechanism Claude Code offers. They're mechanically enforced by the harness, not dependent on Claude "remembering" to follow instructions.

Auto Memory (MEMORY.md)

Storage location: ~/.claude/projects/<project>/memory/ where <project> is derived from the git repository root. All worktrees and subdirectories within the same repo share one auto memory directory. Outside git repos, the project root path is used.

~/.claude/projects/<project>/memory/
├── MEMORY.md          # Concise index, loaded every session
├── debugging.md       # Topic file (loaded on demand)
├── api-conventions.md # Topic file (loaded on demand)
└── ...

Critical constraint (source-verified): Only the first 200 lines OR 25KB of MEMORY.md (whichever comes first) loads at session start. When truncated, a WARNING is appended explaining which cap fired. Topic files are never loaded at startup; they're surfaced on demand by the memory selection agent.
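The two caps can be illustrated with a short sketch. The function name load_memory_index and the warning wording are hypothetical, and "25KB" is read here as 25 KiB; the actual byte count and message text may differ.

```python
MAX_LINES = 200
MAX_BYTES = 25 * 1024  # assumption: "25KB" interpreted as 25 KiB


def load_memory_index(text: str):
    """Apply the line cap, then the byte cap; return (loaded_text, cap_that_fired)."""
    lines = text.splitlines(keepends=True)
    fired = None
    if len(lines) > MAX_LINES:
        lines = lines[:MAX_LINES]
        fired = "line"
    body = "".join(lines)
    raw = body.encode("utf-8")
    if len(raw) > MAX_BYTES:
        # errors="ignore" drops a multi-byte character split by the cut.
        body = raw[:MAX_BYTES].decode("utf-8", errors="ignore")
        fired = "byte"
    if fired:
        # Hypothetical wording; the text only says a WARNING names the cap.
        body += "\n\nWARNING: MEMORY.md truncated by the " + fired + " cap."
    return body, fired
```

An index of 300 short lines would hit the line cap, while ten very long lines would hit the byte cap well before line 200.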

Per-File Recall Limits (source-verified)

When topic files ARE surfaced, they're also truncated:

Limit	Value	Scope
Per-file lines	200 lines	Each surfaced memory file
Per-file bytes	4,096 bytes (~4KB)	Each surfaced memory file
Session total	60 KB	All surfaced memories combined
Max files scanned	200 files	Memory directory scan limit

Truncated files get a note pointing to the full path for FileRead.
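A sketch of how the per-file and session-wide limits in the table could interact. The names clip_memory_file and RecallBudget are hypothetical, the note wording is invented, and the choice to skip a file that would exceed the session total (rather than clip it further) is an assumption.

```python
from typing import Optional

PER_FILE_LINES = 200
PER_FILE_BYTES = 4096
SESSION_BYTES = 60 * 1024  # assumption: "60 KB" interpreted as 60 KiB


def clip_memory_file(path: str, text: str) -> str:
    """Apply both per-file caps and append a pointer to the full file if clipped."""
    lines = text.splitlines(keepends=True)[:PER_FILE_LINES]
    raw = "".join(lines).encode("utf-8")[:PER_FILE_BYTES]
    clipped = raw.decode("utf-8", errors="ignore")
    if clipped != text:
        clipped += "\n[truncated; read the full file at " + path + "]"  # hypothetical note
    return clipped


class RecallBudget:
    """Tracks how many bytes of surfaced memory remain for the session."""

    def __init__(self, limit: int = SESSION_BYTES):
        self.remaining = limit

    def admit(self, path: str, text: str) -> Optional[str]:
        clipped = clip_memory_file(path, text)
        size = len(clipped.encode("utf-8"))
        if size > self.remaining:
            return None  # assumption: over-budget files are skipped
        self.remaining -= size
        return clipped
```

At roughly 4KB per file, the 60 KB session total leaves room for about fifteen fully clipped files before the budget runs out.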

The Memory Selection Agent

Claude Code uses Claude Sonnet as a dedicated sub-agent for memory selection. Its system prompt reads:

"You are selecting memories that will be useful to Claude Code as it processes a user's query. You will be given the user's query and a list of available memory files with their filenames and descriptions. Return a list of filenames for the memories that will clearly be useful (up to 5). Only include memories that you are certain will be helpful based on their name and description."

How it works (source-verified)

Input: User query + list of memory file headers (filename, description, type, mtime)
Output: Array of up to 5 filenames (hard-coded max)
De-duplication: alreadySurfaced set prevents re-selecting files shown in prior turns
Tool awareness: If a tool is in recentTools, usage reference docs are deprioritized, but warnings/gotchas are kept
Injected as: <system-reminder> attachments (not in main conversation)

The type field in frontmatter is used for display in the manifest ([feedback] filename) but is NOT used for filtering. Only filenames and descriptions influence selection.
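The selection flow above can be sketched with the model call replaced by a plain callable. Everything here is illustrative: select_memories and the choose parameter are hypothetical names, the manifest line format beyond "[type] filename" is assumed, and choose stands in for the Sonnet sub-agent.

```python
from typing import Callable

MAX_SELECTED = 5  # hard-coded maximum stated in the text


def select_memories(
    query: str,
    headers: list,
    already_surfaced: set,
    choose: Callable[[str, str], list],
) -> list:
    """Pick up to five not-yet-surfaced memory files for this query."""
    # De-duplication: files shown in earlier turns are not offered again.
    candidates = [h for h in headers if h["filename"] not in already_surfaced]
    manifest = "\n".join(
        "[" + h["type"] + "] " + h["filename"] + ": " + h["description"]
        for h in candidates
    )
    valid = {h["filename"] for h in candidates}
    picked = []
    for name in choose(query, manifest):
        # Ignore names the selector made up or repeated.
        if name in valid and name not in picked:
            picked.append(name)
    picked = picked[:MAX_SELECTED]
    already_surfaced.update(picked)
    return picked
```

Because already_surfaced is updated on every call, a second query in the same session can only receive files that were not injected earlier.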

Dream: Memory Consolidation

Claude Code includes a "dream" system (related to the KAIROS feature flag) that performs memory consolidation as a background agent with four phases:

Orient — ls the memory directory, read the index, skim topic files
Gather recent signal — Check daily logs, find drifted memories, grep transcripts narrowly
Consolidate — Merge new signal into existing topic files, convert relative dates to absolute, delete contradicted facts
Prune and index — Keep MEMORY.md under the line/size limit, ensure each entry is one line under ~150 chars

Gate System (source-verified)

Dream runs when ALL gates pass (cheapest checked first):

Time gate: minHours since last consolidation (default: 24h)
Session gate: minSessions transcript files touched since last run (default: 5)
Lock gate: PID-based lock file (.consolidate-lock) with 1-hour stale threshold

Feature-gated via tengu_onyx_plover. User override: autoDreamEnabled in settings.json.
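The gate ordering can be sketched as a single predicate. This is an approximation under assumptions: should_consolidate and its parameters are hypothetical names, and the lock check uses only the lock file's modification time, leaving out the PID liveness check a real PID-based lock would perform.

```python
import os


def should_consolidate(
    now: float,
    last_run: float,
    sessions_touched: int,
    lock_path: str,
    min_hours: float = 24,
    min_sessions: int = 5,
    stale_seconds: float = 3600,
) -> bool:
    """Return True only when the time, session, and lock gates all pass."""
    # Time gate: cheapest check, pure arithmetic.
    if now - last_run < min_hours * 3600:
        return False
    # Session gate: enough transcripts touched since the last run.
    if sessions_touched < min_sessions:
        return False
    # Lock gate: a fresh lock means another consolidation is presumably running.
    if os.path.exists(lock_path):
        if now - os.path.getmtime(lock_path) < stale_seconds:
            return False
    return True
```

Ordering the checks from cheapest to most expensive means most calls return after one subtraction, without touching the filesystem.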

The Harness Architecture

Claude Code is the agentic harness around Claude. The harness provides tools, context management, and execution environment that turn a language model into a coding agent. The model reasons; the harness acts.

What the Harness Controls

Component	Token Cost	Details
System prompt	~2,300-3,600 tokens	Core instructions, behavior rules
Tool definitions	~14-17K tokens	18+ built-in tools, deferred MCP tools
CLAUDE.md content	Varies	Injected as system prompt context section (NOT a user message)
MEMORY.md index	First 200 lines	Always loaded at session start
Selected memory files	Up to 5 files	Chosen by memory selection agent
Rules	Varies	Unconditional at start, path-scoped on demand
Hook stdout	Varies	SessionStart output added to context

Session Memory (separate system, source-verified)

Distinct from auto-memory, session memory runs as a periodic background forked sub-agent that extracts key information into ~/.claude/projects/<slug>/.session/memory.md. Triggered by token thresholds (~30K init, ~20K between updates). Does not interrupt the main conversation.
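One possible reading of the token thresholds, sketched under assumptions: the first extraction fires once the context reaches about 30K tokens, and later ones fire after about 20K further tokens. The function name should_extract and this interpretation of "init" are not confirmed by the source.

```python
from typing import Optional

INIT_TOKENS = 30_000    # approximate threshold for the first extraction
UPDATE_TOKENS = 20_000  # approximate growth required between updates


def should_extract(context_tokens: int, tokens_at_last_extract: Optional[int]) -> bool:
    """Decide whether the background session-memory agent should run now."""
    if tokens_at_last_extract is None:
        # No extraction yet in this session.
        return context_tokens >= INIT_TOKENS
    return context_tokens - tokens_at_last_extract >= UPDATE_TOKENS
```

Under this reading, a session that never reaches 30K tokens would never produce a session memory file.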

Memory Extraction (source-verified)

Auto-memory saving runs as a forked agent (sharing the main agent's prompt cache) at the end of each query loop. It has restricted tools: Read/Grep/Glob, read-only Bash, and FileEdit/FileWrite only within the memory directory. It is skipped if the main agent already wrote to memory that turn.
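The write restriction can be pictured as a path check. This is a hedged sketch: is_allowed is a hypothetical name, the tool sets mirror the names in the text, and read-only Bash is left out because deciding whether a shell command is read-only is beyond a short example.

```python
from pathlib import Path
from typing import Optional

READ_ONLY_TOOLS = {"Read", "Grep", "Glob"}
WRITE_TOOLS = {"FileEdit", "FileWrite"}


def is_allowed(tool: str, target: Optional[Path], memory_dir: Path) -> bool:
    """Allow reads anywhere, writes only inside the memory directory."""
    if tool in READ_ONLY_TOOLS:
        return True
    if tool in WRITE_TOOLS and target is not None:
        # resolve() collapses ".." so a path cannot escape the directory.
        return target.resolve().is_relative_to(memory_dir.resolve())
    return False
```

Resolving both paths before comparing them is what stops a target such as memory/../secrets.md from passing the check.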

Prompt Cache Optimization

The harness actively optimizes for prompt caching. promptCacheBreakDetection.ts uses a 2-phase system: pre-call state recording (hashing system prompt, tool schemas, model, betas) and post-call cache break detection (>5% drop in cache read tokens). "Sticky latches" prevent mode toggles from invalidating the cache. System prompt content before a dynamic boundary marker gets global cache scope; content after gets org-level or no caching.
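The two phases can be sketched roughly as below. The function names fingerprint and cache_broke are hypothetical and this is not the content of promptCacheBreakDetection.ts; in particular, measuring the 5% drop against the previous call's cache read tokens is an assumption.

```python
import hashlib
import json


def fingerprint(system_prompt: str, tool_schemas: list, model: str, betas: list) -> str:
    """Phase 1: hash everything that feeds the cacheable prefix."""
    payload = json.dumps(
        {
            "system": system_prompt,
            "tools": tool_schemas,
            "model": model,
            "betas": sorted(betas),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_broke(prev_cache_read: int, cache_read: int, threshold: float = 0.05) -> bool:
    """Phase 2: flag a break when cache read tokens fall by more than the threshold."""
    if prev_cache_read <= 0:
        return False  # nothing was cached before, so nothing could break
    drop = (prev_cache_read - cache_read) / prev_cache_read
    return drop > threshold
```

Comparing the fingerprints recorded before two consecutive calls would then show which input changed when cache_broke reports a break.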

Current Limitations

1. No Intelligent Semantic Routing

The memory selection agent operates only on filenames and one-line descriptions. There is no semantic search over memory file contents. If a memory file's name and description don't clearly match the query, it won't be selected.

2. All-or-Nothing Loading at Session Start

At session start, the harness loads all ancestor CLAUDE.md files in full, all unconditional rules, and the first 200 lines (or 25KB) of MEMORY.md. This happens regardless of whether the task is about deployment, debugging, Svelte, or CV writing.

3. No Memory Decay or Relevance Scoring

All memory entries have equal weight. A debugging insight from 6 months ago occupies the same space as one from yesterday.

4. Static Files, Not a Knowledge Graph

No relationships, no tags beyond the filename, no cross-references, no structured metadata.

5. No Cross-Session State

Each session starts with a fresh context window. The only bridges across sessions are CLAUDE.md and auto memory files.

6. Community-Identified Failure Modes

Manual notes are lossy: They capture what the user thought was important, not what Claude found important (~40% of useful context is missed)
Semantic knowledge without state: Memory saves decisions but not the code context from when decisions were made
Passive retrieval: Memory requires manual review rather than automatic context injection

Community Solutions

Project	Approach
`claude-code-vector-memory`	Semantic memory via vector search of session summaries
`claude-mem`	ChromaDB + MCP integration, automatic compression, ~10x token efficiency
`claude-code-memory`	Knowledge graphs + Tree-sitter + Qdrant vector search
`claude-context`	Code search MCP, ~40% token reduction
