Claude Code Memory Management

A comprehensive analysis of AI coding assistant memory systems, the memory routing problem, and how to build an optimal harness. April 2026. Updated with source code verification.

The Core Problem

A fresh Claude Code instance loads your CLAUDE.md and MEMORY.md index, but does not intelligently route to the right memory files based on your task. You end up being the memory manager, telling it "read svelte5-pitfalls.md" or "check hetzner-deploy.md", when the harness should handle this automatically. This report maps the problem, the landscape, and the solutions.

Report Sections

1. Claude Code Internals

How the memory system actually works: CLAUDE.md loading chain, auto-memory, the memory selection agent, KAIROS/dream consolidation, and what the harness controls.

2. Your Setup Audit

Inventory of your 30 memory files, trigger coverage analysis, routing gaps, scaling concerns, and what a fresh Claude instance will miss.

3. The Memory Routing Problem

The core challenge: automatically loading the right context at the right time. Approaches from keyword matching to semantic search to LLM re-ranking.

4. Hooks & Automation

Complete reference for Claude Code hooks, settings, MCP servers, path-scoped rules, and practical patterns for memory augmentation.

5. Competing Tools Comparison

How Cursor, Windsurf, Copilot, Aider, Continue, Cline/Roo, Codex CLI, Amazon Q, and Zed handle memory and context.

6. AI Memory Architectures

State of the art: memory types, RAG for code, knowledge graphs, memory platforms (Mem0, Zep, Hindsight, Letta), and the MemGPT paradigm.

7. Recommendations & Action Plan

Concrete, phased plan: from immediate CLAUDE.md fixes (30 min) to a full semantic memory routing pipeline (weekend project).

8. Source Code Findings

Corrections from reading the actual source: hard-coded constants, memory data flow, system prompt assembly order, feature gates, and key file references.

Key Findings

Loading everything is worse than loading nothing

Research from Harvard's D3 Institute and the Adaptive Memory Admission Control papers shows that indiscriminate memory loading degrades LLM performance compared to using zero memory. Bad or irrelevant memories create a "propagating error feedback loop."

Claude Code has a memory selection agent

As of v2.1.74, a sub-agent selects up to 5 memory files by matching filenames and one-line descriptions against the user's query. It operates on metadata only, with no semantic search over file content, and your behavioral feedback rules have no routing mechanism at all.

The fix is a pipeline, not a single technique

The ideal system: fast keyword match (<5 ms) → embedding search (<100 ms) → optional LLM re-rank (<500 ms) → dynamic expansion via MCP. Your RTX 3060 can run local embeddings with sub-50 ms latency.
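The first two stages of that pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Claude Code or any particular harness implements routing: the `MemoryFile` structure, the `triggers` field, the descriptions, and the 0.2 threshold are all hypothetical, and a word-overlap cosine score stands in for a real embedding model. The LLM re-rank and MCP expansion stages are left out.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MemoryFile:
    """Hypothetical index entry: filename, one-line description, trigger words."""
    name: str
    description: str
    triggers: set[str] = field(default_factory=set)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_stage(query: str, files: list[MemoryFile]) -> list[MemoryFile]:
    """Stage 1: keep files whose explicit trigger words appear in the query."""
    words = set(tokenize(query))
    return [f for f in files if f.triggers & words]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_stage(
    query: str, files: list[MemoryFile], threshold: float = 0.2
) -> list[MemoryFile]:
    """Stage 2 stand-in: rank by word overlap with each description.

    A real pipeline would compare embedding vectors here instead.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(f.description))), f) for f in files]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [f for score, f in scored if score >= threshold]


def route(query: str, files: list[MemoryFile], max_files: int = 5) -> list[str]:
    """Run the cheap stage first and fall through only if slots remain."""
    hits = keyword_stage(query, files)
    if len(hits) < max_files:
        remaining = [f for f in files if f not in hits]
        hits += similarity_stage(query, remaining)
    return [f.name for f in hits[:max_files]]


# Example index; the descriptions and triggers are made up for illustration.
files = [
    MemoryFile("svelte5-pitfalls.md", "Common Svelte 5 runes mistakes", {"svelte"}),
    MemoryFile("hetzner-deploy.md", "Deploying to Hetzner server", {"hetzner", "deploy"}),
]
print(route("fix svelte store bug", files))  # ['svelte5-pitfalls.md']
```

Ordering the stages from cheapest to most expensive means most queries resolve on the trigger match alone, and the slower stages run only when the fast one leaves slots unfilled.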
| Metric | Your Current Setup | After Recommendations |
| --- | --- | --- |
| Memory files | 30 files, ~51 KB | ~24 files after consolidation |
| Files with explicit triggers | 7 of 30 (23%) | 15+ of 24 (62%+) |
| Behavioral rules routed | 0 of 13 (0%) | 5 inlined + rest triggered |
| Routing method | LLM reads MEMORY.md one-liners | Always-on rules + keyword hooks + semantic search |
| User intervention needed | Often ("read X.md") | Rarely (pipeline handles it) |