The Memory Routing Problem

Automatically loading the right context at the right time, without user intervention.

The Problem

You have N memory files covering different projects and topics. A new session starts. The harness needs to decide which memories are relevant without you telling it. Current approaches just dump everything into context or rely on you to say "read X.md".

Loading everything is worse than loading nothing. Research from Harvard's D3 Institute and the Adaptive Memory Admission Control papers reports that indiscriminate memory loading degrades performance compared to zero memory: irrelevant memories create a "propagating error feedback loop" in which the model attends to noise.

Why the Built-in System Falls Short

- The Sonnet selector matches on filenames and one-line descriptions only; it does no semantic search over file content.
- Behavioral rules (e.g., "always verify claims before acting") have no keyword to trigger on: they apply to every task, yet nothing in any task description matches them.
- CLAUDE.md "Before X, read Y" triggers depend on Claude recognizing that the task matches X, so they are probabilistic, not deterministic.
- Around 3-5 memories is the sweet spot before attention degradation sets in.

Approaches

Approach	How it works	Latency	Coverage	Complexity
Keyword / trigger matching	CLAUDE.md: "Before X, read Y"	<5ms	~80% for <30 files	Low
Semantic similarity	Embed memories; rank by cosine similarity to the task	50-500ms	~95%	Medium
Two-stage retrieval	Fast vector search → LLM re-rank	200ms-2s	~98%	High
Hierarchical summaries	Load an index; the LLM decides which entries to expand	1 round-trip	Depends on summary quality	Low
Dynamic expansion	Start minimal; retrieve as needed	Ongoing	High (but delayed)	Medium
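To make the first row concrete: keyword/trigger matching can be moved out of CLAUDE.md prose into a lookup that code evaluates, which turns "Before X, read Y" from a probabilistic instruction into a deterministic one. The sketch below is a minimal version under stated assumptions: the TRIGGERS table, the file names, and the whole-word matching rule are all hypothetical, and MAX_FILES simply caps the result in the 3-5 range.

```python
import re

# Hypothetical trigger table: keyword -> memory files to load.
# In CLAUDE.md these would be written as "Before X, read Y" lines.
TRIGGERS: dict[str, list[str]] = {
    "migration": ["db-migrations.md"],
    "deploy": ["deploy-checklist.md", "infra.md"],
    "billing": ["billing-api.md"],
}

MAX_FILES = 5  # keep the selection inside the 3-5 memory sweet spot


def route(task: str, triggers: dict[str, list[str]] = TRIGGERS,
          max_files: int = MAX_FILES) -> list[str]:
    """Return memory files whose trigger keyword appears in the task as a whole word."""
    words = set(re.findall(r"[a-z0-9_-]+", task.lower()))
    selected: list[str] = []
    for keyword, files in triggers.items():
        if keyword in words:
            for f in files:
                if f not in selected:  # de-duplicate, preserving order
                    selected.append(f)
    return selected[:max_files]
```

route("Write a migration for the billing table") returns ["db-migrations.md", "billing-api.md"]. The sketch also exposes the coverage gap: "migrations" (plural) does not hit the "migration" trigger, and a behavioral rule such as "always verify claims before acting" has no keyword that any task would contain.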

The Practical Sweet Spot

For most developers with <50 memory files, a layered approach covers 95%+ of cases:

Always-on rules in CLAUDE.md (0ms)

Inline your 5-6 most universal behavioral rules directly in CLAUDE.md. They are always loaded and survive compaction.

Path-scoped rules (0ms, harness-enforced)

Rule files in .claude/rules/*.md that declare a paths: field in their frontmatter are loaded automatically when Claude touches a matching file. This is the most reliable mechanism because the harness enforces it, not the LLM.
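Because the harness, not the model, does the matching, path-scoped loading reduces to a glob test against the file being touched. The sketch below emulates that check so you can see which rule files a given path would pull in. The RULES dict is a hypothetical stand-in for parsed paths: frontmatter, and the glob dialect (** spans directories, * stays within one segment) is an assumption, not the harness's documented implementation.

```python
import re
from functools import lru_cache

# Hypothetical stand-in for the parsed paths: frontmatter of .claude/rules/*.md.
RULES: dict[str, list[str]] = {
    ".claude/rules/api.md": ["src/api/**/*.ts"],
    ".claude/rules/tests.md": ["**/*.test.ts", "tests/**"],
    ".claude/rules/sql.md": ["migrations/*.sql"],
}


@lru_cache(maxsize=None)
def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob where '**' spans directories and '*' stays in one segment."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")        # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # anything within a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def rules_for(path: str, rules: dict[str, list[str]] = RULES) -> list[str]:
    """Return the rule files whose globs match the touched file path."""
    return [name for name, globs in rules.items()
            if any(glob_to_regex(g).match(path) for g in globs)]
```

For example, rules_for("src/api/handler.test.ts") returns both api.md and tests.md, while rules_for("README.md") returns an empty list, so no rule text is loaded at all for unrelated files.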

SessionStart hook (<5ms)

A shell script that injects behavioral rules plus dynamic context (git state, project detection) at the start of every session.
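The same hook can be written in Python instead of shell. This is a minimal sketch: the RULES list and MARKERS table are placeholders, and the two git calls are just one possible reading of "git state". It assumes that stdout from a SessionStart hook is added to the session context; check the Claude Code hooks documentation for the exact contract and for how to register the command under SessionStart in your settings file.

```python
#!/usr/bin/env python3
"""Hypothetical SessionStart hook: prints rules plus dynamic context to stdout."""
import os
import subprocess
from pathlib import Path
from typing import Optional

# Placeholder behavioral rules; replace with your own.
RULES = [
    "Verify claims against the code before acting on them.",
    "Run the relevant tests before reporting a task as done.",
]

# Marker file -> project type, used for simple project detection.
MARKERS = {
    "pyproject.toml": "Python",
    "package.json": "Node",
    "Cargo.toml": "Rust",
    "go.mod": "Go",
}


def git(args: list[str], cwd: str) -> Optional[str]:
    """Run a git command; return stripped stdout, or None if git fails or is missing."""
    try:
        result = subprocess.run(["git", *args], cwd=cwd, capture_output=True,
                                text=True, timeout=2)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def build_context(cwd: str) -> str:
    """Assemble the text the hook prints: rules, project type, git branch/status."""
    lines = ["Session rules:", *(f"- {r}" for r in RULES)]
    kinds = [kind for marker, kind in MARKERS.items() if (Path(cwd) / marker).exists()]
    if kinds:
        lines.append(f"Project type: {', '.join(kinds)}")
    branch = git(["rev-parse", "--abbrev-ref", "HEAD"], cwd)
    if branch is not None:
        changed = len((git(["status", "--porcelain"], cwd) or "").splitlines())
        lines.append(f"Git branch: {branch} ({changed} changed files)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_context(os.getcwd()))
```

Each git call has a short timeout and any failure is swallowed, so the hook stays fast and never blocks a session from starting, even outside a repository.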

Sonnet memory selection (built-in)

The built-in selector picks up to 5 topic files per query. It sees only filenames and one-line descriptions, so write specific descriptions in MEMORY.md to help it.
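Since the description is the selector's whole signal, a vague entry is effectively invisible. A small lint can catch those before they cost you a miss. This is a hypothetical sketch: it assumes index lines of the form "- topic-file.md: one-line description" (adjust ENTRY to your actual MEMORY.md layout), and MIN_WORDS is an arbitrary threshold.

```python
import re

# Assumed (hypothetical) index format: "- topic-file.md: one-line description"
ENTRY = re.compile(r"^\s*[-*]\s+(\S+\.md)\s*:?\s*(.*)$")
MIN_WORDS = 6  # arbitrary cutoff for "specific enough"


def lint_index(text: str, min_words: int = MIN_WORDS) -> list[str]:
    """Return warnings for index entries whose descriptions are missing or too short."""
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = ENTRY.match(line)
        if not m:
            continue  # headings, prose, blank lines
        name, desc = m.group(1), m.group(2).strip()
        n_words = len(desc.split())
        if n_words == 0:
            warnings.append(f"line {lineno}: {name} has no description")
        elif n_words < min_words:
            warnings.append(f"line {lineno}: {name} description is only {n_words} words")
    return warnings
```

An entry like "- deploy.md: deploy stuff" gets flagged, while "- db.md: Postgres schema conventions, migration naming, and rollback steps" passes and gives the selector concrete terms to match.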

MCP semantic search (optional, <100ms)

For 50+ files: local embeddings via Ollama, stored in ChromaDB or LanceDB. Claude calls a search_memories() tool on demand.
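The core of a search_memories() tool is top-k cosine similarity with a minimum score, so that weak matches are dropped rather than loaded. In the setup described above, embeddings would come from an Ollama model and live in ChromaDB or LanceDB, and the function would be exposed as an MCP tool; this sketch replaces all of that with a toy bag-of-words bow_embed so it runs standalone. The signature, k=5, and min_score=0.2 are illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import Callable

Vector = dict[str, float]


def bow_embed(text: str) -> Vector:
    """Toy stand-in for a real embedding model: a term-frequency vector."""
    return {w: float(c) for w, c in Counter(re.findall(r"[a-z0-9]+", text.lower())).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search_memories(query: str, memories: dict[str, str], k: int = 5,
                    min_score: float = 0.2,
                    embed: Callable[[str], Vector] = bow_embed) -> list[tuple[str, float]]:
    """Top-k memories by cosine similarity, dropping anything below min_score."""
    q = embed(query)
    scored = [(name, cosine(q, embed(text))) for name, text in memories.items()]
    hits = [(name, s) for name, s in scored if s >= min_score]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:k]
```

With a memory "db.md" containing "postgres schema migrations and rollback", the query "write a postgres migration rollback" scores 0.4 (two shared terms out of five on each side) and is returned; unrelated files score 0 and are filtered out. Note that "migration" and "migrations" count as different terms here, which is exactly the gap a real embedding model closes.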

Precision Over Recall

Optimize for precision. A false negative (a missed memory) is recoverable: the model can ask for more context, or you can point it to the file. A false positive (an irrelevant memory loaded) wastes tokens AND dilutes attention, and it cannot be undone once it is in context. This is why "load everything" fails.
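The trade-off is ordinary precision and recall over the set of loaded files. A minimal sketch with made-up numbers (20 memory files, 3 of them relevant), meant only to show the arithmetic, not measured results:

```python
def precision_recall(selected: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = TP / |selected|, recall = TP / |relevant|."""
    tp = len(selected & relevant)
    # Convention (an assumption): an empty selection has no false positives,
    # so its precision is 1.0; likewise recall is 1.0 when nothing is relevant.
    precision = tp / len(selected) if selected else 1.0
    recall = tp / len(relevant) if relevant else 1.0
    return precision, recall


# Illustrative numbers only: 20 memory files, 3 relevant to the current task.
all_files = {f"m{i}.md" for i in range(20)}
relevant = {"m0.md", "m1.md", "m2.md"}

load_everything = precision_recall(all_files, relevant)           # (0.15, 1.0)
strict_selector = precision_recall({"m0.md", "m1.md"}, relevant)  # (1.0, 0.666...)
```

Loading everything maximizes recall, but 17 of the 20 files in context are noise. The strict selector misses one relevant file, and that miss is the recoverable kind: the model can still request it later.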