Automatically loading the right context at the right time, without user intervention.
You have N memory files covering different projects and topics. A new session starts. The harness needs to decide which memories are relevant without you telling it. Current approaches either dump everything into context or rely on you to say "read X.md".
| Approach | Latency | Coverage | Complexity |
|---|---|---|---|
| Keyword / trigger matching (CLAUDE.md: "Before X, read Y") | <5ms | ~80% for <30 files | Low |
| Semantic similarity (embed memories, cosine similarity to task) | 50-500ms | ~95% | Medium |
| Two-stage retrieval (fast vector search → LLM re-rank) | 200ms-2s | ~98% | High |
| Hierarchical summaries (load index, LLM decides which to expand) | 1 round-trip | Depends on summary quality | Low |
| Dynamic expansion (start minimal, retrieve as needed) | Ongoing | High (but delayed) | Medium |
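
As a rough illustration of the first two rows, the sketch below tries trigger keywords first and falls back to cosine similarity over one-line descriptions. The file names, triggers, and descriptions are hypothetical, and `embed` is a bag-of-words stand-in for a real embedding model, so treat this as a sketch under those assumptions rather than a recommended implementation.

```python
import math
import re
from collections import Counter

# Hypothetical memory files: name -> (trigger keywords, one-line description).
MEMORIES = {
    "deploy.md": ({"deploy", "release"}, "steps for deploying a web service to production"),
    "testing.md": ({"pytest", "test"}, "how to run and write project tests"),
    "style.md": ({"lint", "format"}, "code style and formatting conventions"),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_match(task):
    """Return files whose trigger keywords appear verbatim in the task."""
    words = set(tokenize(task))
    return [name for name, (triggers, _) in MEMORIES.items() if triggers & words]


def embed(text):
    """Stand-in embedding (assumption): a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_match(task, top_k=2):
    """Rank descriptions by similarity to the task; drop zero-score files."""
    task_vec = embed(task)
    scored = [(cosine(task_vec, embed(desc)), name) for name, (_, desc) in MEMORIES.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]


def select_memories(task):
    """Use the cheap trigger match first; fall back to similarity if it finds nothing."""
    return keyword_match(task) or semantic_match(task)
```

With these toy memories, `select_memories("deploy the app")` picks `deploy.md` via its trigger, while a question about formatting conventions has no trigger hit and falls through to the similarity stage.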
For most developers with <50 memory files, a layered approach covers 95%+ of cases:
Inline your 5-6 most universal behavioral rules directly in CLAUDE.md. They are always loaded and survive compaction.
`.claude/rules/*.md` files with `paths:` frontmatter. These are auto-triggered when Claude touches matching files. This is the most reliable mechanism because the harness enforces it, not the LLM.
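
To make the path-scoping idea concrete, here is a minimal sketch of how glob patterns could select rule files for a touched path. It is not the harness's actual matching logic: the rule names and patterns are hypothetical, and `fnmatchcase` lets `*` match across `/`, which may differ from the glob semantics the harness uses.

```python
from fnmatch import fnmatchcase

# Hypothetical rule files mapped to the globs from their `paths:` frontmatter.
RULES = {
    ".claude/rules/api.md": ["src/api/*"],
    ".claude/rules/migrations.md": ["db/migrations/*.sql"],
}


def rules_for(touched_path):
    """Return the rule files whose patterns match the touched path."""
    return [
        rule
        for rule, patterns in RULES.items()
        if any(fnmatchcase(touched_path, pattern) for pattern in patterns)
    ]
```

The point of the design is that the lookup is deterministic: whether a rule loads depends only on the file path, not on the model remembering to read it.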
Shell script that injects behavioral rules + dynamic context (git state, project detection) at every session start.
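
The sketch below shows the same idea in Python rather than shell. It assumes the harness adds the hook's standard output to the session context; the rule text and the marker-file table are placeholders, not part of any real configuration.

```python
import subprocess
from pathlib import Path

# Placeholder behavioral rules (assumption: yours will differ).
RULES = ["Run the test suite before committing."]

# Marker files used for a naive project-type guess (illustrative only).
MARKERS = {
    "pyproject.toml": "python",
    "package.json": "node",
    "Cargo.toml": "rust",
    "go.mod": "go",
}


def git(args, cwd):
    """Run a git command and return its stdout, or '' if it fails."""
    try:
        proc = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, timeout=5
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return proc.stdout.strip() if proc.returncode == 0 else ""


def build_context(cwd="."):
    """Assemble the text to inject at session start."""
    root = Path(cwd)
    kinds = sorted({kind for marker, kind in MARKERS.items() if (root / marker).exists()})
    branch = git(["rev-parse", "--abbrev-ref", "HEAD"], cwd)
    dirty = len(git(["status", "--porcelain"], cwd).splitlines())

    lines = ["## Session context"]
    lines += [f"- Rule: {rule}" for rule in RULES]
    lines.append(f"- Project type: {', '.join(kinds) or 'unknown'}")
    if branch:  # only report git state when inside a repository
        lines.append(f"- Git branch: {branch} ({dirty} uncommitted changes)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_context())
```

Keeping the output short matters here, since whatever the script prints is paid for in context on every session.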
The built-in selector picks up to 5 topic files per query. Write good one-line descriptions in MEMORY.md to help it.
For 50+ files: use local embeddings via Ollama + ChromaDB/LanceDB, with Claude calling `search_memories()` on demand.
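
A minimal sketch of what `search_memories()` could look like, assuming the `chromadb` and `ollama` Python packages, a running local Ollama server, and the `nomic-embed-text` embedding model. The model choice, collection name, and `.memory_index` path are assumptions, not from the original, and how the function is exposed to Claude is left out.

```python
from pathlib import Path


def ollama_embed(text, model="nomic-embed-text"):
    """Embed text with a local Ollama server (assumes the model is pulled)."""
    import ollama  # imported lazily so the rest works without it

    return ollama.embeddings(model=model, prompt=text)["embedding"]


def open_collection(db_path=".memory_index"):
    """Open (or create) a persistent ChromaDB collection for memory files."""
    import chromadb

    client = chromadb.PersistentClient(path=db_path)
    return client.get_or_create_collection("memories")


def index_memories(collection, memory_dir, embed=ollama_embed):
    """Embed every *.md file in memory_dir and upsert it by file name."""
    for path in sorted(Path(memory_dir).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        collection.upsert(ids=[path.name], embeddings=[embed(text)], documents=[text])


def search_memories(collection, query, k=3, embed=ollama_embed):
    """Return up to k (file name, distance) pairs, closest first."""
    result = collection.query(query_embeddings=[embed(query)], n_results=k)
    return list(zip(result["ids"][0], result["distances"][0]))
```

A typical flow would be `col = open_collection()`, then `index_memories(col, "memory/")` once (or whenever files change), then `search_memories(col, "how do we deploy?")` per query. Passing `embed` as a parameter keeps the embedding backend swappable.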