AI Memory Architectures

State of the art in agent memory: types, RAG for code, knowledge graphs, dedicated platforms, and the MemGPT paradigm.

Memory Types in AI Agents

The foundational survey "Memory in the Age of AI Agents" (December 2025) establishes the taxonomy along three axes: forms (how stored), functions (what it represents), and dynamics (how it changes).

Episodic Memory

Stores the sequence and details of specific interactions: "What happened when?" In coding assistants: conversation history, debug session logs, past code review discussions.

Semantic Memory

Accumulated knowledge base of facts and concepts: "What do I know?" Project architecture decisions, API contracts, team conventions, business domain knowledge.

Anthropic's approach: transparent Markdown files (CLAUDE.md) as semantic memory stores. Trades retrieval sophistication for auditability and user control.

Procedural Memory

Expertise developed through practice: "How do I do this?" Build/deploy workflows, debugging patterns, refactoring recipes, testing strategies.

The missing transformation: The Mem^p research paper identifies the conversion of episodic memory into procedural memory (learning reusable patterns from specific experiences) as a critical capability that most systems still lack. Current assistants store what happened but rarely abstract it into reusable "how-to" knowledge automatically.

Working Memory

Integrates episodic, semantic, and procedural information with the current input. In coding: the current file being edited, the active task description, recently retrieved context, and the current conversation thread.

What agents need to preserve in working memory: goals, policy constraints, entity identifiers, and confirmed decisions — while continuously integrating new evidence.
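The split between preserved state and incoming evidence can be sketched as a small data structure. This is only an illustration under stated assumptions: the `WorkingMemory` class, its field names, the three-item evidence window, and the `BUG-123` identifier are all hypothetical and not taken from any particular agent framework.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    # Pinned state: kept for the whole task and never evicted.
    goals: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    entities: dict[str, str] = field(default_factory=dict)
    decisions: list[str] = field(default_factory=list)
    # Rolling evidence: only the most recent items survive (maxlen=3 is an assumption).
    evidence: deque = field(default_factory=lambda: deque(maxlen=3))

    def observe(self, item: str) -> None:
        # Appending to a full deque silently drops the oldest item.
        self.evidence.append(item)

    def render(self) -> str:
        # Serialize pinned state first, then the evidence window.
        lines = [f"GOAL: {g}" for g in self.goals]
        lines += [f"CONSTRAINT: {c}" for c in self.constraints]
        lines += [f"ENTITY: {name} = {ident}" for name, ident in self.entities.items()]
        lines += [f"DECIDED: {d}" for d in self.decisions]
        lines += [f"EVIDENCE: {e}" for e in self.evidence]
        return "\n".join(lines)


wm = WorkingMemory(
    goals=["Fix failing login test"],
    constraints=["Do not change the public API"],
)
wm.entities["ticket"] = "BUG-123"  # hypothetical identifier
wm.decisions.append("Keep JWT with RS256")
for step in range(5):
    wm.observe(f"test run {step}")
prompt_block = wm.render()
```

Goals, constraints, entities, and decisions stay in place however much evidence arrives, while the evidence window keeps only the latest observations.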

Refined Taxonomy (2025-2026)

By Storage | By Function
Token-level: Direct representation in context window | Factual: Objective information (API docs, types)
Parametric: Encoded in model weights (fine-tuning) | Experiential: Agent interaction history
Latent: Implicit in hidden states | Working: Active reasoning context

RAG for Code

Vector Embeddings

Chunking Strategies for Code

Strategy | How | Quality
AST-aware | Parse into AST, chunk at function/class boundaries | Best for code
Semantic | Embedding similarity between adjacent segments | +9% recall vs fixed-size
Hierarchical | Both fine-grained (function) and coarse (file/module) | Flexible retrieval granularity
Agentic | LLM decides chunk boundaries | Highest quality, most expensive

Key insight: Unlike prose, source code has explicit structure (functions, classes, modules) that should be respected. Splitting a function across chunks destroys the most important semantic unit.
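As a rough illustration of the AST-aware strategy, the sketch below uses Python's standard `ast` module to cut a source file at top-level function and class boundaries. The function name `chunk_by_definition` and the chunk dictionary layout are assumptions made for this example; a production chunker would also need to handle module-level statements, decorators, and very large classes.

```python
import ast


def chunk_by_definition(source: str) -> list[dict]:
    """Return one chunk per top-level function or class (hypothetical helper)."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                # lineno points at the def/class line; decorators are not included.
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "text": ast.get_source_segment(source, node),
            })
    return chunks


SAMPLE = '''import os

def load(path):
    with open(path) as f:
        return f.read()

class Cache:
    def get(self, key):
        return None
'''

chunks = chunk_by_definition(SAMPLE)
for chunk in chunks:
    print(chunk["kind"], chunk["name"], chunk["start_line"], chunk["end_line"])
```

Each chunk carries a whole definition, so a function body is never split across two embeddings.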

Hybrid Search (Semantic + Keyword)

Pure vector search misses exact matches (function names, error codes). The industry has converged on hybrid search, which combines semantic and keyword retrieval.
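One common way to merge a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF). The text above does not say which fusion method any given tool uses, so treat RRF here as an assumption. The document identifiers and the constant `k = 60` (a conventional default) are likewise illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents found by both lists add up.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector index and a keyword (e.g. BM25) index.
semantic_hits = ["auth.py::validate", "auth.py::refresh", "users.py::login"]
keyword_hits = ["errors.py::E4012", "auth.py::validate"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

The exact-match hit that only the keyword index found still surfaces near the top, while the chunk found by both retrievers ranks first.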

How Major Tools Use RAG

Cursor

Sourcegraph Cody

Local vs Cloud RAG

Approach | Privacy | Models | Speed
Cloud (Cursor, Cody, Copilot) | Code leaves machine | Better models | Fast
Local (Ollama + ChromaDB/Qdrant) | Full privacy | Smaller models | Depends on GPU

For your setup (RTX 3060, 12 GB VRAM): a local RAG pipeline with Ollama + ChromaDB/Qdrant is very feasible for personal codebase indexing. 70% of self-hosted LLM users cite data privacy as their primary motivation.

Knowledge Graphs for Code

What They Represent

Nodes (functions, classes, modules, variables) with typed edges (calls, imports, inherits, implements, depends-on).
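To make the node-and-edge model concrete, here is a minimal sketch that extracts typed edges from a single Python module with the standard `ast` module. The function `extract_edges`, the triple layout, and the module name `app` are assumptions for illustration. The sketch only resolves calls and base classes written as plain names, so it is far less complete than the tools listed below.

```python
import ast


def extract_edges(module_name: str, source: str) -> set[tuple[str, str, str]]:
    """Return (source_node, edge_type, target_node) triples for one module."""
    edges = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                edges.add((module_name, "imports", alias.name))
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.add((module_name, "imports", node.module))
        elif isinstance(node, ast.ClassDef):
            for base in node.bases:
                if isinstance(base, ast.Name):
                    edges.add((node.name, "inherits", base.id))
        elif isinstance(node, ast.FunctionDef):
            # Calls inside nested functions are attributed to the outer function too.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.add((node.name, "calls", inner.func.id))
    return edges


SAMPLE = '''import json

class Base:
    pass

class Store(Base):
    pass

def parse(text):
    return json.loads(text)

def load(path):
    return parse(open(path).read())
'''

edges = extract_edges("app", SAMPLE)
for edge in sorted(edges):
    print(edge)
```

Attribute calls such as `json.loads` are skipped on purpose, because resolving them would require import and type analysis that is out of scope for a sketch.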

Available Tools

Tool | Features
Axon | Indexes codebases into structural knowledge graphs. Exposes via MCP tools.
GitNexus | Client-side code intelligence, interactive searchable graphs. Runs in browser.
Neo4j Code KG | Neo4j graph database for code relationships.
Cognee | Builds knowledge graphs from Python repos.
FalkorDB Code Graph | Specialized code graph solution.

3-5x improvement: Graph RAG delivers 3-5x more relevant context than vector search alone by walking relationships after initial retrieval, returning complete call graphs, dependencies, and type signatures instead of isolated code fragments.
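The "walk relationships after initial retrieval" step can be sketched as a bounded breadth-first expansion from the initial hits. The edge list, function names, and the `max_hops` parameter below are hypothetical; the sketch follows outgoing edges only and ignores edge types, which a real Graph RAG system would likely use for filtering.

```python
from collections import deque


def expand_with_graph(seeds, edges, max_hops=1):
    """Return the seeds plus every node reachable within max_hops outgoing edges."""
    adjacency: dict[str, list[str]] = {}
    for src, _edge_type, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    ordered = list(dict.fromkeys(seeds))  # de-duplicate, keep retrieval order
    visited = set(ordered)
    queue = deque((seed, 0) for seed in ordered)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                ordered.append(neighbor)
                queue.append((neighbor, depth + 1))
    return ordered


# Hypothetical call graph; "handle_login" stands in for the initial vector-search hit.
EDGES = [
    ("handle_login", "calls", "validate_token"),
    ("validate_token", "calls", "decode_jwt"),
    ("handle_login", "calls", "load_user"),
    ("render_page", "calls", "load_user"),
]

one_hop = expand_with_graph(["handle_login"], EDGES, max_hops=1)
two_hops = expand_with_graph(["handle_login"], EDGES, max_hops=2)
print(one_hop)
print(two_hops)
```

Raising `max_hops` trades context budget for completeness, so it is usually worth capping it.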

Graph + RAG Integration

LightRAG (EMNLP 2025): Leading open-source framework combining graphs with RAG. 6,000x token efficiency improvement over Microsoft's GraphRAG. Supports reranking for mixed queries.

Memgraph GraphRAG for Devs: Connects IDEs like Cursor/Windsurf directly to code knowledge graphs via MCP.

Memory Management Patterns

Memory Consolidation

Memory Decay and Relevance Scoring

Score based on: recency (when last accessed), frequency (how often retrieved), relevance (semantic connection to active context), importance (explicitly marked or heavily referenced).

Forgetting is a feature, not a bug: pruning low-value memories keeps retrieval fast and relevant.
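A minimal sketch of how the four signals might be combined into one score and used for pruning follows. The weights, the one-week half-life, the saturating frequency formula, and all names are assumptions chosen for illustration, not values from any of the systems discussed here.

```python
import math
from dataclasses import dataclass

WEEK_S = 7 * 24 * 3600


@dataclass
class MemoryRecord:
    text: str
    last_access_ts: float  # seconds since epoch
    access_count: int
    relevance: float       # 0..1, e.g. similarity to the active context
    importance: float      # 0..1, explicitly marked or heavily referenced


def memory_score(rec, now, half_life_s=WEEK_S, weights=(0.3, 0.2, 0.4, 0.1)):
    w_recency, w_frequency, w_relevance, w_importance = weights
    age = max(0.0, now - rec.last_access_ts)
    recency = 0.5 ** (age / half_life_s)  # halves every half_life_s seconds
    log_count = math.log1p(rec.access_count)
    frequency = log_count / (1.0 + log_count)  # 0 when never retrieved, saturates below 1
    return (w_recency * recency + w_frequency * frequency
            + w_relevance * rec.relevance + w_importance * rec.importance)


def prune(records, now, keep):
    """Keep the `keep` highest-scoring records and forget the rest."""
    ranked = sorted(records, key=lambda r: memory_score(r, now), reverse=True)
    return ranked[:keep]


now = 1_000_000.0
fresh = MemoryRecord("Auth uses JWT with RS256", now, 0, 1.0, 1.0)
stale = MemoryRecord("Old CSS tweak", now - 4 * WEEK_S, 0, 0.1, 0.0)
kept = prune([stale, fresh], now, keep=1)
```

With these weights a just-accessed, fully relevant record scores 0.8, while the four-week-old record decays to a small fraction of that and is pruned first.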

Hierarchical Memory

Level | Content | Example
0 | Raw conversation logs, full code diffs | Session transcript
1 | Summarized session notes | "Fixed auth bug via JWT validation update"
2 | Project-level knowledge | "Auth uses JWT with RS256, tokens expire in 1h"
3 | Cross-project patterns | "Always add token refresh alongside JWT validation"

Active Recall vs. Passive Context

Mode | Pros | Cons
Passive (always inject) | Low latency, predictable | Consumes context budget
Active (query-driven retrieval) | Scalable | Adds latency, can miss context

Emerging consensus: hybrid. Always-on for critical invariants + active recall for task-specific knowledge.
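The hybrid mode can be sketched as a context builder that always injects the invariants and then adds query-driven recall within a budget. Everything here is an assumption for illustration: the word-overlap scorer is a stand-in for a real retriever, and the character budget stands in for a token budget.

```python
def overlap(task: str, memory: str) -> int:
    # Toy relevance signal: number of shared lowercase words.
    return len(set(task.lower().split()) & set(memory.lower().split()))


def build_context(invariants, memories, task, budget_chars=200, top_k=2):
    # Passive part: critical invariants are always injected.
    context = list(invariants)
    used = sum(len(item) for item in context)

    # Active part: recall only task-relevant memories that fit the budget.
    ranked = sorted(memories, key=lambda m: overlap(task, m), reverse=True)
    for memory in ranked[:top_k]:
        if overlap(task, memory) == 0:
            break  # nothing relevant remains
        if used + len(memory) > budget_chars:
            continue  # skip items that would exceed the budget
        context.append(memory)
        used += len(memory)
    return context


invariants = ["Never commit secrets."]
memories = ["jwt tokens expire in 1h", "the build uses make", "css lives in static/"]
context = build_context(invariants, memories, task="fix jwt tokens refresh")
print(context)
```

The invariant is present regardless of the task, while unrelated memories never consume context budget.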

Dedicated Memory Platforms (2026)

System | Architecture | LongMemEval Score | Key Differentiator
Mem0 | Vector + Graph + KV hybrid | 26% over OpenAI baseline | 41K GitHub stars, AWS Agent SDK exclusive memory provider, 91% faster than full-context
Zep (Graphiti) | Temporal knowledge graph | 63.8% | Time as first-class dimension, tracks how facts change
Hindsight | 4-network parallel retrieval | 91.4% | Strongest benchmarks: semantic + BM25 + graph + temporal in parallel
Memvid | Append-only .mv2 files | N/A | Zero infrastructure, single-file packaging, edge/offline
Letta (MemGPT) | OS-inspired virtual context | N/A | Self-editing memory, paging metaphor

The Letta/MemGPT Paradigm

The most architecturally clean model for coding assistants: treat the context window as RAM, long-term storage as disk, and give the agent explicit paging control.
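The RAM/disk metaphor can be illustrated with a toy two-tier store. This is not Letta's or MemGPT's actual API: the `PagedMemory` class and its method names are hypothetical, and the methods stand in for the tool calls an agent would issue to page information in and out of its context.

```python
class PagedMemory:
    """Toy two-tier store: `ram` models the context window, `disk` long-term storage."""

    def __init__(self, ram_capacity: int):
        self.ram_capacity = ram_capacity
        self.ram: dict[str, str] = {}   # insertion-ordered, oldest entry first
        self.disk: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        # Re-inserting moves the key to the "most recent" end.
        self.ram.pop(key, None)
        self.ram[key] = value
        self._evict_if_needed()

    def page_out(self, key: str) -> None:
        # Explicit paging: move an item from context to long-term storage.
        self.disk[key] = self.ram.pop(key)

    def page_in(self, key: str) -> str:
        # Bring an item back into context; this may evict something else.
        value = self.disk.pop(key)
        self.write(key, value)
        return value

    def _evict_if_needed(self) -> None:
        while len(self.ram) > self.ram_capacity:
            oldest = next(iter(self.ram))
            self.page_out(oldest)


mem = PagedMemory(ram_capacity=2)
mem.write("a", "auth uses JWT")
mem.write("b", "build uses make")
mem.write("c", "css lives in static/")  # evicts "a" to disk
recalled = mem.page_in("a")             # brings "a" back, evicts "b"
```

In the paradigm described above the agent decides when to page; the oldest-first eviction used here is only a fallback chosen to keep the sketch small.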

Emerging Approaches

MCP Memory Servers

The @modelcontextprotocol/server-memory provides simple JSONL-based persistent memory. The mcp-memory-service is more sophisticated with REST API, knowledge graph, and autonomous consolidation.
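To show what JSONL-based persistence means in practice, here is a minimal append-only store using only the standard library. It does not reproduce the file layout or API of either server mentioned above; the record fields and helper names are assumptions.

```python
import json
import tempfile
from pathlib import Path


def append_memory(path: Path, record: dict) -> None:
    # One JSON object per line; appending never rewrites earlier records.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_memories(path: Path) -> list[dict]:
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "memory.jsonl"  # hypothetical file name
    append_memory(store, {"entity": "auth", "fact": "uses JWT with RS256"})
    append_memory(store, {"entity": "auth", "fact": "tokens expire in 1h"})
    loaded = load_memories(store)

print(loaded)
```

Because the file is plain text, it can be inspected, diffed, and version-controlled like any other project file.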

MCP's June 2025 OAuth 2.1 mandate signals enterprise readiness. MCP is now the standard integration fabric across Cursor, Windsurf, Claude Code, and others.

Agent Frameworks with Built-in Memory

Framework | Status | Memory Approach
LangGraph | 1.0 (Oct 2025) | Native MCP discovery, structured memory, decision control
CrewAI | v1.10.1, 44K+ stars | Role-based multi-agent, native MCP + A2A
MS Agent Framework | AutoGen + Semantic Kernel merged | Azure AD, RBAC, SOC 2/HIPAA
Letta | Active development | MemGPT OS metaphor, self-editing memory

Anthropic's Direction

Google NotebookLM

Fundamentally different: source-grounding. AI constrained strictly to user-provided data. 1M token context window (8x increase in 2025), 6x longer conversation memory. Zero hallucination risk from external knowledge.

Key Takeaways

  1. The hybrid stack is converging: Vector search + knowledge graphs + keyword search, with reranking. Pure vector RAG is no longer state of the art.
  2. Memory routing remains the hardest problem: Knowing which memories matter is more important than how they're stored.
  3. Letta/MemGPT's OS metaphor is the cleanest model: Context window = RAM, long-term storage = disk, agent has paging control.
  4. Anthropic's transparent file-based approach trades sophistication for auditability — critical for coding where users need to trust and verify.
  5. Graph RAG delivers 3-5x better context than vector search alone for code (inherently relational).
  6. 2026: contextual memory becomes table stakes for production agent deployments.