AI Memory Architectures

State of the art in agent memory: types, RAG for code, knowledge graphs, dedicated platforms, and the MemGPT paradigm.

Memory Types in AI Agents
RAG for Code
Knowledge Graphs for Code
Memory Management Patterns
Dedicated Memory Platforms (2026)
Emerging Approaches
Key Takeaways

Memory Types in AI Agents

The foundational survey "Memory in the Age of AI Agents" (December 2025) establishes the taxonomy along three axes: forms (how stored), functions (what it represents), and dynamics (how it changes).

Episodic Memory

Stores the sequence and details of specific interactions: "What happened when?" In coding assistants: conversation history, debug session logs, past code review discussions.

Semantic Memory

Accumulated knowledge base of facts and concepts: "What do I know?" Project architecture decisions, API contracts, team conventions, business domain knowledge.

Anthropic's approach: transparent Markdown files (CLAUDE.md) as semantic memory stores. Trades retrieval sophistication for auditability and user control.

Procedural Memory

Expertise developed through practice: "How do I do this?" Build/deploy workflows, debugging patterns, refactoring recipes, testing strategies.

The missing transformation: the Mem^p paper identifies converting episodic memory into procedural memory (learning reusable patterns from specific experiences) as a critical capability that most systems still lack. Current assistants store what happened but rarely abstract it into reusable "how-to" knowledge automatically.

Working Memory

Integrates episodic, semantic, and procedural info with current input. In coding: the current file being edited, active task description, recently retrieved context, current conversation thread.

What agents need to preserve in working memory: goals, policy constraints, entity identifiers, and confirmed decisions — while continuously integrating new evidence.
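A minimal sketch of this idea, assuming a simple in-process record: goals, constraints, entity identifiers, and confirmed decisions are pinned, while new evidence flows through a bounded window that drops its oldest items. The class name `WorkingMemory`, its fields, and the evidence cap of 3 are illustrative assumptions, not any particular assistant's design.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    """Hypothetical working-memory record: pinned fields persist, evidence rolls over."""
    goal: str
    constraints: list[str] = field(default_factory=list)
    entities: dict[str, str] = field(default_factory=dict)  # label -> stable identifier
    decisions: list[str] = field(default_factory=list)
    max_evidence: int = 3
    evidence: deque = field(default_factory=deque)

    def observe(self, item: str) -> None:
        # New evidence is appended; the oldest item is dropped once the cap is exceeded.
        self.evidence.append(item)
        while len(self.evidence) > self.max_evidence:
            self.evidence.popleft()

    def confirm(self, decision: str) -> None:
        # Confirmed decisions are pinned and never evicted.
        if decision not in self.decisions:
            self.decisions.append(decision)

    def render(self) -> str:
        # Pinned sections first, then the rolling evidence window.
        lines = [f"GOAL: {self.goal}"]
        lines += [f"CONSTRAINT: {c}" for c in self.constraints]
        lines += [f"ENTITY: {k}={v}" for k, v in self.entities.items()]
        lines += [f"DECISION: {d}" for d in self.decisions]
        lines += [f"EVIDENCE: {e}" for e in self.evidence]
        return "\n".join(lines)


wm = WorkingMemory(goal="Fix login timeout",
                   constraints=["Do not change the public API"],
                   entities={"ticket": "AUTH-42"})  # hypothetical ticket id
for i in range(5):
    wm.observe(f"log line {i}")
wm.confirm("Increase JWT clock leeway to 30s")
print(wm.render())
```

Only the evidence window is lossy here; everything the text says must be preserved stays in the rendered context regardless of how much new input arrives.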

Refined Taxonomy (2025-2026)

By storage:
Token-level: Direct representation in the context window
Parametric: Encoded in model weights (fine-tuning)
Latent: Implicit in hidden states

By function:
Factual: Objective information (API docs, types)
Experiential: Agent interaction history
Working: Active reasoning context

RAG for Code

Vector Embeddings

Voyage-3-large: outperforms OpenAI and Cohere embeddings by 9-20% on code retrieval; 32K-token context window
Code-specific embedding models from Jina and Nomic are trained to capture syntax and structural relationships
Trend toward models embedding entire files/modules (not small chunks)

Chunking Strategies for Code

Strategy	How	Quality
AST-aware	Parse into AST, chunk at function/class boundaries	Best for code
Semantic	Embedding similarity between adjacent segments	+9% recall vs fixed-size
Hierarchical	Both fine-grained (function) and coarse (file/module)	Flexible retrieval granularity
Agentic	LLM decides chunk boundaries	Highest quality, most expensive

Key insight: Unlike prose, source code has explicit structure (functions, classes, modules) that should be respected. Splitting a function across chunks destroys the most important semantic unit.
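To make AST-aware chunking concrete, here is a sketch that uses Python's standard `ast` module to cut a Python file at top-level function and class boundaries, decorators included. It assumes Python 3.8+ for `end_lineno`; `ast_chunks` and the sample source are hypothetical, and module-level statements outside definitions are skipped for brevity.

```python
import ast


def ast_chunks(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above node.lineno, so start from the first one if present.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno  # Python 3.8+
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start": start,
                "end": end,
                "text": "\n".join(lines[start - 1:end]),
            })
    return chunks


SAMPLE = """import os

def load(path):
    return open(path).read()

@cached
def helper():
    pass

class Cache:
    def get(self, key):
        return None
"""

chunks = ast_chunks(SAMPLE)
for c in chunks:
    print(c["kind"], c["name"], c["start"], c["end"])
```

Each function or class lands in exactly one chunk, so no definition is split. A hierarchical variant would additionally emit one chunk per method inside large classes.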

Hybrid Search (Semantic + Keyword)

Pure vector search misses exact matches (function names, error codes). The industry has converged on hybrid:

Dense + sparse: Vector similarity + BM25/TF-IDF, with learned weighting
HybridRAG: Integrates knowledge graphs with vector retrieval
Reranking: After initial retrieval, a cross-encoder reranker rescores and merges the candidate lists. Now standard practice.
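The dense + sparse combination can be sketched as BM25 keyword scoring fused with vector similarity. This is an illustrative stand-in: the dense scores are made-up numbers in place of a real embedding model, and a fixed weight `alpha` replaces the learned weighting described above. Min-max normalization puts both score types on a 0-1 scale before mixing; all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 over a small in-memory corpus of {doc_id: text}."""
    toks = {d: tokenize(t) for d, t in docs.items()}
    n = len(toks)
    avgdl = sum(len(t) for t in toks.values()) / n
    df = Counter(term for t in toks.values() for term in set(t))
    scores = {}
    for d, terms in toks.items():
        tf = Counter(terms)
        s = 0.0
        for q in tokenize(query):
            if tf[q] == 0:
                continue
            idf = math.log((n - df[q] + 0.5) / (df[q] + 0.5) + 1)  # non-negative variant
            s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(terms) / avgdl))
        scores[d] = s
    return scores


def min_max(scores: dict[str, float]) -> dict[str, float]:
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def hybrid_rank(dense: dict[str, float], sparse: dict[str, float], alpha: float = 0.4) -> list[str]:
    """Fuse normalized scores: alpha weights the dense side, (1 - alpha) the BM25 side."""
    dn, sn = min_max(dense), min_max(sparse)
    fused = {d: alpha * dn[d] + (1 - alpha) * sn[d] for d in dense}
    return sorted(fused, key=fused.get, reverse=True)


DOCS = {
    "d1": "def parse_config(path): load yaml config",
    "d2": "raise ConfigError('E1042 missing key')",
    "d3": "read settings from a file and return a dict",
}
# Made-up cosine similarities standing in for an embedding model's output.
DENSE = {"d1": 0.62, "d2": 0.55, "d3": 0.60}

sparse = bm25_scores("E1042", DOCS)
ranking = hybrid_rank(DENSE, sparse)
print(ranking)
```

With these numbers, dense similarity alone ranks `d1` first, while the fused ranking surfaces `d2`, the only chunk containing the exact error code.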

How Major Tools Use RAG

Cursor

Chunks at function/class boundaries locally
Embeddings in Turbopuffer (serverless vector DB)
Merkle trees for incremental re-indexing of changed files
Path obfuscation for privacy
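A sketch of how a Merkle tree can drive incremental re-indexing: each file is hashed, each directory hashes its children, and the comparison only descends into subtrees whose hashes differ. This is a generic illustration using SHA-256 over an in-memory `{path: contents}` map, not Cursor's actual implementation; all function names are hypothetical.

```python
import hashlib


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_tree(files: dict[str, str]) -> dict:
    """Nest {'src/a.py': contents} into directories; leaves hold file content hashes."""
    root: dict = {}
    for path, content in files.items():
        *dirs, name = path.split("/")
        node = root
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = sha(content.encode())
    return root


def all_hashes(node, path: str = "") -> dict[str, str]:
    """Map every file and directory path (root is "") to its Merkle hash."""
    if isinstance(node, str):
        return {path: node}
    out, parts = {}, []
    for name, child in sorted(node.items()):
        child_path = f"{path}/{name}" if path else name
        sub = all_hashes(child, child_path)
        out.update(sub)
        parts.append(f"{name}:{sub[child_path]}")
    out[path] = sha(";".join(parts).encode())  # directory hash covers its children
    return out


def changed_files(old_tree: dict, new_tree: dict) -> list[str]:
    """Return paths that were edited, added, or removed, skipping identical subtrees."""
    old_h, new_h = all_hashes(old_tree), all_hashes(new_tree)
    changed: list[str] = []

    def walk(old, new, path):
        if old_h.get(path) == new_h.get(path):
            return  # same hash: nothing below this point changed
        if isinstance(old, dict) and isinstance(new, dict):
            for name in sorted(set(old) | set(new)):
                walk(old.get(name), new.get(name), f"{path}/{name}" if path else name)
        else:
            changed.append(path)

    walk(old_tree, new_tree, "")
    return changed


base = {"src/auth.py": "v1", "src/db.py": "x", "docs/readme.md": "hi"}
old = build_tree(base)
edited = build_tree({**base, "src/auth.py": "v2"})
added = build_tree({**base, "src/new.py": "y"})
print(changed_files(old, edited), changed_files(old, added))
```

Only the returned paths would need re-chunking and re-embedding. For brevity this sketch rebuilds both trees from scratch; a real indexer would persist the previous tree between runs.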

Sourcegraph Cody

Three-layer system: local file, local repo, remote repo
Hybrid dense/sparse vector search
Supports up to 1M token context windows, 300K+ repos, 90GB+ monorepos
Two-stage pipeline: candidate retrieval then ranking

Local vs Cloud RAG

Approach	Privacy	Models	Speed
Cloud (Cursor, Cody, Copilot)	Code leaves machine	Better models	Fast
Local (Ollama + ChromaDB/Qdrant)	Full privacy	Smaller models	Depends on GPU

For your setup (RTX 3060, 12GB VRAM): a local RAG pipeline with Ollama + ChromaDB/Qdrant is very feasible for indexing a personal codebase. 70% of self-hosted LLM users cite data privacy as their primary motivation.

Knowledge Graphs for Code

What They Represent

Nodes (functions, classes, modules, variables) with typed edges (calls, imports, inherits, implements, depends-on).
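As an illustration of nodes and typed edges, the sketch below walks a Python module with the standard `ast.NodeVisitor` and records `defines`, `calls`, `imports`, and `inherits` edges. It is a simplification: names are kept unqualified, so two methods called `get` in different classes would collide, and calls are resolved by bare name only. The `GraphBuilder` class and the sample module are hypothetical.

```python
import ast


class GraphBuilder(ast.NodeVisitor):
    """Collect nodes and typed edges (defines, calls, imports, inherits) from one module."""

    def __init__(self, module: str):
        self.module = module
        self.scope = [module]            # stack of enclosing definitions
        self.nodes = {module: "module"}  # name -> node kind
        self.edges = set()               # (source, edge_type, target)

    def visit_Import(self, node):
        for alias in node.names:
            self.edges.add((self.module, "imports", alias.name))

    def visit_ImportFrom(self, node):
        if node.module:  # None for "from . import x"
            self.edges.add((self.module, "imports", node.module))

    def visit_ClassDef(self, node):
        self.nodes[node.name] = "class"
        self.edges.add((self.scope[-1], "defines", node.name))
        for base in node.bases:
            if isinstance(base, ast.Name):
                self.edges.add((node.name, "inherits", base.id))
        self.scope.append(node.name)
        self.generic_visit(node)
        self.scope.pop()

    def visit_FunctionDef(self, node):
        self.nodes[node.name] = "function"
        self.edges.add((self.scope[-1], "defines", node.name))
        self.scope.append(node.name)
        self.generic_visit(node)
        self.scope.pop()

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Call(self, node):
        # Resolve foo() and obj.foo() to the bare name "foo".
        func = node.func
        if isinstance(func, ast.Name):
            name = func.id
        elif isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = None
        if name:
            self.edges.add((self.scope[-1], "calls", name))
        self.generic_visit(node)


SRC = '''import jwt

class Base:
    pass

class AuthService(Base):
    def login(self, token):
        return self.verify(token)

    def verify(self, token):
        return jwt.decode(token, "secret", algorithms=["HS256"])
'''

builder = GraphBuilder("auth")
builder.visit(ast.parse(SRC))
edges = builder.edges
for edge in sorted(edges):
    print(edge)
```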

Available Tools

Tool	Features
Axon	Indexes codebases into structural knowledge graphs. Exposes via MCP tools.
GitNexus	Client-side code intelligence, interactive searchable graphs. Runs in browser.
Neo4j Code KG	Neo4j graph database for code relationships.
Cognee	Builds knowledge graphs from Python repos.
FalkorDB Code Graph	Specialized code graph solution.

3-5x improvement: Graph RAG delivers 3-5x more relevant context than vector search alone by walking relationships after initial retrieval, returning complete call graphs, dependencies, and type signatures instead of isolated code fragments.
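The relationship walk after initial retrieval can be sketched as a bounded breadth-first search: start from the nodes returned by vector search and follow typed edges for a fixed number of hops. The edge list, `expand_context`, and the two-hop limit are illustrative assumptions; this version follows outgoing edges only, and adding reversed edges would also pull in callers.

```python
from collections import deque


def expand_context(seeds, edges, max_hops=2, edge_types=("calls", "imports", "inherits")):
    """Breadth-first walk from vector-search hits along selected edge types."""
    adj = {}
    for src, etype, dst in edges:
        if etype in edge_types:
            adj.setdefault(src, []).append(dst)
    seen = {s: 0 for s in seeds}  # node -> hop distance from the nearest seed
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


EDGES = [
    ("handle_login", "calls", "verify_token"),
    ("verify_token", "calls", "decode_jwt"),
    ("decode_jwt", "calls", "load_public_key"),
    ("verify_token", "imports", "jwt"),
    ("handle_login", "calls", "render_page"),
]

# Suppose vector search returned only verify_token; the walk adds its dependencies.
ctx = expand_context(["verify_token"], EDGES, max_hops=2)
print(ctx)
```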

Graph + RAG Integration

LightRAG (EMNLP 2025): Leading open-source framework combining graphs with RAG. Reports roughly 6,000x fewer retrieval tokens than Microsoft's GraphRAG. Supports reranking for mixed queries.

Memgraph GraphRAG for Devs: Connects IDEs like Cursor/Windsurf directly to code knowledge graphs via MCP.

Memory Management Patterns

Memory Consolidation

Merge-on-conflict: Update an existing memory rather than add a duplicate when new info contradicts it
Abstraction ladders: Convert episodes into rules ("deploy failed 3x due to env vars" → "always check env vars before deploy")
Mem0's two-phase pipeline: Extraction then Update, automatically merging related memories
TiMem's Temporal Memory Tree: Progressive abstraction from raw observations to high-level representations
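Two of these patterns can be sketched in a few lines, assuming facts are keyed by `(subject, attribute)`: merge-on-conflict overwrites a contradicting value (keeping the old one in a history list), and an abstraction ladder promotes a failure cause to a rule once it recurs a set number of times. The threshold of 3 and the string rule template are hypothetical simplifications; systems such as Mem0 use an LLM for extraction and update, and this is not Mem0's API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    facts: dict = field(default_factory=dict)           # (subject, attribute) -> current value
    history: list = field(default_factory=list)         # superseded values, kept for audit
    failures: Counter = field(default_factory=Counter)  # (task, cause) -> occurrences
    rules: list = field(default_factory=list)           # promoted procedural rules
    promote_after: int = 3

    def upsert(self, subject: str, attribute: str, value: str) -> None:
        """Merge-on-conflict: a contradicting fact replaces the old one, no duplicate."""
        key = (subject, attribute)
        old = self.facts.get(key)
        if old is not None and old != value:
            self.history.append((key, old))
        self.facts[key] = value

    def record_failure(self, task: str, cause: str) -> None:
        """Abstraction ladder: a cause that recurs often enough becomes a rule."""
        self.failures[(task, cause)] += 1
        rule = f"always check {cause} before {task}"
        if self.failures[(task, cause)] >= self.promote_after and rule not in self.rules:
            self.rules.append(rule)


mem = MemoryStore()
mem.upsert("auth", "token_ttl", "24h")
mem.upsert("auth", "token_ttl", "1h")  # contradicts the earlier value
for _ in range(3):
    mem.record_failure("deploy", "env vars")
print(mem.facts, mem.rules)
```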

Memory Decay and Relevance Scoring

Score based on: recency (when last accessed), frequency (how often retrieved), relevance (semantic connection to active context), importance (explicitly marked or heavily referenced).

Forgetting is a feature, not a bug: pruning low-value memories keeps retrieval fast and relevant.
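A relevance score combining the four signals might look like the sketch below. The specific formulas are assumptions chosen for illustration: recency decays exponentially with a 72-hour half-life, frequency saturates as `1 - 1/(1 + count)`, relevance is cosine similarity to the query embedding, and importance is a 0-1 flag. The weights, the pruning threshold, and the toy two-dimensional embeddings are likewise arbitrary.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    hours_since_access: float
    access_count: int
    embedding: list[float]
    importance: float  # 0..1, e.g. explicitly pinned by the user


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(m: Memory, query: list[float], half_life_h: float = 72.0,
          weights: tuple = (0.25, 0.15, 0.4, 0.2)) -> float:
    recency = 0.5 ** (m.hours_since_access / half_life_h)  # halves every half_life_h hours
    frequency = 1 - 1 / (1 + m.access_count)               # 0 if never accessed, tends to 1
    relevance = max(0.0, cosine(m.embedding, query))       # ignore negative similarity
    w_rec, w_freq, w_rel, w_imp = weights
    return w_rec * recency + w_freq * frequency + w_rel * relevance + w_imp * m.importance


def prune(memories: list[Memory], query: list[float], keep_above: float = 0.3) -> list[Memory]:
    # Forgetting step: drop memories whose combined score falls below the threshold.
    return [m for m in memories if score(m, query) >= keep_above]


QUERY = [1.0, 0.0]
fresh = Memory("Auth uses RS256", 2, 5, [0.9, 0.1], 0.8)
stale = Memory("Old CI flake note", 2000, 1, [0.0, 1.0], 0.0)
print(round(score(fresh, QUERY), 3), round(score(stale, QUERY), 3))
```

The stale, unrelated note scores about 0.075, almost all of it from its single access, and is pruned; the recent, relevant, pinned fact is kept.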

Hierarchical Memory

Level	Content	Example
0	Raw conversation logs, full code diffs	Session transcript
1	Summarized session notes	"Fixed auth bug via JWT validation update"
2	Project-level knowledge	"Auth uses JWT with RS256, tokens expire in 1h"
3	Cross-project patterns	"Always add token refresh alongside JWT validation"

Active Recall vs. Passive Context

Mode	Pros	Cons
Passive (always inject)	Low latency, predictable	Consumes context budget
Active (query-driven retrieval)	Scalable	Adds latency, can miss context

Emerging consensus: hybrid. Always-on for critical invariants + active recall for task-specific knowledge.
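The hybrid policy can be sketched as a context assembler: pinned invariants are always injected, then retrieved items fill the remaining budget in score order. Token counts are approximated by whitespace-split word counts, an assumption standing in for a real tokenizer; `assemble_context` and the sample memories are hypothetical.

```python
from typing import Callable


def word_count(text: str) -> int:
    # Crude token proxy: whitespace-separated words (an assumption, not a real tokenizer).
    return len(text.split())


def assemble_context(pinned: list[str], retrieved: list[tuple[str, float]],
                     budget: int, count: Callable[[str], int] = word_count) -> list[str]:
    """Passive tier: always inject pinned invariants. Active tier: add retrieved
    items in descending score order while they still fit in the budget."""
    parts = list(pinned)
    used = sum(count(p) for p in pinned)
    if used > budget:
        raise ValueError("pinned invariants alone exceed the context budget")
    for text, _score in sorted(retrieved, key=lambda r: r[1], reverse=True):
        cost = count(text)
        if used + cost <= budget:
            parts.append(text)
            used += cost
    return parts


PINNED = ["Never commit secrets", "Python 3.11 only"]
RETRIEVED = [
    ("auth module uses JWT RS256 keys", 0.9),
    ("old sprint notes about unrelated UI work and meetings", 0.4),
    ("db migrations live in alembic", 0.7),
]
context = assemble_context(PINNED, RETRIEVED, budget=17)
print(context)
```

The lowest-scoring note is dropped because it would exceed the 17-word budget; pinned items are never dropped, and the function raises an error if they alone overflow the budget.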

Dedicated Memory Platforms (2026)

System	Architecture	Reported Benchmark	Key Differentiator
Mem0	Vector + Graph + KV hybrid	+26% over OpenAI Memory (LOCOMO)	41K GitHub stars, AWS Agent SDK exclusive memory provider, 91% lower latency than full-context
Zep (Graphiti)	Temporal knowledge graph	63.8% (LongMemEval)	Time as first-class dimension, tracks how facts change
Hindsight	4-network parallel retrieval	91.4% (LongMemEval)	Strongest benchmarks: semantic + BM25 + graph + temporal in parallel
Memvid	Append-only .mv2 files	N/A	Zero infrastructure, single-file packaging, edge/offline
Letta (MemGPT)	OS-inspired virtual context	N/A	Self-editing memory, paging metaphor

The Letta/MemGPT Paradigm

The cleanest architectural model for coding assistants: treat the context window as RAM, long-term storage as disk, and give the agent explicit paging control.

Two-tier memory: main context (in-window) + external context (out-of-window storage)
Agents autonomously decide what to page in/out, analogous to OS virtual memory
Self-editing: agents can modify their own memory through tool use
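A toy version of the paging model, assuming messages are plain strings: the main context holds at most `capacity` messages, the oldest are paged out to an archive, and tool-like methods let the agent search the archive, page results back in, and rewrite its own core memory blocks. Class and method names are hypothetical and do not reproduce Letta's actual API; substring matching stands in for the embedding search a real system would use.

```python
from collections import deque


class PagedMemory:
    """Toy two-tier memory: a bounded main context ("RAM") and an archive ("disk")."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # max messages held in the main context
        self.core = {}            # small, always-visible blocks the agent may rewrite
        self.main = deque()       # in-window messages
        self.archive = []         # out-of-window storage

    def append(self, message: str) -> None:
        self.main.append(message)
        while len(self.main) > self.capacity:
            self.archive.append(self.main.popleft())  # page out the oldest message

    # The methods below are what an agent would call as tools.
    def edit_core(self, block: str, text: str) -> None:
        self.core[block] = text  # self-editing: overwrite a core memory block

    def search_archive(self, term: str, limit: int = 3) -> list[str]:
        return [m for m in self.archive if term.lower() in m.lower()][:limit]

    def page_in(self, term: str) -> list[str]:
        hits = self.search_archive(term)
        for hit in hits:
            self.append(f"[recalled] {hit}")  # may in turn page out older messages
        return hits

    def render(self) -> str:
        core = [f"<{k}> {v}" for k, v in self.core.items()]
        return "\n".join(core + list(self.main))


mem = PagedMemory(capacity=3)
for msg in ["user: project uses Postgres 15", "user: run tests with pytest -x",
            "agent: ok", "user: refactor the cache layer", "agent: reading cache.py"]:
    mem.append(msg)
hits = mem.page_in("postgres")
mem.edit_core("project", "DB is Postgres 15; tests via pytest")
print(mem.render())
```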

Emerging Approaches

MCP Memory Servers

The @modelcontextprotocol/server-memory provides simple persistent memory: a local knowledge graph of entities, relations, and observations stored in a JSONL file. The mcp-memory-service is more sophisticated, adding a REST API, a knowledge graph, and autonomous consolidation.
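To show what JSONL-based persistence involves, here is a minimal sketch that stores a small knowledge graph as one JSON object per line using only the standard library. The record layout (`type`, `name`, `entityType`, `observations`, `from`, `to`, `relationType`) is an assumption modeled loosely on the entity/relation/observation structure of @modelcontextprotocol/server-memory, so check that server's documentation before reading or writing its files directly. `save_graph` and `load_graph` are hypothetical helpers.

```python
import json
import tempfile
from pathlib import Path


def save_graph(path: Path, entities: dict, relations: list) -> None:
    """Write one JSON object per line: entity records first, then relation records."""
    with path.open("w", encoding="utf-8") as f:
        for name, (etype, observations) in entities.items():
            f.write(json.dumps({"type": "entity", "name": name, "entityType": etype,
                                "observations": observations}) + "\n")
        for src, rel, dst in relations:
            f.write(json.dumps({"type": "relation", "from": src, "to": dst,
                                "relationType": rel}) + "\n")


def load_graph(path: Path) -> tuple[dict, list]:
    """Rebuild entities {name: (type, observations)} and relations [(from, type, to)]."""
    entities, relations = {}, []
    if not path.exists():
        return entities, relations  # first run: empty memory
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec["type"] == "entity":
            entities[rec["name"]] = (rec["entityType"], rec["observations"])
        elif rec["type"] == "relation":
            relations.append((rec["from"], rec["relationType"], rec["to"]))
    return entities, relations


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "memory.jsonl"
    ents, rels = load_graph(store)
    ents["auth_service"] = ("module", ["uses JWT RS256"])
    rels.append(("api_gateway", "depends_on", "auth_service"))
    save_graph(store, ents, rels)
    reloaded = load_graph(store)
    line_count = len(store.read_text(encoding="utf-8").splitlines())

print(reloaded, line_count)
```

Because each record is a self-contained line, the file stays human-readable and diff-friendly, in line with the auditability trade-off discussed earlier.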

MCP's June 2025 OAuth 2.1 mandate signals enterprise readiness. MCP is now the standard integration fabric across Cursor, Windsurf, Claude Code, and others.

Agent Frameworks with Built-in Memory

Framework	Status	Memory Approach
LangGraph	1.0 (Oct 2025)	Native MCP discovery, structured memory, decision control
CrewAI	v1.10.1, 44K+ stars	Role-based multi-agent, native MCP + A2A
MS Agent Framework	AutoGen + Semantic Kernel merged	Azure AD, RBAC, SOC 2/HIPAA
Letta	Active development	MemGPT OS metaphor, self-editing memory

Anthropic's Direction

Claude Memory rolled out Sep-Oct 2025 (Team/Enterprise, then all paid users)
Architecture: transparent Markdown files, not vector databases
Claude Code Auto-dream (early 2026): Background sub-agent for autonomous memory consolidation
Memory Tool API: lets developers build agents with persistent cross-session memory

Google NotebookLM

Fundamentally different: source-grounding. The AI is constrained strictly to user-provided data. 1M-token context window (an 8x increase in 2025) and 6x longer conversation memory. Grounding removes external knowledge as a source of hallucination, though answers can still misstate the sources themselves.

Key Takeaways

The hybrid stack is converging: Vector search + knowledge graphs + keyword search, with reranking. Pure vector RAG is no longer state of the art.
Memory routing remains the hardest problem: Knowing which memories matter is more important than how they're stored.
Letta/MemGPT's OS metaphor is the cleanest model: Context window = RAM, long-term storage = disk, agent has paging control.
Anthropic's transparent file-based approach trades sophistication for auditability — critical for coding where users need to trust and verify.
Graph RAG delivers 3-5x more relevant context than vector search alone for code, which is inherently relational.
2026: contextual memory becomes table stakes for production agent deployments.
