State of the art in agent memory: types, RAG for code, knowledge graphs, dedicated platforms, and the MemGPT paradigm.
The foundational survey "Memory in the Age of AI Agents" (December 2025) establishes the taxonomy along three axes: forms (how memory is stored), functions (what it represents), and dynamics (how it changes over time).
Episodic memory stores the sequence and details of specific interactions: "What happened when?" In coding assistants: conversation history, debug session logs, past code review discussions.
Semantic memory is the accumulated knowledge base of facts and concepts: "What do I know?" Project architecture decisions, API contracts, team conventions, business domain knowledge.
Anthropic's approach: transparent Markdown files (CLAUDE.md) as semantic memory stores. Trades retrieval sophistication for auditability and user control.
Procedural memory is expertise developed through practice: "How do I do this?" Build/deploy workflows, debugging patterns, refactoring recipes, testing strategies.
Working memory integrates episodic, semantic, and procedural information with the current input. In coding: the current file being edited, active task description, recently retrieved context, current conversation thread.
Agents need to preserve four things in working memory while continuously integrating new evidence: goals, policy constraints, entity identifiers, and confirmed decisions.
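A minimal sketch of that working-memory contract, assuming a simple in-process record. The class name `WorkingMemory`, its field names, and the `max_evidence` cap are all hypothetical; the point is only that the four preserved items are kept intact while evidence rolls over.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    """Hypothetical working-memory record; names are illustrative only."""

    goals: list[str] = field(default_factory=list)
    policy_constraints: list[str] = field(default_factory=list)
    entity_ids: dict[str, str] = field(default_factory=dict)
    confirmed_decisions: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)  # rolling window

    def integrate(self, observation: str, max_evidence: int = 5) -> None:
        """Add new evidence and drop the oldest items beyond the cap.

        Goals, constraints, identifiers, and decisions are never trimmed.
        """
        self.evidence.append(observation)
        overflow = len(self.evidence) - max_evidence
        if overflow > 0:
            del self.evidence[:overflow]
```

Only `evidence` is subject to trimming here, which mirrors the idea that durable state survives while transient observations are replaced.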
| By Storage | By Function |
|---|---|
| Token-level: Direct representation in context window | Factual: Objective information (API docs, types) |
| Parametric: Encoded in model weights (fine-tuning) | Experiential: Agent interaction history |
| Latent: Implicit in hidden states | Working: Active reasoning context |
| Strategy | How | Quality |
|---|---|---|
| AST-aware | Parse into AST, chunk at function/class boundaries | Best for code |
| Semantic | Embedding similarity between adjacent segments | +9% recall vs fixed-size |
| Hierarchical | Both fine-grained (function) and coarse (file/module) | Flexible retrieval granularity |
| Agentic | LLM decides chunk boundaries | Highest quality, most expensive |
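
The AST-aware row can be illustrated for Python source with the standard-library `ast` module. This is a sketch under simplifying assumptions: it only chunks top-level functions and classes, and the function name `chunk_by_ast` and the dictionary keys are hypothetical. Production tools typically use a multi-language parser instead.

```python
import ast


def chunk_by_ast(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno and end_lineno are 1-based and inclusive (Python 3.8+).
            start = node.lineno
            if node.decorator_list:
                # Keep decorators attached to the definition they modify.
                start = min(d.lineno for d in node.decorator_list)
            chunks.append(
                {
                    "name": node.name,
                    "kind": type(node).__name__,
                    "start_line": start,
                    "end_line": node.end_lineno,
                    "text": "\n".join(lines[start - 1 : node.end_lineno]),
                }
            )
    return chunks
```

Because boundaries follow the syntax tree, a chunk never cuts a function body in half, which is the property fixed-size chunking lacks.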
Pure vector search misses exact matches (function names, error codes). The industry has converged on hybrid retrieval, which pairs vector search with exact keyword matching.
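One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The source does not say which fusion method any given tool uses, so treat this as an illustrative sketch. The function name and the constant `k = 60` (a conventional default) are assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by both keyword and vector search rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A function name that the keyword index matches exactly and the vector index ranks lower still ends up first, because it collects score from both lists.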
| Approach | Privacy | Models | Speed |
|---|---|---|---|
| Cloud (Cursor, Cody, Copilot) | Code leaves machine | Better models | Fast |
| Local (Ollama + ChromaDB/Qdrant) | Full privacy | Smaller models | Depends on GPU |
Nodes (functions, classes, modules, variables) with typed edges (calls, imports, inherits, implements, depends-on).
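A minimal in-memory sketch of such a graph, assuming plain dictionaries rather than a graph database. The class `CodeGraph` and its method names are hypothetical and do not correspond to the API of any tool listed here.

```python
from collections import defaultdict


class CodeGraph:
    """Tiny code graph: named nodes with a kind, plus typed directed edges."""

    def __init__(self) -> None:
        self.kinds: dict[str, str] = {}
        self.out_edges = defaultdict(set)  # (src, edge_type) -> {dst}
        self.in_edges = defaultdict(set)   # (dst, edge_type) -> {src}

    def add_node(self, name: str, kind: str) -> None:
        self.kinds[name] = kind

    def add_edge(self, src: str, edge_type: str, dst: str) -> None:
        for name in (src, dst):
            if name not in self.kinds:
                raise KeyError(f"unknown node: {name}")
        self.out_edges[(src, edge_type)].add(dst)
        self.in_edges[(dst, edge_type)].add(src)

    def targets(self, src: str, edge_type: str) -> list[str]:
        """Nodes that `src` points to via `edge_type`."""
        return sorted(self.out_edges.get((src, edge_type), ()))

    def sources(self, dst: str, edge_type: str) -> list[str]:
        """Nodes that point to `dst` via `edge_type`."""
        return sorted(self.in_edges.get((dst, edge_type), ()))

    def impacted_by(self, name: str) -> set[str]:
        """Transitive callers of `name`, found by walking 'calls' edges backwards."""
        seen: set[str] = set()
        stack = [name]
        while stack:
            current = stack.pop()
            for caller in self.sources(current, "calls"):
                if caller not in seen:
                    seen.add(caller)
                    stack.append(caller)
        return seen
```

The reverse traversal in `impacted_by` is the kind of structural question ("what breaks if I change this?") that embedding similarity alone cannot answer.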
| Tool | Features |
|---|---|
| Axon | Indexes codebases into structural knowledge graphs. Exposes via MCP tools. |
| GitNexus | Client-side code intelligence, interactive searchable graphs. Runs in browser. |
| Neo4j Code KG | Neo4j graph database for code relationships. |
| Cognee | Builds knowledge graphs from Python repos. |
| FalkorDB Code Graph | Specialized code graph solution. |
LightRAG (EMNLP 2025): Leading open-source framework combining graphs with RAG. 6,000x token efficiency improvement over Microsoft's GraphRAG. Supports reranking for mixed queries.
Memgraph GraphRAG for Devs: Connects IDEs like Cursor/Windsurf directly to code knowledge graphs via MCP.
Score each memory on four signals: recency (when it was last accessed), frequency (how often it is retrieved), relevance (semantic connection to the active context), and importance (explicitly marked or heavily referenced).
Forgetting is a feature, not a bug: pruning low-value memories keeps retrieval fast and relevant.
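The four signals and the pruning step can be combined in a weighted score. Everything specific in this sketch is an assumption made for illustration: the weights, the one-week half-life for recency, the logarithmic squashing of frequency, and the names `MemoryItem`, `score`, and `prune`.

```python
import math
from dataclasses import dataclass

WEEK_SECONDS = 7 * 24 * 3600


@dataclass
class MemoryItem:
    text: str
    last_accessed: float  # seconds since epoch
    access_count: int
    relevance: float      # 0..1, e.g. similarity to the active context
    importance: float     # 0..1, e.g. explicitly pinned by the user


def score(item: MemoryItem, now: float, half_life: float = WEEK_SECONDS,
          weights: tuple = (0.3, 0.2, 0.3, 0.2)) -> float:
    """Weighted sum of recency, frequency, relevance, and importance."""
    age = max(0.0, now - item.last_accessed)
    recency = 0.5 ** (age / half_life)  # halves every `half_life` seconds
    # Squash the raw count into [0, 1) so heavy use saturates.
    frequency = 1.0 - 1.0 / (1.0 + math.log1p(item.access_count))
    w_rec, w_freq, w_rel, w_imp = weights
    return (w_rec * recency + w_freq * frequency
            + w_rel * item.relevance + w_imp * item.importance)


def prune(items: list[MemoryItem], now: float, keep: int) -> list[MemoryItem]:
    """Forget everything except the `keep` highest-scoring memories."""
    return sorted(items, key=lambda m: score(m, now), reverse=True)[:keep]
```

With weights that sum to 1 and inputs in the 0..1 range, the score stays in that range too, which makes thresholds easier to reason about.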
| Level | Content | Example |
|---|---|---|
| 0 | Raw conversation logs, full code diffs | Session transcript |
| 1 | Summarized session notes | "Fixed auth bug via JWT validation update" |
| 2 | Project-level knowledge | "Auth uses JWT with RS256, tokens expire in 1h" |
| 3 | Cross-project patterns | "Always add token refresh alongside JWT validation" |
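
The level hierarchy above can be sketched as a promotion step that summarizes entries at one level into a single entry at the next. The names `Level` and `promote` are hypothetical, and the `summarize` callable stands in for whatever produces the summary (in practice probably an LLM call, which is not shown here).

```python
from enum import IntEnum
from typing import Callable


class Level(IntEnum):
    RAW = 0            # raw conversation logs, full code diffs
    SESSION = 1        # summarized session notes
    PROJECT = 2        # project-level knowledge
    CROSS_PROJECT = 3  # cross-project patterns


def promote(entries: list[tuple[Level, str]], source: Level,
            summarize: Callable[[list[str]], str]) -> list[tuple[Level, str]]:
    """Return entries plus one summary of all `source`-level texts, one level up."""
    if source == Level.CROSS_PROJECT:
        raise ValueError("no level above CROSS_PROJECT")
    texts = [text for level, text in entries if level == source]
    if not texts:
        return list(entries)  # nothing to consolidate
    return list(entries) + [(Level(source + 1), summarize(texts))]
```

This version keeps the lower-level entries in place; whether to discard them after promotion is a separate retention decision.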
| Mode | Pros | Cons |
|---|---|---|
| Passive (always inject) | Low latency, predictable | Consumes context budget |
| Active (query-driven retrieval) | Scalable | Adds latency, can miss context |
Emerging consensus: hybrid. Always-on for critical invariants + active recall for task-specific knowledge.
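A sketch of the hybrid mode, assuming a crude whitespace token count and word-overlap matching in place of a real tokenizer and retriever. The function name `build_context` and its parameters are hypothetical.

```python
def build_context(invariants: list[str], memories: list[str],
                  query: str, budget: int) -> list[str]:
    """Always inject invariants, then add query-matched memories within budget."""

    def cost(text: str) -> int:
        return len(text.split())  # assumption: one token per whitespace word

    # Passive part: invariants are always included, even if they alone
    # exceed the budget.
    context = list(invariants)
    used = sum(cost(text) for text in context)

    # Active part: rank memories by word overlap with the query.
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(m.lower().split())), m) for m in memories]
    for overlap, memory in sorted(scored, key=lambda pair: -pair[0]):
        if overlap == 0:
            break  # remaining memories share no words with the query
        if used + cost(memory) <= budget:
            context.append(memory)
            used += cost(memory)
    return context
```

The budget only constrains the actively recalled part, which reflects the trade-off in the table: invariants are predictable but always cost tokens.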
| System | Architecture | LongMemEval Score | Key Differentiator |
|---|---|---|---|
| Mem0 | Vector + Graph + KV hybrid | 26% over OpenAI baseline | 41K GitHub stars, AWS Agent SDK exclusive memory provider, 91% faster than full-context |
| Zep (Graphiti) | Temporal knowledge graph | 63.8% | Time as first-class dimension, tracks how facts change |
| Hindsight | 4-network parallel retrieval | 91.4% | Strongest benchmarks: semantic + BM25 + graph + temporal in parallel |
| Memvid | Append-only .mv2 files | N/A | Zero infrastructure, single-file packaging, edge/offline |
| Letta (MemGPT) | OS-inspired virtual context | N/A | Self-editing memory, paging metaphor |
The most architecturally clean model for coding assistants: treat the context window as RAM, long-term storage as disk, and give the agent explicit paging control.
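The RAM/disk metaphor can be sketched as a bounded context with an archive behind it. This is not Letta's API: the class `PagedMemory`, the message-count capacity, and the substring search are assumptions chosen to keep the example small. In the MemGPT design the agent itself triggers paging through tool calls; here the caller plays that role.

```python
from collections import deque


class PagedMemory:
    """Bounded 'RAM' (context) backed by unbounded 'disk' (archive)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity          # max messages kept in context
        self.context: deque = deque()     # what the model would see
        self.archive: list[str] = []      # long-term storage

    def append(self, message: str) -> None:
        """Add a message; page the oldest ones out when over capacity."""
        self.context.append(message)
        while len(self.context) > self.capacity:
            self.archive.append(self.context.popleft())

    def page_in(self, keyword: str) -> list[str]:
        """Move archived messages containing `keyword` back into context."""
        hits = [m for m in self.archive if keyword.lower() in m.lower()]
        for message in hits:
            self.archive.remove(message)
            self.append(message)  # may page out something else in turn
        return hits
```

Paging a message in can evict another one, so the context never exceeds its capacity regardless of how much is recalled.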
The @modelcontextprotocol/server-memory provides simple JSONL-based persistent memory. The mcp-memory-service is more sophisticated with REST API, knowledge graph, and autonomous consolidation.
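To show why JSONL suits simple persistent memory, here is a sketch of an append-only store. The record layout (`entity`, `observation`) and the function names `remember` and `recall` are assumptions for illustration, not the actual on-disk format or interface of `@modelcontextprotocol/server-memory`.

```python
import json
from pathlib import Path


def remember(path: Path, entity: str, observation: str) -> None:
    """Append one observation as a single JSON line."""
    record = {"entity": entity, "observation": observation}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def recall(path: Path, entity: str) -> list[str]:
    """Return all observations stored for `entity`, oldest first."""
    if not path.exists():
        return []
    observations = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if record["entity"] == entity:
                observations.append(record["observation"])
    return observations
```

Each write is a single appended line, so the file stays human-readable and diffable, at the cost of a full scan on every read.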
MCP's June 2025 OAuth 2.1 mandate signals enterprise readiness. MCP is now the standard integration fabric across Cursor, Windsurf, Claude Code, and others.
| Framework | Status | Memory Approach |
|---|---|---|
| LangGraph | 1.0 (Oct 2025) | Native MCP discovery, structured memory, decision control |
| CrewAI | v1.10.1, 44K+ stars | Role-based multi-agent, native MCP + A2A |
| MS Agent Framework | AutoGen + Semantic Kernel merged | Azure AD, RBAC, SOC 2/HIPAA |
| Letta | Active development | MemGPT OS metaphor, self-editing memory |
Fundamentally different: source-grounding. AI constrained strictly to user-provided data. 1M token context window (8x increase in 2025), 6x longer conversation memory. Zero hallucination risk from external knowledge.