AI Agent Memory Infrastructure

The memory engine
for AI agents

Give your agents persistent, searchable memory that survives across sessions. They remember users, learn from mistakes, and get smarter over time. Built in Rust. Deploys in seconds.

Sub-10ms queries · 45 MCP tools · 4 native SDKs
agent memory
# Store agent memory
POST /v1/memory/store
{
  "agent_id": "assistant-1",
  "text": "User prefers TypeScript",
  "memory_type": "semantic",
  "importance": 0.9
}

# Recall by meaning
POST /v1/memory/recall
{
  "query": "language preferences",
  "top_k": 5
}

# → Result
{ "score": 0.97, "text": "User prefers TypeScript" }
<10ms
p99 query latency
45
MCP tools built-in
4
Native SDKs
0
External dependencies
The problem

Your agents forget
everything they learn

Every session starts from zero. Thousands of interactions, zero retained knowledge. You're paying to re-teach your agents the same things over and over.

Sessions are isolated silos
Each conversation starts blank. Your agent can't recall what it learned yesterday, last week, or across 10,000 prior interactions.
Knowledge evaporates at scale
Insights from thousands of users vanish after each session. Your agent never compounds intelligence — it stays perpetually naive.
Context stuffing is a dead end
Cramming history into prompts burns tokens, inflates costs, and hits a hard ceiling. It's duct tape, not architecture.
agent session
agent.recall("user preferences")
Error: No memory found.
Context window empty.
agent.sessions
1,847 sessions completed
0 memories persisted
agent.monthly_cost
$4,200/mo on context stuffing
0 knowledge retained
0%
Retention
$50k
Wasted / year
Capabilities

Everything agents need
to remember

Six core capabilities that turn stateless AI into agents with genuine, compounding memory.

Vector + Hybrid Search
Find memories by meaning, not just keywords. Cosine, HNSW, and BM25 full-text combined with tunable hybrid weights.
Persistent Agent Memory
Store, recall, consolidate, and forget. Four memory types — episodic, semantic, procedural, working — with automatic importance decay.
Built-in Embeddings
Text is auto-embedded on store and query. No OpenAI calls, no external APIs. HuggingFace models ship inside the binary.
MCP Native (45 Tools)
Drop into Claude, Cursor, or Windsurf instantly. Memory, search, and knowledge graph exposed as 45 callable MCP tools.
Knowledge Graph
Automatically connects related memories into a queryable graph. Entity extraction, similarity edges, cluster summaries, and semantic deduplication.
Dashboard + CLI
Visual admin dashboard for exploring memories, running queries, and monitoring agents. Plus a full dk CLI for automation.
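The decay curve Dakera uses for "automatic importance decay" isn't specified on this page, but the standard way to model it is exponential decay over a half-life. A minimal sketch (the function name and the one-week default half-life are illustrative assumptions, not the Dakera API):

```python
def decayed_importance(importance: float, age_hours: float,
                       half_life_hours: float = 168.0) -> float:
    """Exponential decay: effective importance halves every half_life_hours."""
    return importance * 0.5 ** (age_hours / half_life_hours)

# A memory stored a week ago at importance 0.9 has decayed to half strength:
decayed_importance(0.9, 168.0)  # → 0.45
```

Under this model, low-importance memories fall below a recall threshold first, which is what lets consolidation forget noise while keeping durable facts.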
SDKs

Integrate in minutes

Native SDKs for Python, TypeScript, Go, and Rust. Plus REST and gRPC for everything else. Five lines to first memory.

Store & Recall
Semantic memory with automatic embedding and importance scoring
Session Lifecycle
Context persists across every conversation automatically
Multi-Agent
Isolated namespaces for hundreds of agents at once
MCP Ready
45 tools for Claude, Cursor, and Windsurf out of the box
from dakera import DakeraClient

client = DakeraClient("http://localhost:3000")

# Store agent memory
client.memory_store(
    agent_id="assistant-1",
    text="User prefers TypeScript",
    memory_type="semantic",
    importance=0.9
)

# Recall by meaning
memories = client.memory_recall(
    agent_id="assistant-1",
    query="language preferences",
    top_k=5
)
import { DakeraClient } from "dakera"

const client = new DakeraClient("http://localhost:3000")

await client.memoryStore({
  agentId: "assistant-1",
  text: "User prefers TypeScript",
  memoryType: "semantic",
  importance: 0.9
})

const memories = await client.memoryRecall({
  agentId: "assistant-1",
  query: "language preferences",
  topK: 5
})
import (
    "context"

    "github.com/dakera/dakera"
)

ctx := context.Background()
client := dakera.NewClient("http://localhost:3000")

// Store agent memory
client.MemoryStore(ctx, &dakera.Memory{
    AgentID:    "assistant-1",
    Text:       "User prefers TypeScript",
    MemoryType: "semantic",
    Importance: 0.9,
})

memories, _ := client.MemoryRecall(ctx, "assistant-1", "language preferences", 5)
use dakera_client::{DakeraClient, Memory};

let client = DakeraClient::new("http://localhost:3000");

// Store agent memory
client.memory_store(&Memory {
    agent_id: "assistant-1".into(),
    text: "User prefers TypeScript".into(),
    memory_type: "semantic".into(),
    importance: 0.9,
    ..Default::default()
}).await?;

// Recall by meaning
let memories = client
    .memory_recall("assistant-1", "language preferences", 5)
    .await?;
# Store memory
curl -X POST localhost:3000/v1/memory/store \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"assistant-1","text":"User prefers TS","importance":0.9}'

# Recall
curl -X POST localhost:3000/v1/memory/recall \
  -d '{"agent_id":"assistant-1","query":"language preferences","top_k":5}'

# Text search with auto-embedding
curl -X POST localhost:3000/v1/namespaces/docs/query-text \
  -d '{"text":"semantic search systems","top_k":5}'
Architecture

Five Rust crates. One binary.

6 index algorithms, 3 storage tiers, built-in ML inference, and a production-grade API layer — compiled into a single deployable artifact. 118µs queries. 27M inserts per second.

5
Rust crates
6
Index algorithms
118µs
p50 query latency
27M/s
Peak throughput
3
Storage tiers
01
dakera-api
6 components
Production-grade REST & gRPC API layer with authentication, observability, and rate control
Axum · Tonic · Auth · Prometheus · OpenTelemetry
REST API
Full CRUD with batch upsert, multi-namespace support, and streaming responses. Axum-based with tower middleware.
JSON + CBOR
gRPC
High-performance binary protocol via Tonic. Bi-directional streaming for real-time indexing and search operations.
Protobuf v3
Auth & API Keys
Multi-tenant token authentication with per-key permissions, namespace isolation, and configurable RBAC policies.
Rate Limiting
Per-key sliding window rate limiting with burst allowance. Configurable per endpoint, per namespace, or globally.
Audit Logging
Structured JSON operation logs with request tracing, latency breakdown, and compliance-ready event stream.
Prometheus + OTel
Built-in /metrics endpoint with histogram latencies, request counts, and distributed tracing via OpenTelemetry SDK.
Pull + Push
02
dakera-engine
12 components
Six index algorithms, hybrid search, auto-index selection, and distributed clustering with Raft consensus
HNSW · IVF · SPFresh · BM25 · Hybrid · Raft
HNSW
Hierarchical navigable small-world graph for sub-millisecond approximate nearest neighbor queries at scale.
8.5K qps @ 99% recall
IVF
Inverted file index with configurable nprobe for high-throughput batch indexing with tunable recall trade-offs.
877K vectors/s insert
SPFresh
Real-time streaming index optimized for continuous ingestion. LSMT-inspired design with background compaction.
27.4M inserts/s peak
PQ + SQ
Product quantization (4-16 sub-vectors) and scalar quantization for 8-32x memory compression with minimal recall loss.
8-32x compression
BM25
Full-text keyword search with configurable k1/b parameters, stemming, stop words, and multi-language tokenization.
Hybrid Search
Reciprocal Rank Fusion (RRF) combining vector similarity and keyword relevance into a single ranked result set.
RRF fusion
Auto-Index
Analyzes dataset characteristics (cardinality, dimensionality, distribution) and selects the optimal index strategy.
Agent Memory
Importance-weighted memory with consolidation, decay scoring, and semantic deduplication for AI agent workflows.
Knowledge Graph
Entity-relationship graph with typed edges, traversal queries, and automatic relationship extraction from text.
Gossip Protocol
SWIM-based protocol for cluster membership, failure detection, and metadata propagation across nodes.
Protocol: SWIM
Leader Election
Raft-based consensus for partition leader assignment, log replication, and automatic failover with quorum writes.
Raft consensus
Sharding
Consistent hashing with virtual nodes for automatic data distribution and rebalancing across cluster members.
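The Reciprocal Rank Fusion behind hybrid search is simple enough to sketch: each result list contributes `1 / (k + rank)` to a document's score, so items ranked well by both vector similarity and BM25 rise to the top. A minimal sketch (`k = 60` is the conventional constant from the RRF literature, not a documented Dakera default):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d)."""
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["m3", "m1", "m7"]   # ranked by cosine similarity
keyword_hits = ["m1", "m9", "m3"]   # ranked by BM25
rrf_fuse([vector_hits, keyword_hits])  # → ["m1", "m3", "m9", "m7"]
```

Because RRF only consumes ranks, not raw scores, it needs no calibration between cosine distances and BM25 scores, which is why it's a common default for hybrid fusion.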
03
dakera-inference
6 components
Rust-native ML embedding pipeline with Candle runtime — no Python, no ONNX, no external dependencies
Candle · MiniLM · BGE · E5 · Metal · CUDA
Candle Runtime
Pure Rust ML inference engine by Hugging Face. Zero-copy tensor ops, no Python runtime needed, WASM-compatible.
Pure Rust
MiniLM-L6
384-dim embeddings optimized for speed. Ideal for real-time agent memory with low-latency requirements.
384 dims · 22M params
BGE-Small
BAAI General Embedding for high-accuracy semantic search. State-of-the-art retrieval quality in a compact model.
384 dims · 33M params
E5-Small
Microsoft's E5 model with instruction-tuned embeddings. Excellent for query-document asymmetric search patterns.
384 dims · 33M params
Batch Processing
Dynamic batching with configurable batch size and timeout. Amortizes model overhead for bulk ingestion workloads.
Up to 64 per batch
CPU / CUDA / Metal
Automatic hardware detection with Metal on macOS, CUDA on Linux/Windows, and optimized AVX2/NEON CPU fallback.
Auto-detect
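Once text is embedded, "recall by meaning" reduces to comparing vectors, typically by cosine similarity. A self-contained sketch with toy 3-dim vectors (real MiniLM/BGE/E5 embeddings are 384-dim):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Nearby meanings produce nearby vectors, so their cosine is close to 1.0
cosine([0.9, 0.1, 0.0], [0.8, 0.2, 0.1])
```

The HNSW and IVF indexes above exist to avoid computing this pairwise against every stored vector; they return approximate nearest neighbors under the same metric.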
04
dakera-storage
9 components
Three-tier persistence engine — hot memory, warm filesystem, cold S3 — with WAL durability and background compaction
Memory · Filesystem · S3 · WAL · Snapshots · Compaction
Memory Tier
Lock-free concurrent hashmap with arena allocation. Sub-microsecond reads for hot data and active agent sessions.
Sub-µs reads
Filesystem Tier
Memory-mapped file storage with LSM-tree compaction. Handles datasets larger than RAM with predictable tail latency.
mmap + LSM
S3 / MinIO
Cloud object storage backend for cold data archival. Automatic tiering moves data down based on access frequency.
Auto-tier
Write-Ahead Log
Append-only WAL with fsync durability guarantees. Crash recovery replays log to reconstruct consistent state.
fsync durable
Snapshots
Point-in-time consistent snapshots with copy-on-write semantics. Export to local disk or stream directly to S3.
Compaction
Background merge of sorted runs with configurable size ratios. Reclaims space from tombstones and overwrites.
Delta Encoding
Stores only vector deltas for versioned data. Reduces storage by 40-70% for frequently updated embeddings.
40-70% savings
TTL
Per-record and per-namespace time-to-live with lazy expiration. Background sweeper reclaims expired entries.
Encryption at Rest
AES-256-GCM encryption for filesystem and S3 tiers. Key rotation support with zero-downtime re-encryption.
AES-256-GCM
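The write-ahead log pattern above — append, fsync, acknowledge, and replay after a crash — can be sketched in a few lines. This illustrates the idea only; Dakera's on-disk format and record framing are not documented here, and JSON lines stand in for whatever binary encoding it actually uses:

```python
import json
import os

class WriteAheadLog:
    """Append-only log: every mutation is fsync'd before it is acknowledged,
    so a crash can be recovered by replaying the log from the start."""

    def __init__(self, path: str):
        self.path = path
        self.f = open(path, "a", encoding="utf-8")

    def append(self, op: dict) -> None:
        self.f.write(json.dumps(op) + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())  # durable on disk before we return

    def replay(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)
```

Snapshots bound replay time: recovery loads the latest snapshot, then replays only the WAL entries written after it.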
05
dakera-common
6 components
Shared type system, error taxonomy, configuration, and cross-crate utilities used by all other crates
Types · Errors · Config · Serde · Validation
Shared Types
Strongly-typed domain models for vectors, memories, namespaces, and search results. Zero-cost serde serialization.
Error Taxonomy
Hierarchical error types with context propagation, HTTP status mapping, and structured error responses for clients.
Configuration
Layered config from defaults → TOML → env vars → CLI flags. Hot reload for runtime-tunable parameters.
Hot reload
Validation
Input validation with dimension checks, UTF-8 enforcement, payload size limits, and custom constraint rules.
Serialization
Zero-copy deserialization with serde. Supports JSON, CBOR, MessagePack, and custom binary format for vectors.
Zero-copy
Telemetry
Shared tracing subscriber with span propagation, structured logging (JSON + pretty), and metric type definitions.
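The layered configuration precedence described above (defaults → TOML → env vars → CLI flags) boils down to "later layers win". A simplified sketch covering the first three layers — the `DAKERA_` env prefix is an assumption for illustration, and the CLI-flag layer is omitted:

```python
import os

def layered_config(defaults: dict, file_cfg: dict,
                   env_prefix: str = "DAKERA_") -> dict:
    """Merge config layers: defaults, then config file, then environment.
    Later layers override earlier ones."""
    cfg = {**defaults, **file_cfg}
    for key in cfg:
        env_val = os.environ.get(env_prefix + key.upper())
        if env_val is not None:
            cfg[key] = env_val
    return cfg
```

The same merge order means a container can override any file setting with a single environment variable, without editing the TOML it ships with.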
Ecosystem
MCP Server (45 tools) · CLI (dk) · Dashboard (Leptos) · Python SDK · TypeScript SDK · Go SDK · Rust SDK
How it works

Three steps to persistent intelligence

From raw conversation to compounding knowledge — your agent's memory grows with every interaction.

01
Store
Your agent stores conversations, decisions, and preferences as embedded memories — each with an importance score and type label. Embeddings happen automatically inside the binary.
memory.store("User prefers TypeScript", importance=0.9)
Auto-embedding · Importance scoring · 4 memory types
02
Recall
Before each response, the agent retrieves the most relevant memories — combining vector similarity, keyword matching, and graph traversal into a single ranked result.
memory.recall("language preferences", top_k=5)
Hybrid search · <10ms p99 · Graph traversal
03
Learn
Over time, overlapping memories merge automatically. Importance decays, facts deduplicate, and related concepts connect. Your agent builds compounding intelligence — not a growing pile of text.
memory.consolidate("agent-1", strategy="merge")
Auto-consolidation · Importance decay · Deduplication
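The consolidation step can be sketched as a greedy merge: keep the highest-importance copy of any cluster of near-duplicates. This is an illustration of the idea, not Dakera's algorithm — a word-overlap Jaccard score stands in here for embedding similarity, and the 0.9 threshold is an arbitrary choice:

```python
def jaccard(a: str, b: str) -> float:
    """Toy word-overlap similarity; a stand-in for embedding cosine similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def consolidate(memories: list[dict], similar=jaccard,
                threshold: float = 0.9) -> list[dict]:
    """Greedy dedup: visit memories by descending importance and drop any
    memory too similar to one already kept."""
    kept: list[dict] = []
    for mem in sorted(memories, key=lambda m: m["importance"], reverse=True):
        if all(similar(mem["text"], k["text"]) < threshold for k in kept):
            kept.append(mem)
    return kept
```

Visiting in descending importance order guarantees that when two memories say the same thing, the one the agent weighted more heavily is the one that survives.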
Who it's for

Built for teams that ship intelligent agents

From solo developers to platform teams — Dakera powers the memory layer for agents that need to remember.

AI Agent Builders
Give your agents long-term memory that compounds over time. Store conversations, recall by meaning, and let overlapping knowledge consolidate automatically — across millions of interactions.
LangChain CrewAI AutoGen
IDE AI Users
Your coding assistant remembers codebase patterns, architecture decisions, and team preferences across every session. Add the MCP server to your IDE config — no code changes, instant persistent memory.
MCP Drop-in IDE native
Platform Teams
Serve hundreds of agents from one instance. Namespace isolation, key-based auth, rate limiting, and Prometheus metrics come built-in. Scale horizontally with Raft consensus when you need it.
Multi-tenant Auth Clustering
RAG Pipeline Developers
Replace your entire vector database stack with one binary. Six index algorithms and hybrid search with embeddings included — no API keys to manage, no Python runtime, no moving parts.
Hybrid search Auto-index Self-contained
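For the MCP drop-in path, IDE and desktop clients use the standard `mcpServers` config shape. The entry below is a hypothetical sketch — the `dk mcp` command is a guess based on the `dk` CLI mentioned above, not a documented invocation, so check Dakera's docs for the actual command:

```json
{
  "mcpServers": {
    "dakera": {
      "command": "dk",
      "args": ["mcp"]
    }
  }
}
```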
Compare

Dakera vs. the memory landscape

Compared against dedicated AI memory tools — not just vector databases. Dakera is the only single-binary, Rust-native memory engine with built-in embeddings and zero external dependencies.

Dakera: dakera · 23 MB · 1 binary
Mem0: Python API · PostgreSQL · Neo4j · OpenAI API · Docker · ~2 GB · 5 services
Zep: Python · Neo4j · OpenAI API · Docker · ~1.5 GB · 4 services
Letta: Python · PostgreSQL · LLM Provider · Docker · ~1.8 GB · 4 services
Hindsight: Python / Go · PostgreSQL · Embedding API · ~1 GB · 3 services
| Capability | Dakera | Mem0 | Zep / Graphiti | Letta | Hindsight |
| --- | --- | --- | --- | --- | --- |
| Runtime | Rust, single binary | Python + 3 containers | Python + Neo4j | Python + Postgres | Python + Postgres |
| Built-in embeddings | Candle, no external API | Requires OpenAI / Ollama | Requires OpenAI | Requires external LLM | Requires external API |
| Index algorithms | 6 built-in (HNSW, IVF, BM25...) | External vector DB required | External vector DB required | External vector DB required | pgvector only |
| MCP server | 45 tools, native | ~10 tools | ~4 tools (experimental) | Consumer only | MCP-first |
| Knowledge graph | Built-in, auto-extraction | Pro tier only ($249/mo) | Temporal graph (core) | | Entity networks |
| Tiered storage | Memory → FS → S3 | | | | |
| Distributed clustering | Raft + sharding | | | | |
| External dependencies | Zero | Postgres + Neo4j + embedding API | Neo4j + OpenAI | Postgres + LLM provider | Postgres + embedding API |
Ready to start

Stop re-teaching your agents.
Give them memory.

Deploy in seconds. Your agents start remembering immediately. No infrastructure to manage, no vendor lock-in.

Production ready · Horizontal scaling · Encryption at rest · Multi-tenant
FAQ

Common
questions

Everything you need to know about Dakera. Can't find what you're looking for? Reach out on GitHub.

Ask on GitHub
Is Dakera a vector database?
It includes a vector database (HNSW, IVF, PQ, BM25, hybrid search) but goes much further — adding agent memory primitives, built-in embeddings, importance decay, knowledge graphs, and MCP integration. Think of it as a vector database that understands how agents think.
Do I need an OpenAI API key for embeddings?
No. Text is embedded automatically on store and query using built-in models (MiniLM, BGE, E5) powered by the Candle runtime. No external calls, no additional cost.
Is Dakera production-ready?
Yes. WAL durability, snapshots, AES-256-GCM encryption, multi-tenant auth, rate limiting, Prometheus, and OpenTelemetry are all included. Designed for production from day one.
Can I use it with Claude, Cursor, or Windsurf?
Yes. Dakera ships as a native MCP server with 45 tools. Add it to your Claude Desktop config, Cursor settings, or Windsurf configuration. Your AI assistant gets persistent memory across all sessions — zero code changes required.
How does Dakera compare to Mem0 or Zep?
Mem0 requires Python + Postgres + Neo4j + an embedding API. Zep needs Neo4j + OpenAI. Dakera compiles everything — embeddings, indexing, graph, storage — into one binary. No Docker compose, no API keys, no assembly required.
How does Dakera handle scaling?
A single Dakera instance handles millions of vectors comfortably. For horizontal scaling, Dakera supports distributed clustering with Raft consensus, consistent-hash sharding, and automatic rebalancing. Add nodes — the data redistributes automatically.
What languages and SDKs are supported?
Native SDKs for Python, TypeScript, Go, and Rust. Plus a REST API (JSON) and gRPC (Protobuf) for any other language. MCP protocol for AI tool integration. Five lines of code to store your first memory.