mnemonist

Active

An open ecosystem for tool-agnostic AI agent memory

Rust 2 · Updated Apr 20, 2026
agent-memory · ai-memory · llm · rust · semantic-search · showcase

Quick Start · Report Bug · Specification

Features

  • Cognitive CLI: commands named after memory processes (memorize, remember, note, learn, consolidate, reflect, forget)
  • Two-level memory: project (~/.mnemonist/{project}/) and global (~/.mnemonist/global/)
  • Working memory inbox: capacity-limited staging area (default 7 items) with attention scoring; items are promoted to long-term memory via consolidate
  • Memory metadata: strength, access count, last accessed, source tracking; Hebbian reinforcement on retrieval
  • Plain markdown with YAML frontmatter: human-readable, git-friendly
  • Typed memories: user, feedback, project, reference
  • Local embedding: candle crate with all-MiniLM-L6-v2 (384-dim, CPU/CUDA); no external server needed; the model downloads from HuggingFace Hub on first use
  • Layered graph: three HNSW layers (code in .code-index.hnsw, project memory in .memory-index.hnsw, and global memory), with inter-layer edges via the refs frontmatter field
  • Pluggable code chunking: ChunkingStrategy trait with built-in ParagraphChunking (blank-line boundaries) and FixedLineChunking (sliding window with overlap); no tree-sitter dependency
  • Cross-layer recall: remember searches memory and code indices in parallel with blended relevance scoring (semantic + temporal) and follows refs edges to surface referenced code chunks
  • Consolidation: consolidate promotes inbox items, decays stale memories, and re-embeds
  • Fuzzy forget: forget resolves partial and suffix matches, so you don't need the full filename
  • Embedding quality metrics: learn reports anisotropy and similarity_range after indexing
  • TurboQuant: optional vector quantization (1-4 bit) for compact embedding storage
  • JSON-first: structured JSON on stdout, UX messages on stderr; pipe-friendly
  • Works with Claude Code, Codex, Gemini, Copilot, Cursor, or any AI tool
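The two built-in chunkers named in the feature list can be pictured with a small sketch. The trait shape below (a single `chunk` method returning owned strings) and the `window`/`overlap` field names are assumptions made for illustration; the real `ChunkingStrategy` signature may differ.

```rust
/// Hypothetical trait shape; the real `ChunkingStrategy` may differ.
pub trait ChunkingStrategy {
    /// Split `source` into chunks, each a run of consecutive lines.
    fn chunk(&self, source: &str) -> Vec<String>;
}

/// Splits on blank-line boundaries.
pub struct ParagraphChunking;

impl ChunkingStrategy for ParagraphChunking {
    fn chunk(&self, source: &str) -> Vec<String> {
        let mut chunks = Vec::new();
        let mut current: Vec<&str> = Vec::new();
        for line in source.lines() {
            if line.trim().is_empty() {
                // A blank line closes the current paragraph, if any.
                if !current.is_empty() {
                    chunks.push(current.join("\n"));
                    current.clear();
                }
            } else {
                current.push(line);
            }
        }
        if !current.is_empty() {
            chunks.push(current.join("\n"));
        }
        chunks
    }
}

/// Sliding window of `window` lines; consecutive windows share `overlap` lines.
pub struct FixedLineChunking {
    pub window: usize,
    pub overlap: usize,
}

impl ChunkingStrategy for FixedLineChunking {
    fn chunk(&self, source: &str) -> Vec<String> {
        let lines: Vec<&str> = source.lines().collect();
        // Keep the step at least 1 so the loop always advances.
        let window = self.window.max(1);
        let step = window.saturating_sub(self.overlap).max(1);
        let mut chunks = Vec::new();
        let mut start = 0;
        while start < lines.len() {
            let end = (start + window).min(lines.len());
            chunks.push(lines[start..end].join("\n"));
            if end == lines.len() {
                break;
            }
            start += step;
        }
        chunks
    }
}
```

Neither strategy parses the source language, which is consistent with the "no tree-sitter dependency" note: both operate purely on lines.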

Install

curl -fsSL https://raw.githubusercontent.com/urmzd/mnemonist/main/install.sh | sh

Hardware acceleration

Pre-built binaries run on CPU with pure Rust matmuls — functional, but large batch operations like learn are slower than accelerated builds. If you have a Rust toolchain, you can build from source with hardware acceleration:

# macOS — Apple's Accelerate BLAS (~2x faster embedding throughput)
cargo install mnemonist-cli --features accelerate

# Linux/Windows with an NVIDIA GPU
cargo install mnemonist-cli --features cuda

Quick Start

# 1. Install
curl -fsSL https://raw.githubusercontent.com/urmzd/mnemonist/main/install.sh | sh
# Or, if you have a Rust toolchain: cargo install mnemonist-cli

# 2. Ingest the codebase — auto-creates ~/.mnemonist/{project}/ and embeds source files
mnemonist learn .

# 3. Memorize long-term knowledge
mnemonist memorize "prefer Rust for CLI tools" -t feedback
mnemonist memorize "deep Go expertise, new to React" -t user

# 4. Jot quick notes into the working memory inbox
mnemonist note "look into async runtime choices"
mnemonist note "check Linear project INGEST for pipeline bugs"

# 5. Consolidate — promote inbox to long-term memory, decay stale items, re-embed
mnemonist consolidate

# 6. Recall — semantic + text search across memories and code
mnemonist remember "rust async patterns"

# 7. Review everything
mnemonist reflect --all

# 8. Forget something you no longer need (fuzzy name matching)
mnemonist forget prefer-rust
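Step 8 passes `prefer-rust` rather than a full filename. One way such partial and suffix matching could work is sketched below. The tier order (exact filename, exact stem, stem suffix, substring), the refusal to guess on ambiguity, and the example filenames in the usage note are all assumptions for illustration, not the tool's documented behavior.

```rust
/// Strip a trailing `.md` extension, if present.
fn stem(name: &str) -> &str {
    name.strip_suffix(".md").unwrap_or(name)
}

/// Resolve a partial name against stored memory filenames.
/// Tiers are tried in order: exact filename, exact stem, stem suffix, substring.
/// Returns `None` when nothing matches or the first matching tier is ambiguous.
pub fn resolve_memory<'a>(query: &str, files: &[&'a str]) -> Option<&'a str> {
    for tier in 0..4 {
        let hits: Vec<&'a str> = files
            .iter()
            .copied()
            .filter(|f| {
                let s = stem(f);
                match tier {
                    0 => *f == query,
                    1 => s == query,
                    2 => s.ends_with(query),
                    _ => s.contains(query),
                }
            })
            .collect();
        match hits.len() {
            0 => continue,
            1 => return Some(hits[0]),
            _ => return None, // ambiguous: refuse to guess
        }
    }
    None
}
```

With hypothetical files `feedback_prefer-rust-for-cli-tools.md` and `user_deep-go-expertise.md`, the query `prefer-rust` resolves through the substring tier and `go-expertise` through the suffix tier.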

Usage

Memory Levels

| Level | Location | Scope |
|---|---|---|
| Project | ~/.mnemonist/{project}/ | Per-repo corrections, decisions |
| Global | ~/.mnemonist/global/ | Cross-project preferences, expertise |

Project memory takes precedence over global when they conflict.

Memory Types

| Type | When | Example |
|---|---|---|
| user | Expertise, preferences | "Deep Rust knowledge, new to React" |
| feedback | Corrections, validated approaches | "Never mock the database in tests" |
| project | Repo-specific context (project-level only) | "Auth rewrite driven by compliance" |
| reference | External resource pointers | "Bugs tracked in Linear project INGEST" |

CLI at a glance

mnemonist --help

CLI Commands

| Command | Description |
|---|---|
| mnemonist memorize "<point>" [-t type] [-n name] | Deliberately encode a point into long-term memory (auto-embeds) |
| mnemonist note "<point>" | Jot a quick note into the working memory inbox |
| mnemonist remember "<ask>" [--budget N] [--level both] | Recall memories by cue: searches memory and code indices in parallel with blended relevance scoring, follows refs |
| mnemonist learn [path] [--attend glob] [--capacity N] | Ingest a codebase; chunks files with ParagraphChunking, embeds into .code-index.hnsw, reports quality metrics |
| mnemonist consolidate [--dry-run] | Promote inbox items, decay stale memories, re-embed into .memory-index.hnsw |
| mnemonist reflect [--all] [--global] | Introspect: review memories and inbox contents |
| mnemonist forget <file> | Deliberately forget a memory (supports fuzzy/suffix name matching) |
| mnemonist config init | Create default config file |
| mnemonist config show | Show current configuration |
| mnemonist config get <key> | Get a config value (dot-notation) |
| mnemonist config set <key> <value> | Set a config value |
| mnemonist config path | Print config file path |

All commands output JSON to stdout ({"ok": true, "data": {...}}).
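The `remember` command is described as blending semantic and temporal relevance. The README does not give the formula, so the following is only one plausible reading: a linear mix of cosine-style similarity and an exponentially decaying recency term. `blended_score`, `recency`, `Hit`, and the `half_life_days` parameter are hypothetical names introduced here.

```rust
/// Recency in (0, 1]: 1.0 for a memory touched just now,
/// halving every `half_life_days` (assumed decay shape).
pub fn recency(age_days: f64, half_life_days: f64) -> f64 {
    0.5_f64.powf(age_days.max(0.0) / half_life_days)
}

/// Linear blend: `temporal_weight` of the score comes from recency,
/// the rest from semantic similarity.
pub fn blended_score(
    semantic: f64,
    age_days: f64,
    temporal_weight: f64,
    half_life_days: f64,
) -> f64 {
    let w = temporal_weight.clamp(0.0, 1.0);
    (1.0 - w) * semantic + w * recency(age_days, half_life_days)
}

/// A search hit from either the memory or the code index (hypothetical shape).
pub struct Hit {
    pub id: String,
    pub semantic: f64,
    pub age_days: f64,
}

/// Sort hits by descending blended score.
pub fn rank(mut hits: Vec<Hit>, temporal_weight: f64, half_life_days: f64) -> Vec<Hit> {
    let score = |h: &Hit| blended_score(h.semantic, h.age_days, temporal_weight, half_life_days);
    hits.sort_by(|a, b| score(b).total_cmp(&score(a)));
    hits
}
```

Under this reading, a weight of 0.2 lets a fresh memory outrank a slightly more similar but long-untouched one, while similarity still dominates the ordering.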

Working Memory (Inbox)

The inbox is a capacity-limited staging area modeled after human working memory (default capacity: 7). Items enter via note (manual) or learn (code ingestion) and are scored by attention:

  • Items are sorted by attention score; lowest-scored items are evicted at capacity
  • consolidate promotes inbox items to long-term memory and clears the inbox
  • Stored in .inbox.json alongside memory files
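The eviction behavior described above can be sketched as a small bounded container. The type and method names (`Inbox`, `InboxItem`, `push`, `drain`) and the `f32` attention score are assumptions for illustration; how attention is actually computed is not covered here.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct InboxItem {
    pub text: String,
    pub attention: f32,
}

pub struct Inbox {
    capacity: usize,
    items: Vec<InboxItem>,
}

impl Inbox {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, items: Vec::new() }
    }

    /// Insert an item, keep items sorted by descending attention,
    /// and return whatever was evicted to stay within capacity.
    /// The new item itself is evicted if it scores lowest.
    pub fn push(&mut self, item: InboxItem) -> Option<InboxItem> {
        self.items.push(item);
        self.items.sort_by(|a, b| b.attention.total_cmp(&a.attention));
        if self.items.len() > self.capacity {
            self.items.pop()
        } else {
            None
        }
    }

    /// Hand every item to the caller (as a consolidation step would)
    /// and leave the inbox empty.
    pub fn drain(&mut self) -> Vec<InboxItem> {
        std::mem::take(&mut self.items)
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}
```

Sorting on every push is fine at the default capacity of 7; a heap would only matter for much larger inboxes.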

Consolidation

mnemonist consolidate runs a sleep-like consolidation cycle:

  1. Promote — inbox items become long-term memories with type and strength
  2. Decay — memories not accessed within consolidation.decay_days (default 90) and below protected_access_count (default 5) are pruned
  3. Re-embed — all surviving memories are re-embedded for fresh semantic search

Use --dry-run to preview what would change.
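The decay rule in step 2 combines two conditions: a memory is pruned only if it is both stale and rarely accessed. A minimal sketch follows, assuming staleness is tracked as whole days since last access and that strength is an integer bumped by one on survival; the struct and function names are hypothetical.

```rust
pub struct Memory {
    pub name: String,
    pub days_since_access: u32,
    pub access_count: u32,
    pub strength: u32,
}

pub struct DecayPolicy {
    pub decay_days: u32,
    pub protected_access_count: u32,
}

impl Default for DecayPolicy {
    /// Defaults taken from the documented config values.
    fn default() -> Self {
        Self { decay_days: 90, protected_access_count: 5 }
    }
}

impl DecayPolicy {
    /// Prune only when the memory is stale AND below the protection threshold.
    pub fn should_prune(&self, m: &Memory) -> bool {
        m.days_since_access > self.decay_days
            && m.access_count < self.protected_access_count
    }
}

/// Split memories into (survivors, pruned); survivors gain one point of strength.
pub fn decay(memories: Vec<Memory>, policy: &DecayPolicy) -> (Vec<Memory>, Vec<Memory>) {
    let (pruned, mut kept): (Vec<Memory>, Vec<Memory>) =
        memories.into_iter().partition(|m| policy.should_prune(m));
    for m in &mut kept {
        m.strength += 1;
    }
    (kept, pruned)
}
```

Returning the pruned set instead of deleting in place mirrors what a `--dry-run` preview would need to report.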

Memory Metadata

Each memory file tracks cognitive metadata in its frontmatter:

| Field | Description |
|---|---|
| strength | Consolidation strength (increases on survival) |
| access_count | Retrieval count (Hebbian reinforcement) |
| last_accessed | ISO 8601 timestamp of last retrieval |
| created_at | When the memory was first created |
| source | How it was created: memorize, note, learn, consolidation |
| consolidated_from | Original files if created via merge |
| refs | Inter-layer edges: code chunk IDs or memory filenames this memory links to |

Configuration

Layered config: ~/.mnemonist/mnemonist.toml (global default, created with mnemonist config init) + ./mnemonist.toml at the project root (per-project overrides; missing fields inherit).

[storage]
root = "~/.mnemonist"

[embedding]
provider = "candle"
model = "all-MiniLM-L6-v2"

[recall]
budget = 2000
priority = ["feedback", "project", "user", "reference"]
expand_refs = true
max_ref_expansions = 3

[index]
max_lines = 200

[code]
languages = ["rust", "python", "javascript", "go"]
max_chunk_lines = 100

[inbox]
capacity = 7

[consolidation]
decay_days = 90
merge_threshold = 0.85
protected_access_count = 5
max_memories = 200

[quantization]
enabled = false
bits = 2
algorithm = "mse"
temporal_weight = 0.2

Use mnemonist config set <key> <value> to change values, for example mnemonist config set embedding.model all-MiniLM-L6-v2.

See the full Specification for details on file format, dynamic loading, precedence rules, and integration guides.
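The "missing fields inherit" rule for layered config can be sketched with optional fields: a project layer overrides a global layer field by field, and anything still unset falls back to a default. The struct names and the three fields chosen are illustrative assumptions; the defaults mirror the sample config above.

```rust
/// Partial view of one config file; `None` means "not set in this file".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ConfigLayer {
    pub inbox_capacity: Option<usize>,
    pub decay_days: Option<u32>,
    pub recall_budget: Option<usize>,
}

/// Fully resolved values after layering.
#[derive(Debug, PartialEq)]
pub struct Resolved {
    pub inbox_capacity: usize,
    pub decay_days: u32,
    pub recall_budget: usize,
}

impl ConfigLayer {
    /// Fields set in `self` (the project file) win;
    /// missing ones inherit from `global`.
    pub fn over(self, global: ConfigLayer) -> ConfigLayer {
        ConfigLayer {
            inbox_capacity: self.inbox_capacity.or(global.inbox_capacity),
            decay_days: self.decay_days.or(global.decay_days),
            recall_budget: self.recall_budget.or(global.recall_budget),
        }
    }

    /// Fill anything still unset with the defaults from the sample config.
    pub fn resolve(self) -> Resolved {
        Resolved {
            inbox_capacity: self.inbox_capacity.unwrap_or(7),
            decay_days: self.decay_days.unwrap_or(90),
            recall_budget: self.recall_budget.unwrap_or(2000),
        }
    }
}
```

For example, a project file that sets only the inbox capacity keeps the global `decay_days` and the default recall budget.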

Benchmarks

Distance Functions

| Function | 32-d | 128-d | 384-d |
|---|---|---|---|
| cosine_similarity | 12 ns | 59 ns | 207 ns |
| dot_product | 4 ns | 28 ns | 120 ns |
| l2_distance_squared | 5 ns | 30 ns | 125 ns |
| normalize | 18 ns | 82 ns | 239 ns |
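For readers unfamiliar with the four benchmarked functions, these are their standard definitions written as plain scalar loops. This is a reference sketch only; the benchmarked implementations may be structured differently, and the slice-based signatures are assumed.

```rust
/// Sum of element-wise products.
pub fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Squared Euclidean distance (no square root, which preserves ordering).
pub fn l2_distance_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let d = x - y;
            d * d
        })
        .sum()
}

/// Scale `v` in place to unit length; a zero vector is left unchanged.
pub fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Cosine of the angle between `a` and `b`; 0.0 if either has zero length.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let denom = (dot_product(a, a) * dot_product(b, b)).sqrt();
    if denom == 0.0 {
        0.0
    } else {
        dot_product(a, b) / denom
    }
}
```

Cosine similarity needs three reductions where dot product needs one, which fits cosine being the slowest of the distance measures in the table. For vectors that are already normalized, the two give the same value.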

HNSW Index (500 vectors, dim=32)

| Operation | Time |
|---|---|
| Build (500 inserts) | 32.7 ms |
| Search top-1 | 15.2 µs |
| Search top-10 | 15.2 µs |
| Search top-50 | 15.2 µs |
| Save to disk | 91 µs |
| Load from disk | 85 µs |

IVF-Flat Index (500 vectors, dim=32)

| Operation | Time |
|---|---|
| Train (k-means, 16 clusters) | 2.2 ms |
| Search top-1 | 11.9 µs |
| Search top-10 | 12.0 µs |
| Search top-50 | 12.1 µs |
| Save to disk | 66 µs |
| Load from disk | 57 µs |

TurboQuant MSE (dim=128)

| Bit-width | Quantize | Dequantize |
|---|---|---|
| 1-bit | 3.9 µs | 991 ns |
| 2-bit | 3.9 µs | 988 ns |
| 3-bit | 3.9 µs | 997 ns |
| 4-bit | 4.1 µs | 998 ns |

TurboQuant Prod (dim=128)

| Bit-width | Quantize | Dequantize | IP Estimate |
|---|---|---|---|
| 2-bit | 116 µs | 141 µs | 111 µs |
| 3-bit | 115 µs | 111 µs | 111 µs |
| 4-bit | 115 µs | 111 µs | 112 µs |

Bit Packing

| Operation | 128x2b | 384x2b | 384x4b |
|---|---|---|---|
| Pack | 161 ns | 539 ns | 264 ns |
| Unpack | 90 ns | 270 ns | 241 ns |
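Bit packing here means storing each quantized code in fewer than 8 bits, so 128 two-bit codes fit in 32 bytes. A minimal sketch follows, assuming an LSB-first layout in which codes may straddle byte boundaries; the actual on-disk layout and the function names are assumptions.

```rust
/// Pack `codes` (each below 2^bits) into a dense little-endian bit stream.
pub fn pack(codes: &[u8], bits: u32) -> Vec<u8> {
    assert!((1..=8).contains(&bits), "bits must be in 1..=8");
    let mask = (1u16 << bits) - 1;
    let mut out = vec![0u8; (codes.len() * bits as usize + 7) / 8];
    for (i, &code) in codes.iter().enumerate() {
        let bit_pos = i * bits as usize;
        // Shift inside a 16-bit window so a code may spill into the next byte.
        let shifted = (code as u16 & mask) << (bit_pos % 8);
        out[bit_pos / 8] |= shifted as u8;
        if shifted >> 8 != 0 {
            out[bit_pos / 8 + 1] |= (shifted >> 8) as u8;
        }
    }
    out
}

/// Recover `count` codes of `bits` bits each from a packed stream.
pub fn unpack(packed: &[u8], bits: u32, count: usize) -> Vec<u8> {
    assert!((1..=8).contains(&bits), "bits must be in 1..=8");
    let mask = (1u16 << bits) - 1;
    (0..count)
        .map(|i| {
            let bit_pos = i * bits as usize;
            let lo = packed[bit_pos / 8] as u16;
            // The final code may end in the last byte; treat a missing byte as 0.
            let hi = packed.get(bit_pos / 8 + 1).copied().unwrap_or(0) as u16;
            ((((hi << 8) | lo) >> (bit_pos % 8)) & mask) as u8
        })
        .collect()
}
```

The 16-bit window is what lets 3-bit codes work, since they do not divide evenly into bytes.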

Embedding Store

| Operation | 128d x 100 | 384d x 100 | 384d x 500 |
|---|---|---|---|
| upsert | TBD | TBD | TBD |
| get | TBD | TBD | TBD |
| remove | TBD | TBD | TBD |
| save | TBD | TBD | TBD |
| load | TBD | TBD | TBD |

Inbox

| Operation | cap=7 | cap=50 |
|---|---|---|
| push_to_capacity | TBD | TBD |
| push_with_eviction | TBD | TBD |
| save | TBD | TBD |
| load | TBD | TBD |
| drain | TBD | TBD |

Memory Index

| Operation | 10 entries | 100 entries |
|---|---|---|
| parse_line | TBD | |
| to_line | TBD | |
| search | TBD | TBD |
| upsert_new | TBD | TBD |
| upsert_existing | TBD | TBD |

Eval Functions

| Function | 32d x 50 | 128d x 50 | 384d x 20 |
|---|---|---|---|
| anisotropy | TBD | TBD | TBD |
| similarity_range | TBD | TBD | TBD |
| mean_center | TBD | TBD | TBD |
| discrimination_gap | TBD | | |

Measured on Apple Silicon (M-series) with cargo bench. Run just bench to reproduce. Raw results available in docs/benchmarks/.

Testing

just test                  # cargo test --workspace
bash scripts/validate.sh   # full E2E validation (requires release build)

See CONTRIBUTING.md for what each test suite covers and per-crate test counts.

Agent Skill

This repo’s conventions are available as portable agent skills in skills/, following the Agent Skills Specification.

Related standards: AGENTS.md · llms.txt

License

Apache-2.0