Pinky Brain — Architecture plan (local)
Knowledge management system for agents + humans, scalable to several projects over 2+ years. Evolution of quack-brain. Decision made: LOCAL mode. Embedded SQLite, no server, no network. No storage abstraction (a single real backend → don't abstract it, YAGNI).
1. Vision and goals
- Readable and versioned source of truth: knowledge lives in markdown + git. The agent edits it with its native tools and a human reviews it with `git diff`.
- Real retrieval: hybrid search (full-text + semantic), not manual `grep`.
- Multi-project: project knowledge + global knowledge across projects.
- Durable 2+ years: the index is 100% regenerable from the markdown; no critical data lives only in the DB.
- Zero ops: one binary + one `brain.db` file. No Docker, no daemon, offline.
2. Design principles
- Markdown = source of truth. SQLite = derived, disposable index. If the DB gets corrupted → `pinky reindex` rebuilds it from the `.md` files.
- SOLID without over-engineering. A concrete `store` module over SQLite, **with no abstraction trait**: there is no second backend, so nothing gets abstracted. If one ever appeared (unlikely), the trait would be extracted then, not before.
- Semantic pull over blind push. The agent searches when it needs to (MCP tool), instead of injecting all the context on every hook.
- Offline-first. Works without network. Local embeddings. The network is only involved in the `git pull/push` of the global brain, and that is optional.
- Append-only where possible (diary) so git merges without conflicts.
3. Architecture
┌─ Source of truth (git) ──────────────────────────────┐
│ <project>/documentation/*.md                         │
│ ~/.pinky/brain/*.md   (global, cross-project)        │
└───────────────────────┬──────────────────────────────┘
                        │ notify (file watcher)
                        ▼
┌─ pinky-core (Rust lib) ──────────────────────────────┐
│ parse(frontmatter) → chunk → embed → upsert          │
│ hybrid search (BM25 + vector, RRF fusion) → rerank   │
│ `store` module: SQLite (sqlite-vec + FTS5)           │
└───────────────────────┬──────────────────────────────┘
        ┌───────────────┼────────────────┐
        ▼               ▼                ▼
   pinky (CLI) pinky-mcp (server)   pinky-hooks
 reindex/search brain_search tool  SessionStart/Stop
- `pinky-core`: library with all the logic (parse, chunk, embed, search, store).
- `store`: concrete module over `rusqlite` + `sqlite-vec` + FTS5. No trait.
- `pinky-mcp`: exposes `brain_search`, `brain_save`, `brain_stats` as an MCP server. The agent does semantic pull instead of grep.
- `pinky` CLI: `reindex`, `search`, `doctor`, `stats`, `gc`.
- `pinky-hooks`: the 4 hooks from quack-brain (SessionStart/PreRead/PreWrite/Stop), kept thin: the heavy retrieval is delegated to the core/MCP instead of injecting everything.
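The `parse(frontmatter)` step at the head of the pipeline can be pictured as splitting each `.md` file into its YAML header and its markdown body. The following is a minimal std-only sketch, assuming the frontmatter is delimited by `---` lines. `split_frontmatter` is a hypothetical helper written for illustration; the plan itself assigns this job to `gray_matter`, which also parses the YAML.

```rust
/// Splits a markdown document into (frontmatter, body).
/// Returns `None` when the document does not start with a `---` line or
/// when no closing `---` line (followed by a newline) is found.
/// Hypothetical helper: the plan delegates this step to `gray_matter`.
pub fn split_frontmatter(doc: &str) -> Option<(&str, &str)> {
    // The opening delimiter must be the very first line.
    let rest = doc.strip_prefix("---\n")?;

    // Empty frontmatter: the closing delimiter follows immediately.
    if let Some(body) = rest.strip_prefix("---\n") {
        return Some(("", body));
    }

    // Otherwise look for the closing delimiter on a line of its own.
    let closing = "\n---\n";
    let end = rest.find(closing)?;
    let frontmatter = &rest[..end];
    let body = &rest[end + closing.len()..];
    Some((frontmatter, body))
}
```

The returned frontmatter slice is what would be mapped to the indexed metadata columns, while the body is what gets chunked and embedded.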
4. Data model (SQLite)
The YAML frontmatter is mapped to indexed metadata for filtering:
entry(
  id            TEXT PK,  -- stable hash of the path
  path          TEXT,     -- path of the .md (relative to the root)
  scope         TEXT,     -- 'project:<name>' | 'global'
  type          TEXT,     -- gotcha | pattern | bug | decision | diary | guide
  project       TEXT,
  tags          TEXT,     -- JSON array
  created       TEXT,     -- ISO date
  last_verified TEXT,     -- for staleness/decay (§9)
  title         TEXT,
  body          TEXT,
  content_hash  TEXT      -- incremental re-index (skip if unchanged)
)
chunk_fts  FTS5 virtual table (text)                       -- BM25
chunk_vec  sqlite-vec virtual table (embedding float[384])
chunk(
  id, entry_id, ord, text  -- joins fts + vec with metadata
)
backlink(  -- code ↔ knowledge graph built from `// Brain: {slug}`
  entry_id, file_path, line, repo
)
usage(  -- telemetry: which entries actually get retrieved/used
  entry_id, retrieved_at, query, was_useful
)
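The `id` and `content_hash` columns are what make the incremental re-index cheap: a file is re-embedded only when its stored hash no longer matches. The sketch below illustrates that check under stated assumptions. The plan does not name a hash function, so 64-bit FNV-1a is used here purely because it is small and deterministic; `entry_id`, `content_hash`, and `needs_reindex` are hypothetical names, and the `HashMap` stands in for a lookup against the `entry` table.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a. An illustrative choice only: it is deterministic across
/// platforms and Rust versions, unlike `DefaultHasher`, whose algorithm
/// is unspecified. The plan does not say which hash is used.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// `entry.id`: hex digest of the path relative to the root.
pub fn entry_id(rel_path: &str) -> String {
    format!("{:016x}", fnv1a64(rel_path.as_bytes()))
}

/// `entry.content_hash`: hex digest of the raw file contents.
pub fn content_hash(contents: &str) -> String {
    format!("{:016x}", fnv1a64(contents.as_bytes()))
}

/// Returns true when the file must be re-chunked and re-embedded.
/// `indexed` maps `entry.id` to the stored `content_hash`
/// (a hypothetical in-memory substitute for the DB lookup).
pub fn needs_reindex(
    indexed: &HashMap<String, String>,
    rel_path: &str,
    contents: &str,
) -> bool {
    match indexed.get(&entry_id(rel_path)) {
        Some(stored) => *stored != content_hash(contents),
        None => true, // never indexed before
    }
}
```

Because the id depends only on the relative path, renaming a file produces a new entry; whether that is the desired behavior is a design decision the schema leaves open.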
5. Technology stack
| Need | Choice | Why |
|---|---|---|
| Language | Rust | <50ms latency in hooks, one binary, no runtime |
| Index | rusqlite + sqlite-vec + FTS5 | One file, hybrid in a single query, in-process |
| Embeddings | fastembed (ONNX, multilingual-e5-small) | Local, no API, no cost per hook, ES+IT |
| Rerank (optional) | cross-encoder ONNX (bge-reranker-base) | Precision after the hybrid retrieval |
| Frontmatter | gray_matter | YAML + body in one step |
| File watching | notify | Incremental re-index |
| MCP | rmcp (official Rust SDK) | brain_search as a tool |
| Async runtime | tokio | MCP server + watcher |
Multilingual embeddings: multilingual-e5-small (384 dims) covers Spanish and Italian (SGSVP courses) with no cost per query. If more recall is needed down the road, bge-m3 is the upgrade path; its dense vectors are 1024-dimensional, so the `float[384]` column would have to change with it. The model is versioned in the chunk metadata → global reindex when changing it.
6. Hybrid retrieval
Not vector alone. BM25 (FTS5) + vector (sqlite-vec), fused with Reciprocal Rank Fusion, then optional rerank:
- BM25: exact terms (slugs, function names, error strings).
- Vector: semantic similarity ("something similar to this problem").
- RRF: fuses both rankings without calibrating weights.
- Rerank (cross-encoder) over the top-N for final precision.
- Metadata filters: by `scope`, `project`, `type`, `tags`, freshness.
The hybrid stack (FTS5 + sqlite-vec + embeddings) is in place from Phase 0, with no intermediate text-only stages.
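The fusion step described above can be sketched as follows. This is a minimal std-only illustration of Reciprocal Rank Fusion, not the project's implementation: `rrf_fuse` is a hypothetical name, chunk ids are assumed to be integers, and the constant `k = 60` used in the usage note is the value commonly seen in the literature rather than something the plan specifies.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over any number of ranked lists of chunk ids.
/// Each list is ordered best-first. A chunk's fused score is the sum of
/// 1 / (k + rank) over the lists that contain it, with rank starting at 1.
/// Only ranks are used, so BM25 scores and vector distances never need
/// to be put on a common scale.
pub fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (i, &chunk_id) in ranking.iter().enumerate() {
            // `i` is zero-based, so the rank is i + 1.
            *scores.entry(chunk_id).or_insert(0.0) += 1.0 / (k + (i as f64) + 1.0);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

In use, the first list would hold the chunk ids returned by the FTS5 query and the second the ids returned by the sqlite-vec query, for example `rrf_fuse(&[bm25_ids, vector_ids], 60.0)`. The top-N of the fused list is what the optional cross-encoder would then rerank.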
7. Sync and multi-project
- Project knowledge: lives in the project's repo (`documentation/`), versioned with the code.
- Global knowledge: dedicated git repo cloned into `~/.pinky/brain`. Sync between your machines = `git pull/push` (this is NOT "shared mode": it's still local, git is just the transport). Append-only in diary → conflict-free merges.
- The `brain.db` index is never committed; it's rebuilt on each machine.
8. Improvements I'd like to make (over quack-brain)
- Staleness decay: penalize entries with an old `last_verified` in the ranking; re-verification reminders. Aging knowledge degrades on its own.
- Code ↔ knowledge graph: index the `// Brain: {slug}` breadcrumbs as backlinks. "What code depends on this gotcha?" and vice versa.
- Semantic dedup: on save, detect near-duplicate entries (cosine > threshold) and propose a merge. Keeps the brain from filling up with repeated gotchas.
- Auto-tagging / classification via LLM on save (consistent type + tags).
- Query rewriting / HyDE: expand the query before searching to improve recall.
- Usage telemetry: record which entries are retrieved and whether they were useful → prune dead knowledge (the ones never used in 6 months).
- Diary rollups: weekly/monthly summaries auto-generated from the daily diaries. The high-level changelog maintains itself.
- Eval harness: a set of "golden queries" to measure retrieval quality over time (relevance regressions when changing model/chunking).
- Citations/provenance: every agent answer references the slug + path of the entry it used. Traceability.
- Automated evergreen CLAUDE.md: a validator that rejects volatile data (LOC, line numbers) in CLAUDE.md, as quack-brain's rule 7 already requires.
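The semantic dedup item in the list above reduces to a cosine-similarity check at save time. A minimal sketch, assuming embeddings are available as plain `f32` slices; `cosine` and `near_duplicates` are hypothetical names, and the threshold is left as a parameter because the plan does not fix a value.

```rust
/// Cosine similarity between two embeddings.
/// Returns 0.0 if the lengths differ or either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the slugs of existing entries whose embedding is a near-duplicate
/// of `candidate` (cosine strictly above `threshold`), most similar first.
/// `existing` pairs each slug with its embedding (illustrative layout).
pub fn near_duplicates<'a>(
    candidate: &[f32],
    existing: &[(&'a str, Vec<f32>)],
    threshold: f32,
) -> Vec<(&'a str, f32)> {
    let mut hits: Vec<(&'a str, f32)> = existing
        .iter()
        .map(|(slug, emb)| (*slug, cosine(candidate, emb)))
        .filter(|(_, sim)| *sim > threshold)
        .collect();
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits
}
```

A non-empty result is the point at which `brain_save` would propose a merge instead of writing a new entry. A linear scan like this is adequate at the volumes the plan expects; the same check could equally be run as a nearest-neighbor query against `chunk_vec`.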
9. Roadmap by phases
- Phase 0: Scaffold + full hybrid retrieval (Cargo workspace with 4 crates: `pinky-core`, `pinky`, `pinky-mcp`, `pinky-hooks`). Data model + SQLite `store` with FTS5 and sqlite-vec + `fastembed` (multilingual-e5-small) + frontmatter parsing + incremental reindex + hybrid search (BM25 + vector, RRF). Embeddings from day 1, no text-only stage.
- Phase 1: MCP + hooks. `pinky-mcp` with `brain_search`/`brain_save`/`brain_stats`. The 4 thin hooks (SessionStart/PreRead/PreWrite/Stop) leaning on the core/MCP.
- Phase 2: Diary + rollups. Automatic diary in the Stop hook, weekly/monthly rollups, `// Brain:` breadcrumbs → backlinks.
- Phase 3: Quality improvements. Semantic dedup, staleness decay, usage telemetry, cross-encoder rerank, golden-query eval harness.
10. Scalability and performance (2-year horizon)
- Real volume: several projects × daily diary × gotchas ≈ tens of thousands of entries in 2 years. That's little data: SQLite handles millions of rows and vector search over tens of thousands of vectors is trivial (<10ms). The bottleneck isn't the volume, it's the organization and the retrieval.
- Incremental re-index by `content_hash`: only what changed gets re-embedded.
- On-disk embeddings cache to avoid recomputing.
- Partitioning by scope/project to bound searches.
- WAL mode in SQLite for concurrent reads (CLI + MCP + hooks) without blocking.
11. Risks and mitigations
- Markdown ↔ index drift → the index is disposable + idempotent reindex + hash.
- Multilingual embedding quality → eval harness measures regressions (improvement 8).
- Knowledge noise (garbage gets in) → the 4 save conditions + dedup + pruning telemetry.
- Embedding-model lock-in → version the model in the chunk metadata; global reindex when changing it.