Pinky Brain — Architecture plan (local)

Knowledge management system for agents + humans, scalable to several projects over 2+ years. Evolution of quack-brain. Decision made: LOCAL mode. Embedded SQLite, no server, no network. No storage abstraction (a single real backend → don't abstract it, YAGNI).

1. Vision and goals

Readable and versioned source of truth: knowledge lives in markdown + git. The agent edits it with its native tools and a human reviews it with git diff.
Real retrieval: hybrid search (full-text + semantic), not manual grep.
Multi-project: project knowledge + global knowledge across projects.
Durable 2+ years: the index is 100% regenerable from the markdown; no critical data lives only in the DB.
Zero ops: one binary + one brain.db file. No Docker, no daemon, offline.

2. Design principles

Markdown = source of truth. SQLite = derived, disposable index. If the DB gets corrupted → pinky reindex rebuilds it from the .md files.
SOLID without over-engineering. A concrete store module over SQLite, **with no abstraction trait**: there is no second backend, so nothing gets abstracted. If one ever appears (unlikely), the trait gets extracted then, not before.
Semantic pull over blind push. The agent searches when it needs to (MCP tool), instead of injecting all the context on every hook.
Offline-first. Works without network. Local embeddings. The network is only involved in the git pull/push of the global brain, and that is optional.
Append-only where possible (diary) so git merges without conflicts.

3. Architecture

┌─ Source of truth (git) ──────────────────────────────┐
│  <project>/documentation/*.md                        │
│  ~/.pinky/brain/*.md         (global cross-project)  │
└───────────────────────┬──────────────────────────────┘
                        │  notify (file watcher)
                        ▼
┌─ pinky-core (Rust lib) ──────────────────────────────┐
│  parse(frontmatter) → chunk → embed → upsert         │
│  hybrid search (BM25 + vector, RRF fusion) → rerank  │
│  `store` module: SQLite (sqlite-vec + FTS5)          │
└───────────────────────┬──────────────────────────────┘
        ┌───────────────┼────────────────┐
        ▼               ▼                 ▼
   pinky (CLI)    pinky-mcp (server)   pinky-hooks
   reindex/search  brain_search tool   SessionStart/Stop

pinky-core: library with all the logic (parse, chunk, embed, search, store).
store: concrete module over rusqlite + sqlite-vec + FTS5. No trait.
pinky-mcp: exposes brain_search, brain_save, brain_stats as an MCP server. The agent does semantic pull instead of grep.
pinky CLI: reindex, search, doctor, stats, gc.
pinky-hooks: the 4 hooks from quack-brain (SessionStart/PreRead/PreWrite/Stop), kept thin: heavy retrieval is delegated to the core/MCP instead of injecting everything.

4. Data model (SQLite)

The YAML frontmatter is mapped to indexed metadata for filtering:

entry(
  id            TEXT PK,        -- stable hash of the path
  path          TEXT,           -- path of the .md (relative to the root)
  scope         TEXT,           -- 'project:<name>' | 'global'
  type          TEXT,           -- gotcha | pattern | bug | decision | diary | guide
  project       TEXT,
  tags          TEXT,           -- JSON array
  created       TEXT,           -- ISO date
  last_verified TEXT,           -- for staleness decay (§8)
  title         TEXT,
  body          TEXT,
  content_hash  TEXT            -- incremental re-index (skip if unchanged)
)
chunk_fts        FTS5 virtual table (text)        -- BM25
chunk_vec        sqlite-vec virtual table (embedding float[384])
chunk(
  id, entry_id, ord, text                          -- joins fts + vec with metadata
)
backlink(           -- code ↔ knowledge graph from `// Brain: {slug}`
  entry_id, file_path, line, repo
)
usage(              -- telemetry: which entries actually get retrieved/used
  entry_id, retrieved_at, query, was_useful
)
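
The `scope` column packs two cases into one string ('project:<name>' or 'global'). A minimal sketch of how that value could be parsed and printed in pinky-core follows. The `Scope` enum and its error type are hypothetical names introduced here for illustration; the plan itself only fixes the string format.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical in-memory form of the `scope` column.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Scope {
    Global,
    Project(String),
}

impl FromStr for Scope {
    type Err = String;

    /// Accepts "global" or "project:<name>" with a non-empty name.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "global" => Ok(Scope::Global),
            _ => match s.strip_prefix("project:") {
                Some(name) if !name.is_empty() => Ok(Scope::Project(name.to_string())),
                _ => Err(format!("invalid scope: {:?}", s)),
            },
        }
    }
}

impl fmt::Display for Scope {
    /// Writes the value back in the same format the column stores.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Scope::Global => write!(f, "global"),
            Scope::Project(name) => write!(f, "project:{}", name),
        }
    }
}
```

Parsing at the boundary means a malformed frontmatter value is rejected at index time instead of silently producing an entry that no scope filter matches.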

5. Technology stack

| Need | Choice | Why |
| --- | --- | --- |
| Language | Rust | <50ms latency in hooks, one binary, no runtime |
| Index | `rusqlite` + `sqlite-vec` + FTS5 | One file, hybrid in a single query, in-process |
| Embeddings | `fastembed` (ONNX, `multilingual-e5-small`) | Local, no API, no cost per hook, Spanish + Italian |
| Rerank (optional) | cross-encoder ONNX (bge-reranker-base) | Precision after the hybrid retrieval |
| Frontmatter | `gray_matter` | YAML + body in one step |
| File watching | `notify` | Incremental re-index |
| MCP | `rmcp` (official Rust SDK) | `brain_search` as a tool |
| Async runtime | `tokio` | MCP server + watcher |

Multilingual embeddings: multilingual-e5-small (384 dims) covers Spanish and Italian (SGSVP courses) with no cost per query. If more recall is needed down the road, bge-m3 is the fallback (its dense vectors are 1024-dimensional, so the chunk_vec column width changes too). The model is versioned in the chunk metadata, so changing it triggers a global reindex.

6. Hybrid retrieval

Not vector alone. BM25 (FTS5) + vector (sqlite-vec), fused with Reciprocal Rank Fusion, then optional rerank:

BM25: exact terms (slugs, function names, error strings).
Vector: semantic similarity ("something similar to this problem").
RRF: fuses both rankings without calibrating weights.
Rerank (cross-encoder) over the top-N for final precision.
Metadata filters: by scope, project, type, tags, freshness.

The hybrid stack (FTS5 + sqlite-vec + embeddings) is in place from Phase 0, with no intermediate text-only stages.
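
The fusion step described above can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: the function name `rrf_fuse`, integer chunk ids, and the constant k = 60 (a commonly used RRF default) are not specified in this plan. Each input list stands for the chunk ids returned by one retriever (FTS5 or sqlite-vec), best match first.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over several ranked lists of chunk ids.
/// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
/// Only ranks are used, so BM25 and vector scores need no calibration.
pub fn rrf_fuse(rankings: &[Vec<i64>], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            // `i` is zero-based, so the rank is i + 1.
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i as f64) + 1.0);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

With BM25 returning [1, 2, 3] and the vector search returning [2, 4, 1], `rrf_fuse(&[vec![1, 2, 3], vec![2, 4, 1]], 60.0)` orders the chunks 2, 1, 4, 3: the chunks present in both lists outrank those found by only one retriever. The optional cross-encoder rerank would then run over the head of this fused list.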

7. Sync and multi-project

Project knowledge: lives in the project's repo (documentation/), versioned with the code.
Global knowledge: dedicated git repo cloned into ~/.pinky/brain. Sync between your machines = git pull/push (this is NOT "shared mode": it's still local, git is just the transport). Append-only in diary → conflict-free merges.
The brain.db index is never committed; it's rebuilt on each machine.

8. Improvements I'd like to make (over quack-brain)

Staleness decay: penalize entries with an old last_verified in the ranking, and issue re-verification reminders. Aging knowledge degrades on its own.
Code ↔ knowledge graph: index the // Brain: {slug} breadcrumbs as backlinks. "What code depends on this gotcha?" and vice versa.
Semantic dedup: on save, detect near-duplicate entries (cosine > threshold) and propose a merge. Keeps the brain from filling up with repeated gotchas.
Auto-tagging / classification via LLM on save (consistent type + tags).
Query rewriting / HyDE: expand the query before searching to improve recall.
Usage telemetry: record which entries are retrieved and whether they were useful → prune dead knowledge (entries never used in 6 months).
Diary rollups: weekly/monthly summaries auto-generated from the daily diaries. The high-level changelog maintains itself.
Eval harness: a set of "golden queries" to measure retrieval quality over time (relevance regressions when changing model/chunking).
Citations/provenance: every agent answer references the slug + path of the entry it used. Traceability.
Automated evergreen CLAUDE.md: a validator that rejects volatile data (LOC, line numbers) in CLAUDE.md, as quack-brain's rule 7 already requires.
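
Two of the improvements above, staleness decay and semantic dedup, reduce to small numeric functions. The sketch below is one possible shape, with loud assumptions: the exponential half-life form, the function names, and any concrete half-life or threshold value are illustrative choices, not decisions made in this plan. The caller is assumed to compute `age_days` from `last_verified`.

```rust
/// Staleness decay (assumed form): the retrieval score halves every
/// `half_life_days` since `last_verified`. `half_life_days` must be > 0.
/// Negative ages (clock skew, future dates) are treated as zero.
pub fn decayed_score(score: f64, age_days: f64, half_life_days: f64) -> f64 {
    score * 0.5_f64.powf(age_days.max(0.0) / half_life_days)
}

/// Cosine similarity between two embeddings.
/// Returns None on mismatched lengths or a zero-norm vector.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Semantic dedup check on save: true when cosine similarity exceeds
/// `threshold`. Incomparable vectors are never reported as duplicates.
pub fn is_near_duplicate(a: &[f32], b: &[f32], threshold: f32) -> bool {
    cosine_similarity(a, b).map_or(false, |s| s > threshold)
}
```

For example, with a 180-day half-life an entry last verified 180 days ago keeps half of its score. Both the half-life and the dedup threshold are the kind of parameter the golden-query eval harness could tune.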

9. Roadmap by phases

Phase 0 — Scaffold + full hybrid retrieval (Cargo workspace with 4 crates: pinky-core, pinky, pinky-mcp, pinky-hooks). Data model + SQLite store with FTS5 and sqlite-vec + fastembed (multilingual-e5-small) + frontmatter parsing + incremental reindex + hybrid search (BM25 + vector, RRF). Embeddings from day 1, no text-only stage.
Phase 1 — MCP + hooks: pinky-mcp with brain_search/brain_save/brain_stats. The 4 thin hooks (SessionStart/PreRead/PreWrite/Stop) leaning on the core/MCP.
Phase 2 — Diary + rollups: automatic diary in the Stop hook, weekly/monthly rollups, // Brain: breadcrumbs → backlinks.
Phase 3 — Quality improvements: semantic dedup, staleness decay, usage telemetry, cross-encoder rerank, golden-query eval harness.

10. Scalability and performance (2-year horizon)

Real volume: several projects × daily diary × gotchas ≈ tens of thousands of entries in 2 years. That is a small dataset: SQLite handles millions of rows, and vector search over tens of thousands of vectors is trivial (<10ms). The bottleneck isn't the volume, it's the organization and the retrieval.
Incremental re-index by content_hash: only what changed gets re-embedded.
On-disk embeddings cache to avoid recomputing.
Partitioning by scope/project to bound searches.
WAL mode in SQLite for concurrent reads (CLI + MCP + hooks) without blocking.
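
The incremental re-index by content_hash can be sketched as a pure planning step that runs before any embedding work. The names `ReindexPlan` and `plan_reindex` are hypothetical, and both maps are assumed to go from a path (relative to the root) to its content hash; which hash function produces those strings is left open, as in the plan.

```rust
use std::collections::HashMap;

/// Paths to (re-)embed and upsert, and paths to remove from the index.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ReindexPlan {
    pub upsert: Vec<String>,
    pub delete: Vec<String>,
}

/// Compares the hashes stored in the index with the hashes of the .md
/// files currently on disk. Unchanged files appear in neither list.
pub fn plan_reindex(
    indexed: &HashMap<String, String>,
    on_disk: &HashMap<String, String>,
) -> ReindexPlan {
    let mut plan = ReindexPlan::default();
    for (path, hash) in on_disk {
        // New file, or content changed since the last index run.
        if indexed.get(path) != Some(hash) {
            plan.upsert.push(path.clone());
        }
    }
    for path in indexed.keys() {
        // Indexed entry whose markdown file no longer exists.
        if !on_disk.contains_key(path) {
            plan.delete.push(path.clone());
        }
    }
    // Sort so the plan is stable regardless of HashMap iteration order.
    plan.upsert.sort();
    plan.delete.sort();
    plan
}
```

Passing an empty `indexed` map yields a plan that upserts every file, which is the full rebuild case: the same code path serves both the incremental watcher and a reindex from scratch.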

11. Risks and mitigations

Markdown ↔ index drift → the index is disposable + idempotent reindex + hash.
Multilingual embedding quality → eval harness measures regressions (improvement 8).
Knowledge noise (garbage gets in) → the 4 save conditions + dedup + pruning telemetry.
Embedding-model lock-in → version the model in the chunk metadata; global reindex when changing it.