Problem
Two related needs led here. The CFDLab repo grew an Obsidian vault alongside the codebase — concepts, derivations, exercises, chapter maps, all cross-linked — and the graph is now dense enough that traversal-and-search alone leaves natural-language questions on the table (“where does the modified-PDE argument show up across both books?”; “which exercises depend on Roe averaging?”). Retrieval-augmented generation is the obvious tool for that, but most demos circulating online hand the model and the index off to a hosted API. The point of doing it locally is the inverse: the model that reads the corpus and the model that answers the query both have to be visible, swappable, and private.
Approach
The corpus is the CFDLab vault, the Obsidian workspace maintained alongside the codebase. Roughly two hundred notes are spread across concepts/, derivations/, exercises/, chapters/, appendices/, and maps/, with explicit cross-links and a coverage matrix that pins each chapter to the rendered artefacts in the repo. Cluster maps and an atlas layer sit above the notes for orientation, and a v2 populous-cluster supergraph renders the whole thing as an HTML launcher. The vault is structured well enough to answer most queries by hand; RAG is for the questions that span clusters or want a synthesis the graph view can't render.
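For scale, a minimal sketch of how that corpus gets gathered for ingestion. The vault path is hypothetical and the folder names are the ones listed above; the only assumption is that notes are markdown files under those folders.

```python
from pathlib import Path

# Hypothetical vault path; the folder names are the ones listed above.
VAULT = Path("~/CFDLab/vault").expanduser()
FOLDERS = ["concepts", "derivations", "exercises", "chapters", "appendices", "maps"]

def collect_notes(vault: Path, folders: list[str]) -> dict[str, str]:
    """Map vault-relative note path -> markdown text, recursing into subfolders."""
    notes: dict[str, str] = {}
    for folder in folders:
        for md in sorted((vault / folder).rglob("*.md")):
            notes[str(md.relative_to(vault))] = md.read_text(encoding="utf-8")
    return notes

notes = collect_notes(VAULT, FOLDERS)
print(f"{len(notes)} notes collected")  # ~200 for this vault
```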
The retrieval stack is LightRAG (HKUDS), a knowledge-graph RAG framework that extracts entities and relations from documents and serves them through five retrieval modes (local, global, hybrid, mix, naive). Storage, LLM, and embedding providers are all pluggable. I deployed it locally with everything pointed at on-box models: extraction, embeddings, and retrieval all run on the same machine, behind a FastAPI server with a React WebUI, JSON + NetworkX for the entity / relation graph, and a small vector index for chunks. Reproduce scripts (Step_0 → Step_3) rebuild the pipeline against a known corpus on demand, so a regression in any layer surfaces against a checked baseline rather than as a vibe.
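The wiring, as a sketch. The `LightRAG` constructor, `EmbeddingFunc`, and `insert` are documented LightRAG surface; the Ollama helper names and module paths have moved between releases, so treat those imports as version-dependent, and the model names here are placeholders rather than the ones that shipped.

```python
from lightrag import LightRAG
from lightrag.utils import EmbeddingFunc
# Module path and helper names for the local-provider bindings vary by
# LightRAG release; this follows the shape of the project's Ollama example.
from lightrag.llm.ollama import ollama_model_complete, ollama_embed

rag = LightRAG(
    working_dir="./rag_storage",           # JSON KV, NetworkX graph, vector index
    llm_model_func=ollama_model_complete,  # on-box extraction / answer model
    llm_model_name="qwen2.5:14b",          # placeholder, not the shipped choice
    embedding_func=EmbeddingFunc(
        embedding_dim=768,                 # must match the embedding model's output
        max_token_size=8192,
        func=lambda texts: ollama_embed(texts, embed_model="nomic-embed-text"),
    ),
)

# Newer releases also expect the async storage/pipeline initialization step
# (initialize_storages / initialize_pipeline_status) before the first insert.
rag.insert(list(notes.values()))           # notes from the collection sketch above
```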
Result
The CFD curriculum now has three views that point at the same content. The codebase (CFDLab) runs every result the books describe. The vault maps every concept and derivation, with the coverage matrix as the spine. The retrieval layer makes either of those queryable in plain language, with no external API call — ingest the vault, build the entity / relation graph, query in any of the supported modes, and watch the retrieval traces in the log.
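Concretely, "query in any of the supported modes" looks like this against the index built above; `rag` is the instance from the earlier sketch, and the question is one of the two from the opening.

```python
from lightrag import QueryParam

question = "Which exercises depend on Roe averaging?"

# The five retrieval modes, side by side on one question.
for mode in ("naive", "local", "global", "hybrid", "mix"):
    answer = rag.query(question, param=QueryParam(mode=mode))
    print(f"--- {mode} ---\n{answer}\n")
```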
The earned thing is not “a RAG that works.” It is the loop where the model doing the extraction and the model answering the query are both running locally, and the corpus they read is one I built and maintain. Same discipline as the multi-node cluster: own every layer, or you cannot trust the workflow that runs on top.
What I’d do differently
Tune to the corpus earlier. Most of the early friction went into coaxing the framework's generic defaults into behaving; the late wins came from chunking, prompt templates, and embedding choices specific to the vault's note shapes (short concept notes, longer derivations, exercise / brief pairs). The pipeline converges faster when the corpus is treated as a fixed input from day one and the framework parameters are the only variables. A sketch of the per-shape chunking follows.
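What tuning to note shape can look like in practice: chunk targets keyed on the vault's folder layout, applied upstream of ingestion. The sizes, the word-window splitter, and the folder-to-shape mapping are all illustrative, not the values that shipped; LightRAG's own chunk_token_size / chunk_overlap_token_size constructor parameters cover the uniform case, so per-shape splitting has to happen before insert.

```python
# Illustrative sizes (words, not tokens), not the shipped values.
SHAPE_SIZES = {
    "concepts":    (200, 20),   # short, single-idea notes: keep them nearly whole
    "derivations": (700, 80),   # long sequential arguments: bigger windows
    "exercises":   (350, 40),   # exercise / brief pairs
}
DEFAULT = (450, 50)

def chunk(text: str, size: int, overlap: int) -> list[str]:
    """Naive word-window splitter standing in for a token-aware chunker."""
    words = text.split()
    step = max(size - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)] or [text]

def chunks_for(path: str, text: str) -> list[str]:
    """Pick chunk parameters from the note's top-level folder."""
    size, overlap = SHAPE_SIZES.get(path.split("/", 1)[0], DEFAULT)
    return chunk(text, size, overlap)
```

Feeding `rag.insert` the pre-chunked pieces instead of whole notes keeps the per-shape decision in the pipeline rather than in the framework's single global chunk size.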