This is the second in a multi-part series on building a local AI knowledge engine for digital health policy. Part 1 covered the why and the hardware. This part covers the how.
I promised in Part 1 that I’d explain the architecture without resorting to jargon, then immediately admitted I’d probably fail. I’m going to try anyway. The two ideas at the core of this project — Cog-RAG and Mymory — are genuinely interesting, and I think they’re worth understanding even if you’re not planning to build something similar yourself. Because they reveal something about how AI systems can be made to be careful, rather than just fast.
The problem with ordinary RAG
Most AI systems that work with documents use something called RAG — Retrieval Augmented Generation. The idea is simple: when you ask a question, the system searches your document collection for relevant chunks, stuffs those chunks into the prompt as context, and the language model generates an answer based on what it found. It works reasonably well for simple questions. It fails in interesting ways for complex ones.
The failure mode I care most about is this: the system finds documents that are about the right topic, but doesn’t distinguish between a binding legal requirement and a think-tank blog post suggesting that someone should probably consider doing something about it. In digital health policy, that distinction is not incidental. The difference between “the WEGIZ requires certified systems for data exchange” and “a 2023 WHO working paper recommends that countries consider certification frameworks” is the difference between a hard constraint and a directional aspiration. Any useful analysis has to hold those apart.
Standard RAG doesn’t. It retrieves by semantic similarity, which is about meaning — and “certification frameworks” has similar meaning in both sentences. The authority of the source is invisible to the embedding.
Cog-RAG: thinking before searching
The approach I’ve borrowed from a 2024 arXiv paper is called Cog-RAG — Cognitive RAG. The key idea is that retrieval should happen in two stages, not one. In the first stage, the model thinks about the question: what kind of source would actually answer this? What topics does this touch? What authority level matters here? In the second stage, it uses those inferences to run a more targeted search — not just “find documents about certification” but “find documents about certification that are legislative in nature and are Dutch baseline anchors.”
In practice this means building a structured retrieval system that can query different layers separately. For the GDHL, those layers are: the legal graph (Dutch law, article by article), the policy document corpus (what countries say about what they’re doing), and the candidate ontology terms (emerging concepts that don’t yet have a home in the taxonomy). The bridge between AnythingLLM and these layers is a small FastAPI service that orchestrates the two-stage retrieval and assembles the context before it reaches the model.
It’s more plumbing than magic. But the plumbing matters.
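To make that concrete, here is a minimal sketch of what the two-stage step might look like inside the FastAPI bridge. The layer names, the metadata filter, the in-memory stand-in for the layered stores, and the prompt are my own illustration of the idea, not the GDHL's actual code; it assumes Ollama is running locally and that qwen2.5:14b is asked for JSON via Ollama's JSON mode.

```python
# Sketch of the two-stage Cog-RAG step in the FastAPI bridge.
# Layer names, the metadata filter, and the in-memory store are illustrative.
import json
import requests
from fastapi import FastAPI
from pydantic import BaseModel

OLLAMA = "http://localhost:11434"   # local-only, by design
app = FastAPI()

# Stand-in for the real layered stores (legal graph, policy corpus, candidate terms).
STORE = [
    {"layer": "legal_graph", "authority": "legislative", "text": "WEGIZ art. 1.4 ..."},
    {"layer": "policy_corpus", "authority": "advisory", "text": "WHO working paper ..."},
]

class Question(BaseModel):
    text: str

def infer_facets(question: str) -> dict:
    """Stage 1: ask the model what kind of source would actually answer this."""
    prompt = (
        "Return JSON with keys 'topics' (list of strings), "
        "'authority' (legislative | policy | advisory) and "
        "'layer' (legal_graph | policy_corpus | candidate_terms) "
        f"for this question:\n{question}"
    )
    r = requests.post(
        f"{OLLAMA}/api/generate",
        json={"model": "qwen2.5:14b", "prompt": prompt,
              "format": "json", "stream": False},
        timeout=120,
    )
    return json.loads(r.json()["response"])

@app.post("/ask")
def ask(q: Question) -> dict:
    facets = infer_facets(q.text)              # stage 1: think about the question
    chunks = [d for d in STORE                 # stage 2: targeted, filtered retrieval
              if d["layer"] == facets["layer"]
              and d["authority"] == facets["authority"]]
    return {"facets": facets, "context": chunks}
```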
Mymory: a library with institutional memory
The second idea is more interesting, and harder to explain. I found it in a Medium post by Micheal Lanham that pointed to a GitHub project called Mymory — it’s a small framework for giving AI systems a form of governed memory, structured as a directed acyclic graph. The idea is that instead of just storing documents, you store relationships between documents over time — and you flag when new information contradicts what the system believed before.
For my purposes, this translates into something concrete: the GDHL maintains a set of Dutch baseline anchors — the canonical positions, the binding legal requirements, the established NL policy stances. Every new document that comes in — a French strategy, an OECD report, an Australian interoperability roadmap — gets compared against those anchors. If it diverges significantly, the system flags it. Not necessarily because the foreign position is wrong, but because the divergence is informative. That’s where the interesting policy questions live.
The memory component also tracks something I’m calling entropy: when a series of new documents starts pulling the semantic centre of a topic cluster away from the Dutch baseline, it means something is changing in the international landscape. That’s a signal worth surfacing, not averaging away.
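If you want a feel for what both signals amount to in code, here is a minimal sketch built on cosine similarity over nomic-embed-text vectors. The threshold, the anchor store, and the function names are mine, not Mymory's or the GDHL's; the point is only to show that "flag divergence" and "track entropy" are small, testable calculations rather than anything mysterious.

```python
# Sketch of the baseline-divergence check and the drift ("entropy") signal.
# Thresholds and function names are illustrative, not the actual implementation.
import requests
import numpy as np

OLLAMA = "http://localhost:11434"

def embed(text: str) -> np.ndarray:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text},
                      timeout=60)
    return np.array(r.json()["embedding"])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def divergence_flag(doc_text: str, anchor_vecs: list[np.ndarray],
                    threshold: float = 0.55) -> bool:
    """Flag a new document whose best match against the Dutch baseline
    anchors falls below the similarity threshold: divergence is informative."""
    v = embed(doc_text)
    best = max(cosine(v, a) for a in anchor_vecs)
    return best < threshold

def drift(anchor_centroid: np.ndarray, recent_vecs: list[np.ndarray]) -> float:
    """How far the centre of recent documents in a topic cluster has moved
    away from the Dutch baseline centroid; rising values are worth surfacing."""
    recent_centroid = np.mean(recent_vecs, axis=0)
    return 1.0 - cosine(anchor_centroid, recent_centroid)
```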
Five laws, 215 articles
One of the design decisions I’m most pleased with — in retrospect, it was obvious, but it didn’t start out that way — is treating Dutch healthcare law as a first-class element of the knowledge base, not just another set of documents.
The GDHL ingests five Dutch healthcare acts directly from their XML source files on wetten.overheid.nl: the Wgbo (patient rights), Wabvpz (health data processing), Wkkgz (quality and complaints), the UAVG (the Dutch implementation of the GDPR), and the WEGIZ (electronic data exchange in care). That’s 215 individual articles, each stored and indexed separately, with the cross-references between them mapped as edges in a graph.
This means that when the system encounters a phrase like “certified according to the applicable standard for data exchange” in a foreign document, it can trace that back to WEGIZ Article 1.4 and from there to Wkkgz Article 1 and the definition of “good care” on which the whole chain depends. The legal architecture of Dutch health data policy is not just described in the system — it’s modelled in it.
The cross-reference graph, by the way, turns out to be informative on its own. WEGIZ references 12 other Dutch laws. Wabvpz references WEGIZ, Wkkgz, and the UAVG. The UAVG references the GDPR directly (an EU regulation, so it sits outside our Dutch law graph). You can see, just from the reference structure, that Dutch health data law is built in layers: patient rights at the base, data processing rules on top of that, quality obligations on top of those, and WEGIZ sitting at the intersection of all of them, mandating the how of the exchange that the other laws mandate the why of.
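As a rough sketch of what that graph looks like in code, the edges described above could be stored and traversed with networkx along the following lines. The node naming scheme is my own and the granularity is simplified to a mix of act-level and article-level nodes; it is not the GDHL's actual data model.

```python
# Sketch of the cross-reference graph between the five acts, using networkx.
# Node names are illustrative; edges are the ones described in the text above.
import networkx as nx

law_graph = nx.DiGraph()

law_graph.add_edge("WEGIZ:1.4", "Wkkgz:1", relation="refers_to")
law_graph.add_edge("Wabvpz", "WEGIZ", relation="refers_to")
law_graph.add_edge("Wabvpz", "Wkkgz", relation="refers_to")
law_graph.add_edge("Wabvpz", "UAVG", relation="refers_to")
law_graph.add_edge("UAVG", "GDPR", relation="implements")  # EU regulation, outside the Dutch graph

# Trace a cited article back through the chain it depends on.
print(nx.shortest_path(law_graph, "WEGIZ:1.4", "Wkkgz:1"))
```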
Two models, not one
The system uses two language models, for different tasks. phi3.5:3.8b — a small, fast model — handles the mechanical extraction work: reading a document and producing structured metadata about it. What country is this from? What year? What topics does it cover? What is the document type? It handles this well enough, quickly enough, that I can process hundreds of documents in a reasonable time.
qwen2.5:14b — a larger, multilingual model — handles the harder tasks: finding policy concepts that aren’t yet in my taxonomy, comparing documents against the Dutch baseline for divergence, and producing the final answers to policy questions. It’s slower, but the quality difference is real. I spent an afternoon trying to get the small model to do what the big one does. It can’t.
There’s also nomic-embed-text, which converts documents to numerical vectors for semantic comparison. This is the component that makes it possible to ask “show me all documents similar to this WEGIZ implementation guide” without having to specify exactly what you’re looking for.
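In practice, the division of labour looks roughly like this. The prompts, the metadata fields, and the function names are illustrative rather than the GDHL's actual schema; the calls assume Ollama's standard HTTP API on localhost.

```python
# Sketch of the division of labour between the two models and the embedder.
# Prompts and metadata fields are illustrative, not the actual schema.
import json
import requests

OLLAMA = "http://localhost:11434"

def _generate(model: str, prompt: str, as_json: bool = False) -> str:
    payload = {"model": model, "prompt": prompt, "stream": False}
    if as_json:
        payload["format"] = "json"
    r = requests.post(f"{OLLAMA}/api/generate", json=payload, timeout=300)
    return r.json()["response"]

def extract_metadata(doc_text: str) -> dict:
    """Fast, mechanical extraction: the small model's job."""
    prompt = ("Return JSON with keys 'country', 'year', 'topics', 'doc_type' "
              f"for this document:\n{doc_text[:4000]}")
    return json.loads(_generate("phi3.5:3.8b", prompt, as_json=True))

def compare_to_baseline(doc_text: str, anchor_text: str) -> str:
    """Slower, judgement-heavy comparison: the large multilingual model's job."""
    prompt = (f"Dutch baseline position:\n{anchor_text}\n\n"
              f"New document:\n{doc_text[:4000]}\n\n"
              "Where does the new document diverge from the baseline, and how?")
    return _generate("qwen2.5:14b", prompt)

def embed(text: str) -> list[float]:
    """Semantic vectors for 'show me documents similar to this' queries."""
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text},
                      timeout=60)
    return r.json()["embedding"]
```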
All three models run locally, on the MacBook, a machine that draws roughly as much power from the wall socket as a bright light bulb. I find this mildly astonishing every time I run a query.
Privacy by construction
I want to be explicit about this, because it’s the non-negotiable design constraint that everything else is built around. The system is, technically, incapable of sending my documents to an external server. The Ollama client that handles all model calls is explicitly bound to localhost:11434 in the configuration. If that address is ever changed to point at a remote host, the scripts refuse to start. It’s not just a policy — it’s a check that runs at startup, every time.
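If you are curious what that startup check amounts to, a minimal version might look like the sketch below. The variable and function names are mine, not the project's; the behaviour is the one described above: anything other than a local Ollama endpoint stops the script before it does any work.

```python
# Sketch of the startup guard described above: refuse to run if the Ollama
# endpoint is anything other than localhost. Names are illustrative.
import sys
from urllib.parse import urlparse

OLLAMA_BASE_URL = "http://localhost:11434"

def assert_local_only(base_url: str) -> None:
    host = urlparse(base_url).hostname
    if host not in ("localhost", "127.0.0.1", "::1"):
        sys.exit(f"Refusing to start: Ollama endpoint {base_url!r} is not local.")

assert_local_only(OLLAMA_BASE_URL)  # runs at startup, every time
```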
The only network call the system makes is to the GitHub releases API, once, when you run the setup script, to check whether your tools are current. That call contains no document content. Everything else stays on the machine.
This matters for the documents I’m working with. Some of them are working papers. Some are draft ministerial letters. Some are internal analyses that haven’t been published. They’re not classified, but they’re not meant to be fed into a commercial AI service either. The local-only constraint is what makes it possible to put all of them in the system without having a conversation with the security officer first.
What’s next
Part 3 is about what I actually learned from running this thing against 155 Dutch policy documents. There are some surprises. The most interesting finding isn’t about the policy documents themselves — it’s about the gap between what we thought we knew and what the system found. That’s for next time.