A three-part series on building a local AI knowledge engine for digital health policy — privately, on my MacBook. Posted in During the Meanwhile, following up on the Synology experiment from November 2024.
Part 1 — From a failed NAS to a working MacBook
Some of you may remember my earlier attempt to run a local AI instance on my Synology NAS. It failed, spectacularly, because the Celeron CPU in the DS920+ doesn’t support AVX2 instructions and the whole vector-database stack that modern RAG systems depend on just refuses to run without them. I wrote it up, shrugged, and shelved the idea.
Well. A new MacBook Air arrived on my desk. The M4 one, with 24GB of unified memory. And suddenly the hardware excuse was gone.
So I picked the project back up. But this time with a much clearer goal than “let’s see if I can run an LLM at home.” This time I actually had a real problem to solve.
The problem: keeping up with the world, manually
My day job at the Ministry of Health is in international digital health. A big part of that is staying on top of what other countries are doing — what policies they’re passing, what standards they’re adopting, where they’re ahead of us, where they’re behind, and where they’re taking an entirely different route. I read a lot of policy documents. PDFs, white papers, ministerial decrees, European directives, WHO frameworks, OECD reports. In Dutch, English, French, German. Sometimes I want to answer a question like “what is the current state of interoperability legislation in France, and how does it compare to the Dutch WEGIZ?” and I have to go dig through folders of documents to construct the answer.
This is the kind of thing an AI should be good at. And it is — but not the cloud kind. I am not going to paste ministry working documents into ChatGPT. I am not going to paste them into Claude either, or into Gemini, or into any other service whose data policy I don’t fully control. (Quick plug for my earlier series on data sovereignty: the whole point is to not feed trusted-third-party algorithms with data I don’t want to give them.)
So: it has to be local. Fully local. The documents stay on my machine. The analysis stays on my machine. No network calls during normal use. The only time my laptop is allowed to reach out to the internet is when I ask it to check whether one of my tools needs an update — and even then it’s only hitting the GitHub releases API, not sending my documents anywhere.
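That update check is the one sanctioned network call. As a rough illustration of what such a check involves (the repo name, current-version source, and helper names below are placeholders, not my actual setup), hitting the GitHub releases API needs nothing beyond the standard library:

```python
# Hypothetical sketch of an update check against the GitHub releases API.
# Only release metadata travels over the network; no documents leave the machine.
import json
import urllib.request

def latest_release_tag(owner: str, repo: str) -> str:
    """Fetch the tag name of the latest published release for a repo."""
    url = f"https://api.github.com/repos/{owner}/{repo}/releases/latest"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["tag_name"]

def is_newer(current: str, latest: str) -> bool:
    """Compare dotted version strings like 'v1.2.0' numerically, not lexically."""
    parse = lambda v: tuple(int(p) for p in v.lstrip("v").split("."))
    return parse(latest) > parse(current)

if __name__ == "__main__":
    # Placeholder repo; swap in whatever tool you actually want to track.
    tag = latest_release_tag("Mintplex-Labs", "anything-llm")
    print(tag, "update available:", is_newer("1.0.0", tag))
```

The numeric comparison matters: a plain string comparison would think "1.9.0" is newer than "1.10.0".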
Why the MacBook Air M4 actually works
The short version: Apple Silicon’s unified memory architecture is quietly brilliant for running quantised language models. The M4 chip with 24GB of RAM gives me roughly 120 GB/s of memory bandwidth, which is why a 14-billion-parameter model in Q4 quantisation runs faster on this thing than it would on a comparably-priced discrete GPU. And it does it without spinning up fans, without drawing 300W, and without my wife asking what I’m mining in the study.
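Why does bandwidth matter so much? Because token generation is memory-bound: every generated token streams all the model's weights through memory once, so the token rate is roughly bandwidth divided by model size. A back-of-envelope sketch (the bandwidth figure below is Apple's quoted number for the base M4; plug in your own machine's):

```python
# Back-of-envelope decode speed for a memory-bandwidth-bound LLM.
# Each token reads every weight once, so tokens/s ≈ bandwidth / weight size.
# Both figures are rough and ignore KV-cache traffic and other overheads.

def tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# A ~9 GB Q4 14B model on a chip with ~120 GB/s of memory bandwidth:
print(round(tokens_per_second(120, 9), 1))  # roughly 13 tokens/s
```

Not fast by datacentre standards, but comfortably above reading speed, which is all a single-user setup needs.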
The memory budget works out like this:
- qwen2.5:14b — the main analytical model, multilingual, handles NL/EN/FR/DE natively — about 9 GB
- phi3.5:3.8b — a smaller, faster model I use for mechanical extraction tasks — about 2.3 GB
- nomic-embed-text — embeddings for comparing documents to each other — about 300 MB
- Supporting stuff (the graph database, the SQLite files, Python, AnythingLLM itself) — maybe 5 GB
- Remaining headroom: about 7 GB
So we’re comfortably inside the envelope. With a bit to spare for the browser tabs I’ll inevitably leave open.
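For the sceptics, the budget above adds up in a few lines (figures copied from the list, against the 24 GB total):

```python
# Rough memory budget for the stack on a 24 GB machine,
# using the approximate per-component figures from the list above.
budget_gb = {
    "qwen2.5:14b (Q4)": 9.0,
    "phi3.5:3.8b": 2.3,
    "nomic-embed-text": 0.3,
    "supporting stack": 5.0,  # graph DB, SQLite files, Python, AnythingLLM
}

total = sum(budget_gb.values())
headroom = 24 - total
print(f"used ≈ {total:.1f} GB, headroom ≈ {headroom:.1f} GB")
# used ≈ 16.6 GB, headroom ≈ 7.4 GB
```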
What I actually want to build
I want a library. A proper, old-fashioned library, with a bit of a cataloguing obsession, but run by a slightly distracted librarian who reads everything that comes in and writes down what it’s about — and who, crucially, remembers what the Dutch position is on every topic, so that every new document gets silently compared against it. Over time this library becomes not just a pile of documents but a structured picture of what each country thinks about each aspect of digital health, and where they agree or diverge from us.
I’ve given it a name: the Global Digital Health Library, or GDHL. (Yes, I know, I am very good at naming things. Next week I’ll name my plants.) It’s named after the shelves of policy documents I already have, physically and digitally, and the ambition to do something more useful with them than just file them.
What’s coming next
In Part 2 I’ll walk through the architecture. There are two interesting ideas at the core of it, borrowed from recent research: Cog-RAG, which is a way of teaching the retrieval step to think before it searches, and Mymory, which is a way of giving the system a form of persistent memory with governance attached — so that over time it doesn’t just accumulate noise, but actually learns which positions are authoritative and flags when something new contradicts them. I’ll try to explain both without resorting to jargon, though I’ll probably fail.
In Part 3 I’ll reflect on what I’m learning from actually using this thing. Which, as always when I build anything for myself, is turning out to be a lot about how policies actually work, much more than about how the code works.
Stay tuned.