Hive Mind: Build the Data Layer First
Most AI agents work from the open internet. What becomes possible when they work from a graph of data you own, rank, and curate? A daily personalized podcast, a research API ready for any draft, and four other unlocks. The data layer matters more than the model.
Every morning at 7am a podcast plays in my AirPods that did not exist last night. Two hosts I picked, on topics I asked them to follow, built from articles I starred and podcasts I actually listened to. Fifteen minutes long. Tomorrow’s is already being written.
This is what becomes possible when AI works from a graph of data you own.
Most AI agents work from the open internet. They scrape generic results, summarize them, and hand back roughly the same five PDF chunks the next person querying the same thing got. What you have personally read, starred, trusted, and built a position on does not enter the loop. The model is downstream of the data, and the data is generic.
A flat pile of your PDFs in a vector store is barely better. It can find similar passages. It cannot tell you which ones you have already endorsed, which authors you trust, or which two highlights from different sources are about the same idea.
The solution
A graph of curated entities and editorial scores, queryable by an AI. Each ingested item becomes a node. Each highlight extracted from it becomes a child node with an editorial-quality score. Tags, people, books, concepts, threat actors are their own nodes, connected to the highlights that mention them. Vector indexes sit alongside the structure, so semantic search returns hits within the graph instead of beside it.
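The shape described above can be sketched as a handful of node and edge types. This is an illustrative data model, not the system's actual schema; the class names and fields are assumptions, and only the edge names `:UPVOTED` and `:TAGGED_AS` appear elsewhere in the piece.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """One ingested article, podcast episode, or PDF."""
    id: str
    title: str
    source: str
    published: str  # ISO date

@dataclass
class Highlight:
    """Child node extracted from an Item, scored at extract time."""
    id: str
    item_id: str
    text: str
    editorial_score: int                                   # 1-10
    embedding: list[float] = field(default_factory=list)   # lives in the vector index

@dataclass
class Entity:
    """Tag, person, book, concept, or threat actor."""
    id: str
    kind: str  # e.g. "tag", "person", "threat-actor"
    name: str

# Edges (illustrative): (Item)-[:HAS_HIGHLIGHT]->(Highlight),
# (Highlight)-[:TAGGED_AS]->(Entity), (User)-[:UPVOTED]->(Highlight).
```

The point of the sketch is the two-level structure: items own highlights, and highlights, not items, carry the editorial score and the embedding.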
The kind of question this enables, in one HTTP call:
Show me the highlights about commercial spyware in Europe that I have personally upvoted, weighted by the tags I currently care about, deduplicated across cross-corpus similars, only from the last 30 days, ranked by relevance to my actual query, not by editorial quality alone.
The response, abridged:
[9/10] Citizen Lab · First Forensic Confirmation of Paragon’s iOS Spyware (2026-03-05) · 👍 · tags: commercial-spyware, italy, journalism “Italian prosecutors confirmed Cancellato’s phone was compromised by Paragon spyware…”
[9/10] Risky Bulletin · DigiCert hacked with malicious screensaver (2026-05-04) · tags: dprk, supply-chain “North Korean hackers have stolen $577 million in crypto so far this year, accounting for three-quarters of all crypto stolen in 2026…”
[8/10] Defensive Security Podcast · Episode 345: Axios maintainer compromise (2026-04-22) · 👍 · tags: dprk, supply-chain, social-engineering “The maintainer of the Axios package was the victim of a very elaborate spear-phishing campaign perpetrated by North Korean hackers…”
Every clause is a graph traversal. “Highlights I have upvoted” walks an :UPVOTED edge. “Weighted by tags I care about” walks :TAGGED_AS edges with weights. “Cross-corpus similars” is a vector neighborhood within the highlight subgraph. The graph runs them in one traversal; SQL fakes them with joins. The reason to choose a graph here is not speed. It is that adding a new edge type, say :CITED_IN to track where each highlight ended up in my own writing, is one schema migration instead of three.
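A sketch of how those clauses might assemble into a single Cypher query, built as a string in Python. The `:UPVOTED` and `:TAGGED_AS` relationships come from the text; the node labels, property names, weight fields, and the `highlight_embeddings` index name are all assumptions, and the per-row vector call is a readability choice, not a tuned query plan.

```python
def build_research_query(days: int = 30) -> str:
    """Assemble a Cypher query mirroring the clauses above, one
    traversal or filter per natural-language clause."""
    return "\n".join([
        # "highlights I have personally upvoted"
        "MATCH (me:User)-[:UPVOTED]->(h:Highlight)",
        # "weighted by the tags I currently care about"
        "MATCH (h)-[t:TAGGED_AS]->(tag:Tag)",
        # "only from the last N days"
        f"WHERE h.published >= date() - duration('P{days}D')",
        "WITH h, sum(t.weight * tag.current_weight) AS tag_score",
        # "ranked by relevance to my actual query" via the vector index
        "CALL db.index.vector.queryNodes('highlight_embeddings', 50, $query_vec)",
        "YIELD node, score AS relevance",
        "WHERE node = h",
        "RETURN h, tag_score * relevance AS rank",
        "ORDER BY rank DESC",
    ])
```

The cross-corpus dedup clause would hang off the same vector neighborhood; it is omitted here to keep the sketch to one traversal per clause.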
What you can do with one
Daily personalized podcast. The unlock I feel most. Two hosts I picked, Vera and Étienne, deliver a fifteen-minute brief every morning, built from highlights with editorial-quality scores of 7 or above from the previous twenty-four hours. Topics are weighted against my current taxonomy. Vera anchors. Étienne pushes back. They cover the things I would have missed if I were only reading my own feed.
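The selection step for that brief reduces to a small filter: editorial score at or above 7, extracted within the last twenty-four hours. A minimal sketch, assuming dict-shaped highlights with `editorial_score` and `extracted_at` fields (both field names are guesses):

```python
from datetime import datetime, timedelta, timezone

def brief_candidates(highlights, now=None, min_score=7, window_hours=24):
    """Select highlights for the daily brief: editorial score >=
    min_score, extracted within the last window_hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=window_hours)
    return [
        h for h in highlights
        if h["editorial_score"] >= min_score and h["extracted_at"] >= cutoff
    ]
```

Everything downstream, topic weighting, council seating, script composition, operates on this filtered set.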
Research API for any draft. “Pull me a research pack on commercial spyware in Europe; rank it by my own taste; only the last thirty days.” Returns markdown ready to paste into a draft. The companion piece OpSec for Adults was substantially built from /api/research-pack calls against this graph.
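A sketch of what such a call might look like, using only the stdlib. The `/api/research-pack` path is the one named in the text; the parameter names (`topic`, `days`, `rank_by`), the base URL, and the JSON body shape are all assumptions about an API whose contract is not shown here.

```python
import json
import urllib.request

def research_pack_request(topic: str, days: int = 30,
                          base_url: str = "http://localhost:8080"):
    """Build (not send) a POST request against the hypothetical
    /api/research-pack endpoint."""
    body = {"topic": topic, "days": days, "rank_by": "personal"}
    return urllib.request.Request(
        f"{base_url}/api/research-pack",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
```

Sending it with `urllib.request.urlopen(req)` would, under these assumptions, return the markdown research pack ready to paste into a draft.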
Disagreement, not consensus. The brief gives you the disagreement, not the consensus. The daily audio script seats two or three perspectives from a fixed council, chosen for the day's highlights. When sources contradict each other, the script delivers both sides, contradiction included. It is the closest thing I have to a steel-manned daily disagreement on every topic I follow.
Cross-corpus dedup. When the same idea shows up in three different newsletters in one week, the graph already knows. The brief surfaces it once with the cluster of sources, instead of three times in three different places. Reading less, noticing more.
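Dedup of this kind can be sketched as greedy clustering in the embedding space: each highlight joins the first cluster whose representative is within a similarity threshold, otherwise it starts a new cluster. The 0.92 threshold and the single-pass greedy strategy are illustrative choices, not the system's actual method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster_similars(highlights, threshold=0.92):
    """Greedy single-pass clustering by embedding similarity.
    Each cluster is a list; its first element is the representative."""
    clusters = []
    for h in highlights:
        for c in clusters:
            if cosine(h["embedding"], c[0]["embedding"]) >= threshold:
                c.append(h)
                break
        else:
            clusters.append([h])
    return clusters
```

The brief then surfaces one item per cluster, with the rest of the cluster listed as sources, which is how the same idea in three newsletters appears once instead of three times.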
Memory across time. A topic blows up. I can replay last year’s brief on it. I can pull every highlight on the topic from the last twelve months and see what I had already been told about it that I forgot. The graph keeps a longer memory than I do.
Editorial taste. Tag weights, council seating, scoring model, my upvotes. They encode what I find durable. The apparatus is a clone of my judgment that I can outsource the daily reading to. Over time, it gets better at being me.
The plumbing
Hive Mind runs on Neo4j with vector indexes co-located with the graph structure. DeepSeek extracts and scores highlights, tags them against my taxonomy, and composes the daily audio script. ElevenLabs renders the two-host brief. The HTTP API exposes the queries above as named endpoints. The full architecture lives in an internal ARCHITECTURE.md. The rest of this piece is about what the system unlocks, not how it works.
What I learned
An editorial-quality score is not a topic-relevance score. I sorted by editorial quality even when the user had supplied a topic. Result: high-confidence off-topic results when the corpus did not have the topic. The fix was to compute relevance separately and combine the two scores. The lesson is that an editorial score and a relevance score are different numbers and your API should never silently conflate them.
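The fix can be sketched as a small blend function that keeps the two numbers separate until the last step. Normalizing editorial quality to 0-1 and the `alpha` mixing weight of 0.5 are illustrative assumptions; the point is only that neither score silently stands in for the other.

```python
def combined_score(editorial: float, relevance: float,
                   alpha: float = 0.5) -> float:
    """Blend an editorial-quality score (1-10) with a topic-relevance
    score (0-1, e.g. cosine similarity to the query embedding).
    Sorting by this instead of editorial quality alone stops
    high-confidence off-topic results from winning."""
    return alpha * (editorial / 10.0) + (1 - alpha) * relevance
```

Under this blend, a 9/10 highlight with 0.1 relevance scores 0.50 and loses to a 7/10 highlight with 0.9 relevance at 0.80, which is exactly the off-topic failure mode the fix removes.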
Skills should document the contract, not the implementation. I wrote skill cards that explained how to query the graph via SSH and cypher-shell. I forgot to document the HTTP API. A fresh agent dropped into the apparatus cold, followed the skill cards into SSH, and got blocked. The API was three minutes away by grep, but the skills did not point there. Discovery surfaces should document the contract.
Build one
If you want to build something like this, the leverage move is editorial scoring at extract time. The graph is the substrate; the score is the curation. A vector store can be retrofitted with scoring; a graph cannot be retrofitted with structure. Start with the graph.
The companion piece, OpSec for Adults, is a worked example of what the API actually buys you. The Lazarus quote, the Signal-iOS gotcha, the EFF anchor: every cited claim came out of a /api/research-pack call against this graph. Your AI harness is downstream of your data layer. Build the data layer first.