The Loop Was Running. The Memory Wasn’t.
Episode 1 ended with the loop closing 35 tasks in 24 hours. One host. One queue. One continuous execution loop. The pragmatic version of the architecture, running.

Then we watched more closely. The same DynamoDB gotcha that cost three hours on Monday appeared again on Wednesday. Different task, different file, same mistake — conditional write syntax that the documentation describes incorrectly. The agent had worked through it before. But “before” was a closed session. This session started blank. Every session starts blank.

That’s not a model limitation — it’s an architectural one. The model is perfectly capable of using prior context when prior context exists. The problem is that nothing was giving it prior context.

The instinct was to add a vector database: embed session transcripts, build a retrieval pipeline, query semantically before each task. It would work. It would also mean running another service, maintaining embeddings, and adding latency to every task kickoff. We looked at what we already had instead.

The Insight
The task queue is GitHub Issues. Every closed issue is a record of completed work. Comments capture what happened — what was tried, what failed, what the final approach was. The information was already there, accumulated over weeks of the loop running.

The gap wasn’t that the information didn’t exist. It was that the information wasn’t structured for retrieval. A vector database solves a retrieval problem. But before retrieval comes structure. Unstructured prose needs embeddings and semantic search to pull signal from noise. Structured evidence needs only a keyword search against a typed schema.

The reframe: make task outcomes structured as they’re recorded, and the retrieval problem becomes simple.

The Bead Format
We call the structured outcome record a “bead” — a comment written to the GitHub Issue when the task closes. Four fields, each earning its place:

outcome — what was actually delivered. One sentence. Verifiable.
friction_points — this is the most valuable field. The specific things that
slowed the work down: wrong documentation, surprising API behavior, a gotcha that
cost hours. These are the corrections that otherwise vanish at session end.
patterns_used — the named approaches that applied. Creates a vocabulary for
pattern-based retrieval: “find tasks that used idempotency-key.”
tags — technology and domain tags for broader filtering.
The friction_points field captures what didn’t work on the way to what did.
Most knowledge systems optimize for capturing solutions. Beads capture the
wrong turns too — and that’s the information that most changes how you’d approach
the next similar task.
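The bead record and its close-out rendering can be sketched in a few lines of Python. This is an illustrative sketch, not the project’s actual code — the `Bead` class, the `render_comment` helper, and the example values are invented here; only the field names come from the article:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Bead:
    """Structured outcome record, posted as a comment when the issue closes."""
    outcome: str                # what was delivered: one verifiable sentence
    friction_points: list[str]  # what slowed the work down (the key field)
    patterns_used: list[str]    # named approaches, e.g. "idempotency-key"
    tags: list[str]             # technology/domain tags for broad filtering

    def render_comment(self) -> str:
        """Render as a JSON comment body so later keyword search can parse it."""
        return json.dumps(asdict(self), indent=2)

# Hypothetical bead for the DynamoDB task the article mentions (GH-142):
bead = Bead(
    outcome="Idempotent order writes via a DynamoDB conditional put.",
    friction_points=["conditional write syntax differs from the documented form"],
    patterns_used=["idempotency-key"],
    tags=["dynamodb", "aws"],
)
print(bead.render_comment())
```

Posting the rendered body is then a single call to GitHub’s create-comment endpoint (POST /repos/{owner}/{repo}/issues/{number}/comments).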
The Retrieval Layer
Once beads are structured, the retrieval question becomes straightforward. An MCP (Model Context Protocol) tool is the right interface: callable from within the agent session, adds zero new infrastructure, returns structured context in milliseconds.

The tool search_past_tasks takes a query and optional filters and returns the
N most relevant beads. No embeddings. No vector index. Keyword overlap against
friction_points and patterns_used, with recency as tiebreaker.
Every session now calls search_past_tasks first. The
DynamoDB conditional write gotcha is in the bead from GH-142. The next agent
that touches DynamoDB sees it before writing a line.
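A minimal sketch of that retrieval logic — keyword overlap over friction_points, patterns_used, and tags, with recency as tiebreaker. The function body, the example beads, and the substring-based matching are assumptions for illustration, not the actual MCP tool implementation:

```python
import re

def score(query: str, bead: dict) -> int:
    """Count query terms that appear in a bead's searchable fields.
    Substring containment is a deliberate simplification here."""
    terms = set(re.findall(r"[a-z0-9-]+", query.lower()))
    haystack = " ".join(
        bead["friction_points"] + bead["patterns_used"] + bead["tags"]
    ).lower()
    return sum(1 for t in terms if t in haystack)

def search_past_tasks(query: str, beads: list[dict], n: int = 3) -> list[dict]:
    """Return the n most relevant beads: keyword overlap first,
    recency (ISO date strings sort lexicographically) as tiebreaker."""
    ranked = sorted(
        (b for b in beads if score(query, b) > 0),
        key=lambda b: (score(query, b), b["closed_at"]),
        reverse=True,
    )
    return ranked[:n]

# Two invented beads; GH-142 mirrors the article's DynamoDB example.
beads = [
    {"issue": "GH-142", "closed_at": "2025-01-06",
     "friction_points": ["dynamodb conditional write syntax documented wrong"],
     "patterns_used": ["idempotency-key"], "tags": ["dynamodb", "aws"]},
    {"issue": "GH-150", "closed_at": "2025-01-08",
     "friction_points": ["s3 eventual consistency on overwrite"],
     "patterns_used": [], "tags": ["s3", "aws"]},
]
hits = search_past_tasks("dynamodb conditional write", beads)
print([b["issue"] for b in hits])  # GH-142 ranks first
```

Exposing this function as an MCP tool is what makes it callable from inside the agent session with no extra services running.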
Where Beads Fit: Three Tiers
Beads solve a specific tier of the memory problem. Worth being precise about which one.

Tier 1 — Non-negotiable rules that apply to every task. Never use floats for money. Scope every query by tenant. These live as enforced constraints in Claude Code hooks, not documentation that might be skipped.

Tier 2 — Architectural decisions, ADRs, established patterns. “We use conditional writes for idempotency.” Included at session start as project context.

Tier 3 — The episodic record: what actually happened on past tasks, which approaches failed, what the friction points were. This is where beads live.

The instinct to add a vector database usually targets Tier 3. Beads address the same tier with less infrastructure — the tradeoff is retrieval sophistication vs. operational simplicity. For a bounded task domain running against one codebase with recurring task types, keyword retrieval over structured beads is enough.

The Real Cost
The tradeoff is manual curation. Beads don’t write themselves. The executor agent has to write one when closing each task. This is real. But it’s also a forcing function. Writing a good bead means articulating what happened — which has value independent of future retrieval. The agent that writes the bead is encoding its own correction for the next agent that encounters the same situation.

If you want beads to populate automatically from session transcripts, that’s when a purpose-built memory system starts to justify itself. For the current loop, the executor writes them as part of the close-out sequence, and the quality is consistently high because the context is fresh.

What Changed
The loop was closing tasks before beads. It’s closing them faster now. Not because the agents are smarter — because they start informed. The DynamoDB gotcha that cost three hours the first time costs twenty minutes the second time because the friction point is surfaced before the first write. Pre-task context instead of post-task regret. That’s the whole thing.

What’s Next
The loop has memory now. But it still doesn’t know what it’s about to break. A task like “add a parameter to this function” looks small. It isn’t — not in a codebase where 47 modules call that function. The agent discovers the scope reactively: write, compile, find 23 errors, fix each one. Six compiler cycles. Forty-five minutes.

Episode 3 covers the architecture we’re building to give the loop blast radius awareness before the first keystroke — a proactive impact graph that makes the reactive discovery loop unnecessary.

Episode 3: The Agent That Couldn't See What It Was Breaking
How a code intelligence layer — Tree-sitter, KuzuDB, MCP — gives agents proactive impact awareness before they write.
All content represents personal learning from personal and side projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.