This is Episode 3 of the Autonomous Dev Org series — an honest account of building a development organization where AI handles implementation and humans handle direction. Each episode covers what we attempted, what broke, and what we learned.

The Pattern We Started Noticing

Episode 2 gave the loop memory. Agents now start tasks with context from similar past work — friction points, patterns, gotchas already documented. That fixed one class of repeated mistakes. A different class persisted.

As the loop took on larger tasks — interface changes, service refactors, shared utility updates — we’d watch the same sequence play out. The agent edits a function signature. Clean implementation. Correct logic. Then it runs the build. Twenty-three errors across fifteen files. The agent didn’t make a bad decision. It made a decision without knowing the scope of what it was touching. The function it changed was called by 47 modules. Three of them used positional argument ordering that no longer matched. One was a test file that mocked the old signature. One was a macro in another crate that depended on the parameter count. None of that was visible to the agent when it made the change. It found out the way you’d find out if you were new to the codebase and nobody had told you about the dependents: by compiling and watching things break.

This is the reactive loop: change → compile → find breakage → fix → compile again. For a human engineer it’s annoying. For an autonomous loop running without human review, it’s a trust problem.

Why This Is an Information Problem, Not an Intelligence Problem

The reactive loop isn’t a model capability failure. The model can reason about change impact when it has the relevant context. The problem is that the relevant context — the full dependency graph of what calls what, what types flow where, what contracts exist across module boundaries — isn’t available to the agent when it’s deciding where to apply a change. You can partially address this by providing more files upfront: “here are the callers of this function.” But that requires someone to know which files to include. In a codebase with dozens of repositories and hundreds of modules, nobody has that complete picture ready to hand. The information exists in the codebase. It just isn’t in a form the agent can query. The reactive loop is a symptom of a missing capability: the agent doesn’t have a structured representation of the codebase it’s operating on.

What Proactive Blast Radius Awareness Looks Like

The difference in practice: Same change. Same agent. Different starting information. With blast radius awareness, before writing anything, the agent calls impact(). The tool returns: 47 direct callers, 8 test files, 3 macro dependents, highest risk in billing_handler.rs, auth_middleware.rs, macro/derive.rs. The agent reviews those files first, drafts a migration approach that handles all callers, and makes the change in one pass. Two compiler cycles instead of six. Eighteen minutes instead of forty-five. More importantly: the agent made a decision about how to sequence the work based on actual knowledge of the codebase, not reactive discovery.

The Architecture

Three components: a structured code representation, a graph query layer, and an MCP interface for the agent.

Component 1: Tree-sitter AST Parsing

Tree-sitter produces concrete syntax trees for source code. It’s fast (incremental on file changes), language-agnostic, and queryable. For each file, the parser extracts symbols and their relationships:
# Assumes a tree-sitter parser already initialized for the target language,
# along the lines of (recent py-tree-sitter bindings):
#   import tree_sitter_rust
#   from tree_sitter import Language, Parser
#   ts_parser = Parser(Language(tree_sitter_rust.language()))
# extract_name, extract_call_sites, and extract_signature are small helpers
# that walk the same tree.

def extract_symbols(file_path: str, source: str) -> list[Symbol]:
    """
    Parse a source file and extract all symbols with their relationships.
    Returns: functions, types, imports, trait implementations, call sites.
    """
    tree = ts_parser.parse(source.encode())
    symbols = []

    # Top-level items only; functions nested in impl blocks would need recursion.
    for node in tree.root_node.children:
        if node.type == "function_item":
            symbols.append(Symbol(
                kind="function",
                name=extract_name(node),
                file=file_path,
                line=node.start_point[0],  # tree-sitter rows are 0-based
                calls=extract_call_sites(node),
                signature=extract_signature(node),
            ))
        elif node.type in ("struct_item", "enum_item", "trait_item"):
            symbols.append(Symbol(
                kind=node.type.replace("_item", ""),  # "struct", "enum", "trait"
                name=extract_name(node),
                file=file_path,
                line=node.start_point[0],
            ))

    return symbols

Component 2: KuzuDB Graph

KuzuDB is an embedded graph database — no separate service, runs in-process. Nodes are symbols (functions, types, modules). Edges are relationships (calls, implements, imports).
CREATE NODE TABLE Function(
    id STRING,          // "file::name"; Kuzu primary keys are single-column
    name STRING,
    file STRING,
    line INT64,
    signature STRING,
    PRIMARY KEY(id)
);

CREATE NODE TABLE Type(
    id STRING,          // "file::name"
    name STRING,
    kind STRING,
    file STRING,
    PRIMARY KEY(id)
);

CREATE REL TABLE CALLS(FROM Function TO Function);
CREATE REL TABLE IMPLEMENTS(FROM Type TO Type);
CREATE REL TABLE USES_TYPE(FROM Function TO Type);
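Loading the extracted symbols into this schema is a matter of turning each record into MERGE statements. To keep the example self-contained it only builds the Cypher strings; in the real pipeline each one would be passed to kuzu’s Python API (`kuzu.Connection.execute`). The "file::name" id convention and the dict-shaped symbol records are assumptions for illustration:

```python
def load_statements(symbols: list[dict]) -> list[str]:
    """Build Kuzu-flavoured Cypher statements to upsert symbols and call edges.

    Each symbol dict is assumed to carry name, file, line, and optionally calls.
    """
    stmts = []
    for sym in symbols:
        sym_id = f"{sym['file']}::{sym['name']}"
        # Upsert the function node, keyed on the single-column id.
        stmts.append(
            f'MERGE (f:Function {{id: "{sym_id}"}}) '
            f'SET f.name = "{sym["name"]}", f.file = "{sym["file"]}", '
            f'f.line = {sym["line"]}'
        )
        # One CALLS edge per recorded call site.
        for callee in sym.get("calls", []):
            stmts.append(
                f'MATCH (a:Function {{id: "{sym_id}"}}), '
                f'(b:Function {{name: "{callee}"}}) '
                f'MERGE (a)-[:CALLS]->(b)'
            )
    return stmts
```

In production you would use parameterized queries rather than string interpolation; the strings here just make the mapping from Symbol to graph visible.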
Once populated, “what calls this function transitively” is a graph traversal:
// Direct and transitive callers of handle_request
MATCH (caller:Function)-[e:CALLS*1..3]->(target:Function {name: "handle_request"})
RETURN caller.name, caller.file, length(e) AS depth
ORDER BY depth;

Component 3: MCP Server

Four tools exposed to the agent:
@mcp_tool
def impact(symbol_name: str) -> ImpactReport:
    """
    Return the blast radius of changing a symbol.
    Includes direct callers, transitive dependents, test coverage, and risk assessment.
    """

@mcp_tool
def context(file_path: str, symbol_name: str) -> SymbolContext:
    """
    Return full context for a symbol: callers, dependencies, tests.
    Use before making a change to understand what you're touching.
    """

@mcp_tool
def query(cypher: str) -> list[dict]:
    """
    Run a raw graph query. For complex dependency questions.
    """

@mcp_tool
def detect_changes(since: str) -> list[ChangedSymbol]:
    """
    Return symbols that changed since a git ref, with their downstream impacts.
    Use at the end of a session to verify change scope matched expectations.
    """
Pre-change assessment (impact), targeted context (context), flexible queries (query), post-change verification (detect_changes). The loop calls impact before planning, detect_changes before closing.
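The impact tool is essentially the caller-traversal query plus aggregation. A sketch of how it might assemble its report from the rows that query returns (caller name, caller file, depth) — the risk heuristic and the report fields are assumptions, not the article’s exact API:

```python
def build_impact_report(symbol_name: str,
                        rows: list[tuple[str, str, int]]) -> dict:
    """Aggregate (caller_name, caller_file, depth) rows into a blast-radius report."""
    direct = [(name, f) for name, f, depth in rows if depth == 1]
    transitive = [(name, f) for name, f, depth in rows if depth > 1]
    # Naive test detection: file path mentions "test".
    tests = sorted({f for _, f, _ in rows if "test" in f})
    # Crude risk heuristic: files where distinct callers concentrate.
    by_file: dict[str, int] = {}
    for _, f, _ in rows:
        by_file[f] = by_file.get(f, 0) + 1
    highest_risk = sorted(by_file, key=by_file.get, reverse=True)[:3]
    return {
        "symbol": symbol_name,
        "direct_callers": len(direct),
        "transitive_dependents": len(transitive),
        "test_files": tests,
        "highest_risk_files": highest_risk,
    }
```

The real risk assessment would weigh more signals (macro dependents, public API surface, ownership), but even this shape gives the agent enough to sequence a migration before touching code.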

Why This Changes the Trust Equation

The reactive loop isn’t just slow. It actively undermines confidence in autonomous operation. When an agent makes cascading errors — edits a function, breaks 15 callers, generates fixes that introduce new issues — it’s not because the model reasoned badly. It’s because the model reasoned without the information it needed. The failure mode isn’t intelligence; it’s information architecture. This matters for where you draw the human oversight boundary. If agents fail because they reason poorly, the answer is better models. If they fail because they reason without crucial context, the answer is better tooling. Blast radius awareness addresses the second category. An agent that can ask “how many things break if I change this signature” before changing it can operate with materially less oversight than one that discovers breakage reactively.
The enforcement layer described in the Claude Code hooks episode addresses a complementary failure mode — agents that violate architectural constraints, rather than agents that mis-scope changes. The two layers target different parts of the autonomy problem.

Constraints Worth Knowing

The design is clear. A few real constraints in the implementation:

Graph freshness. The graph needs to stay synchronized with the codebase. The right approach is incremental: watch for file changes (or hook into git), re-parse modified files, update affected nodes and edges. Full rebuild on every query is too slow for large codebases.

Multi-language support. Tree-sitter supports most major languages, but the extraction logic differs by language. A Rust codebase and a TypeScript codebase need different parsers. Manageable, but not free.

Cross-repo resolution. In a polyrepo structure, a function call might cross repository boundaries. Full blast radius awareness across repos requires a shared graph spanning repositories — more complex to maintain, but necessary for accurate impact analysis in a distributed codebase.

These are engineering problems, not architecture problems. The design handles them at the cost of implementation complexity.
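The graph-freshness constraint reduces to a small update loop: when files change, drop that file’s symbols and re-insert the re-parsed ones. A minimal sketch with an in-memory dict standing in for the graph database; changed files could come from a watcher or from `git diff --name-only <ref>`, and `parse_fn` stands in for extract_symbols:

```python
def refresh(index: dict[str, list], changed_files: list[str], parse_fn) -> dict[str, list]:
    """Incrementally update a file -> symbols index for a batch of changed files.

    Only the changed files are re-parsed; everything else is left untouched.
    """
    for path in changed_files:
        index.pop(path, None)          # drop stale symbols for this file
        index[path] = parse_fn(path)   # re-parse and re-insert
    return index
```

Against a real graph store the pop becomes a DELETE of that file’s nodes and edges, but the shape of the loop — and why full rebuilds are unnecessary — is the same.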

What’s Next

The loop has memory (Episode 2). It has blast radius awareness (this episode). The next phase is proving the product actually works — not just that the issue count dropped. End-to-end verification: using the product as a real user, across real environments, with real data. The loop will miss things. Production-grade software isn’t correct implementations in isolation — it’s correct behavior under the full combination of real constraints. That’s what the coming episodes will close.
All content represents personal learning from personal and side projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.