
The Surprising Realization

After weeks of building with AI, I discovered something unexpected: AI doesn’t just accelerate existing workflows - it makes previously impractical work suddenly feasible. The work I tackled:
  • Complete end-to-end test coverage for event flows (21 test scenarios)
  • Organization model documentation (agent responsibilities, workflows, decision frameworks)
  • API visibility architecture for SDK filtering
  • Bundle/unbundle workflow state machine with examples
  • Usage metering infrastructure with aggregation pipelines
But here’s what surprised me: I wouldn’t have attempted most of this work without AI. Not because it’s technically hard, but because the effort-to-value ratio seemed wrong. Let me show you what changed.

Organization Model Documentation (The Work Nobody Does)

The surprise of the week: The most valuable output wasn’t code - it was organizational documentation.
The context: I had built significant functionality with the multi-agent workflow, but I noticed problems:
  • Evaluator sometimes made decisions outside its scope
  • Builder occasionally asked questions Evaluator should answer
  • Verification reports varied in quality
  • No clear escalation path for conflicts
The root cause: I never formally defined what each agent was responsible for.
The traditional approach: Write a README explaining the workflow (maybe).
What I tried instead: Create a formal organization model with AI.
Planning session for agent organization model.

Context:
- Using 3 agents: Evaluator (Opus), Builder (Sonnet), Verifier (Sonnet)
- Working well for features, but roles blur on complex issues
- Need clear boundaries, responsibilities, and escalation paths

Design an organization model with:
1. Agent roles and responsibilities (what each agent owns)
2. Decision rights (who decides what)
3. Communication protocols (how agents interact)
4. Conflict resolution (what happens when agents disagree)
5. Quality gates (when work can progress to next agent)

Model after real organizations:
- Evaluator = Principal Architect
- Builder = Senior Engineer
- Verifier = Tech Lead / Reviewer

Include failure modes and handling strategies.
AI produced something remarkable: A 35-page organizational constitution that included:

Agent Personas

Evaluator (Principal Architect)
  • Owns: Architecture decisions, technical strategy
  • Decides: Technology choices, design patterns, trade-offs
  • Cannot: Implement code, override verification failures
  • Escalates to: Human (for business decisions)
Builder (Senior Engineer)
  • Owns: Implementation quality, test coverage
  • Decides: Code structure, algorithm choice (within plan)
  • Cannot: Change architecture, skip verification
  • Escalates to: Evaluator (for plan changes)
Verifier (Tech Lead)
  • Owns: Quality standards, requirement coverage
  • Decides: Pass/conditional/fail, required fixes
  • Cannot: Implement fixes, change requirements
  • Escalates to: Human (for requirement ambiguity)
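One way to make these boundaries operational, rather than prose the agents may or may not follow, is to encode them as structured config and inject them into each agent’s system prompt. A minimal TypeScript sketch of that idea - the names (`AgentRole`, `rolePreamble`) are illustrative, not taken from the actual org model document:

```typescript
// Hypothetical encoding of the agent personas as structured config.
// Field names are illustrative, not taken from the actual org model.
interface AgentRole {
  title: string;
  owns: string[];        // areas this agent is accountable for
  decides: string[];     // decisions it may make without approval
  cannot: string[];      // explicit boundaries
  escalatesTo: "Evaluator" | "Human";
}

const AGENT_ROLES: Record<"evaluator" | "builder" | "verifier", AgentRole> = {
  evaluator: {
    title: "Principal Architect",
    owns: ["architecture decisions", "technical strategy"],
    decides: ["technology choices", "design patterns", "trade-offs"],
    cannot: ["implement code", "override verification failures"],
    escalatesTo: "Human",
  },
  builder: {
    title: "Senior Engineer",
    owns: ["implementation quality", "test coverage"],
    decides: ["code structure", "algorithm choice (within plan)"],
    cannot: ["change architecture", "skip verification"],
    escalatesTo: "Evaluator",
  },
  verifier: {
    title: "Tech Lead",
    owns: ["quality standards", "requirement coverage"],
    decides: ["pass/conditional/fail", "required fixes"],
    cannot: ["implement fixes", "change requirements"],
    escalatesTo: "Human",
  },
};

// Example: prepend the role boundaries to an agent's system prompt.
function rolePreamble(agent: keyof typeof AGENT_ROLES): string {
  const r = AGENT_ROLES[agent];
  return [
    `You are the ${r.title}.`,
    `You own: ${r.owns.join(", ")}.`,
    `You may decide: ${r.decides.join(", ")}.`,
    `You must not: ${r.cannot.join(", ")}.`,
    `Escalate anything outside these boundaries to: ${r.escalatesTo}.`,
  ].join("\n");
}
```

The specific fields don’t matter much; what matters is that “cannot” and “escalates to” become explicit strings the orchestration layer can repeat back to the agent on every run.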

Decision Framework

Type 1 Decisions (Reversible):
  • Variable naming, code formatting
  • Test data values
  • Comment wording
  • Owner: Builder makes decision, no approval needed
Type 2 Decisions (Reversible with effort):
  • Algorithm choice (within performance constraints)
  • Error message wording
  • Test organization structure
  • Owner: Builder proposes, Verifier validates
Type 3 Decisions (Hard to reverse):
  • Database schema changes
  • API contract changes
  • Event structure changes
  • Owner: Evaluator decides, human approves
Type 4 Decisions (Irreversible):
  • Multi-tenant isolation strategy
  • Compliance approach
  • Core technology choices
  • Owner: Human decides, Evaluator advises
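The same framework can be expressed as a lookup from decision type to owner and required approval, so an orchestration script can refuse to let an agent act outside its lane. A sketch under the same illustrative-config assumption as above:

```typescript
// Hypothetical mapping of the Type 1-4 framework to owners and approvers.
type DecisionType = 1 | 2 | 3 | 4;

interface DecisionPolicy {
  reversibility: string;
  owner: "Builder" | "Evaluator" | "Human";
  approval: "none" | "Verifier validates" | "Human approves" | "Evaluator advises";
}

const DECISION_POLICIES: Record<DecisionType, DecisionPolicy> = {
  1: { reversibility: "reversible", owner: "Builder", approval: "none" },
  2: { reversibility: "reversible with effort", owner: "Builder", approval: "Verifier validates" },
  3: { reversibility: "hard to reverse", owner: "Evaluator", approval: "Human approves" },
  4: { reversibility: "irreversible", owner: "Human", approval: "Evaluator advises" },
};

// Example guard an orchestration script could run before an agent acts.
function needsHumanInLoop(type: DecisionType): boolean {
  const { owner, approval } = DECISION_POLICIES[type];
  return owner === "Human" || approval === "Human approves";
}
```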

Quality Gates

Gate 1: Planning Complete
  • ✅ Requirements understood
  • ✅ Design options evaluated
  • ✅ Recommended approach justified
  • ✅ Human approval received
  • Then: Builder can start implementation
Gate 2: Implementation Complete
  • ✅ All plan requirements implemented
  • ✅ Four-level tests passing
  • ✅ No compiler warnings
  • ✅ Builder self-review done
  • Then: Verifier can review
Gate 3: Verification Passed
  • ✅ Requirements coverage verified
  • ✅ Test adequacy confirmed
  • ✅ Edge cases identified
  • ✅ No blocking issues
  • Then: Human final review
Gate 4: Human Approved
  • ✅ Spot-check implementation
  • ✅ Verify AI didn’t hallucinate features
  • ✅ Confirm alignment with business goals
  • Then: Merge to main
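To keep the gates from degrading back into “looks good” reviews, each gate can be encoded as an explicit checklist that must be fully satisfied before work moves to the next agent. A hypothetical sketch - the gate names and checks are paraphrased from the list above, and the enforcement helper is an assumption, not part of the original model:

```typescript
// Hypothetical encoding of the quality gates; a check is a named box the
// responsible agent (or human) must tick before work advances.
interface QualityGate {
  name: string;
  nextStep: string;
  checks: string[];
}

const GATES: QualityGate[] = [
  {
    name: "Planning Complete",
    nextStep: "Builder starts implementation",
    checks: [
      "Requirements understood",
      "Design options evaluated",
      "Recommended approach justified",
      "Human approval received",
    ],
  },
  {
    name: "Implementation Complete",
    nextStep: "Verifier reviews",
    checks: [
      "All plan requirements implemented",
      "Four-level tests passing",
      "No compiler warnings",
      "Builder self-review done",
    ],
  },
  // Gates 3 and 4 ("Verification Passed", "Human Approved") follow the same shape.
];

// Refuse to advance unless every check in the current gate is explicitly ticked.
function canAdvance(gate: QualityGate, ticked: Set<string>): boolean {
  return gate.checks.every((check) => ticked.has(check));
}
```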

Conflict Resolution

Scenario 1: Verifier rejects implementation
  • Builder must fix issues (no debate)
  • If Builder believes rejection is wrong:
    1. Builder escalates to Evaluator
    2. Evaluator reviews both perspectives
    3. Evaluator makes final call
    4. Human can override if needed
Scenario 2: Builder disagrees with plan
  • Builder documents concern in PR comment
  • Builder proposes alternative approach
  • Evaluator reviews:
    • If minor: Evaluator approves change
    • If major: Human decides
Scenario 3: Requirement ambiguity
  • Verifier flags ambiguity (doesn’t fail review)
  • Verifier suggests clarifying questions
  • Human clarifies requirement
  • Builder updates implementation
  • Re-verification required
Why this matters: After implementing this organization model, I saw immediate improvements.
Before:
  • Verifier would sometimes say “looks good” without checking edge cases
  • Builder occasionally added features not in the plan (“seemed useful”)
  • Unclear when to involve human vs. let agents decide
After:
  • Verifier follows checklist every time
  • Builder stays strictly within plan boundaries
  • Clear decision rights eliminate back-and-forth
The meta-insight: AI agents benefit from organizational structure just like human teams do.
Critical Learning: Without clear roles and responsibilities, AI agents optimize for “make the task work” instead of “produce the right outcome.” The organization model changed agent behavior more than prompt engineering ever could.

Documentation Strategy (The Work That Never Gets Done)

With the organization model defined, AI suggested something I didn’t ask for:
“The organization model references several documents that don’t exist yet (PRD templates, ADR process, verification report format). Should we create a documentation strategy to ensure consistency?”
My first reaction: “That’s overkill for a solo project.” My second thought: “But what if it’s not? Let AI try.”
Design a documentation strategy for the multi-agent workflow.

Context:
- Solo developer using AI agents
- Building production SaaS platform
- Need to maintain context across weeks/months
- AI agents need consistent documentation format to work effectively

Requirements:
- Documentation types (what to document)
- Documentation templates (consistent format)
- Documentation lifecycle (when to create/update)
- Documentation storage (where to keep)
- Documentation quality (how to verify)

Optimize for:
- AI agents can find and use docs effectively
- Human can understand project after 3-month break
- Minimal maintenance overhead
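The prompt optimizes for agents being able to find and use docs, which in practice means every document declares its type, owner, and lifecycle state in a consistent header. A speculative sketch of such a schema - the document types and field names here are my assumptions, not the strategy’s actual output:

```typescript
// Speculative front-matter schema so both humans and agents can locate,
// parse, and trust project documentation. Types and fields are assumptions.
type DocType =
  | "PRD"
  | "ADR"
  | "plan"
  | "verification-report"
  | "org-model"
  | "example";

interface DocFrontMatter {
  type: DocType;
  title: string;
  status: "draft" | "active" | "superseded";
  owner: "Evaluator" | "Builder" | "Verifier" | "Human";
  created: string;       // ISO date
  lastReviewed: string;  // ISO date - drives the update lifecycle
  relatedDocs: string[]; // repo-relative paths, for cross-linking
}
```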

API Visibility Architecture (Creative + Systematic)

The problem: Need to generate different SDK versions:
  • Customer SDK (only public-facing routes)
  • Platform SDK (all routes including internal)
  • Partner SDK (partner portal routes only)
The creative part (Evaluator):
  • Design visibility tagging system
  • Choose implementation approach (compile-time vs runtime)
  • Define SDK filtering rules
The systematic part (Builder):
  • Tag 127 existing API routes with visibility
  • Update OpenAPI generation to filter by visibility
  • Create SDK generation scripts for each audience
  • Write migration guide for future routes
With AI: Creative part (2 hours with Evaluator) + systematic part (3 hours with Builder) = 5 hours total.
Why AI excelled: The systematic work (tagging 127 routes) would have been mind-numbing manually. Builder never gets bored, maintains perfect consistency, and actually catches edge cases I would miss.
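Concretely, the pattern looks like this: each route carries a visibility tag, and the OpenAPI document is filtered per audience before it reaches the SDK generator. A simplified TypeScript sketch - the `x-visibility` extension name and the audience map are assumptions about how this could be wired, not the exact implementation:

```typescript
// Simplified sketch of visibility tagging and per-audience SDK filtering.
// Assumes each operation carries an "x-visibility" extension in the OpenAPI doc.
type Visibility = "public" | "internal" | "partner";

// Audience → which visibility tags its SDK may include.
const SDK_AUDIENCES: Record<string, Visibility[]> = {
  customer: ["public"],
  platform: ["public", "internal", "partner"],
  partner: ["partner"],
};

interface OpenApiOperation {
  operationId: string;
  "x-visibility"?: Visibility; // set by each route definition
}

interface OpenApiDoc {
  openapi: string;
  paths: Record<string, Record<string, OpenApiOperation>>;
}

// Produce an audience-specific spec by dropping operations the audience
// is not allowed to see; paths left empty are removed entirely.
function filterSpec(doc: OpenApiDoc, audience: keyof typeof SDK_AUDIENCES): OpenApiDoc {
  const allowed = new Set(SDK_AUDIENCES[audience]);
  const paths: OpenApiDoc["paths"] = {};
  for (const [path, operations] of Object.entries(doc.paths)) {
    const kept = Object.fromEntries(
      Object.entries(operations).filter(
        // Untagged routes default to "internal" so nothing leaks by accident.
        ([, op]) => allowed.has(op["x-visibility"] ?? "internal"),
      ),
    );
    if (Object.keys(kept).length > 0) paths[path] = kept;
  }
  return { ...doc, paths };
}
```

Defaulting untagged routes to internal is the design choice that matters: a new route appears in no customer-facing SDK until someone explicitly classifies it.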

What We Learned: The Taxonomy of AI-Suitable Work

After extensive experience, a pattern emerged: Not all work benefits equally from AI.
1. Systematic Implementation
  • Applying patterns repeatedly (tag 127 API routes)
  • Comprehensive test coverage (21 E2E scenarios)
  • Documentation from templates (35-page org model)
  • State machine implementation from spec
Why: AI never gets bored, maintains perfect consistency
ROI: 5-10x speedup
2. Thorough Analysis
  • Cross-reference requirements across documents
  • Identify edge cases systematically
  • Verify consistency across related components
  • Generate examples covering all scenarios
Why: AI doesn’t “skim” - it reads everything fully
ROI: Catches issues humans would miss
3. Structured Documentation
  • ADRs, verification reports, planning docs
  • Following templates consistently
  • Cross-linking related documents
  • Generating examples from specs
Why: AI excels at structure and completeness
ROI: Documentation actually gets written

Principles Established

What we learned: Work that requires consistent application of rules across many cases is perfectly suited for AI.
Examples:
  • Tag 127 API routes with visibility levels
  • Write 21 E2E test scenarios following same pattern
  • Create documentation from templates for 7 different types
Rule: If work requires “do the same thing many times consistently,” delegate entirely to AI.
Anti-pattern: Using AI for one-off creative tasks (AI defaults to patterns from training).
What we learned: Good documentation makes future AI work more effective.
The cycle:
  1. AI creates documentation following templates
  2. Documentation captures decision context
  3. Future AI agents read documentation to understand requirements
  4. Better requirements → better implementation → better verification
Metric: Time to regain context after a break dropped from 2-3 hours to 15 minutes.
Rule: Invest in documentation templates once, get consistent documentation forever.
What we learned: Defining agent roles and responsibilities improves output quality more than prompt engineering.
Before the organization model:
  • Verifier inconsistently applied quality checks
  • Builder occasionally hallucinated features
  • Unclear escalation for conflicts
After organization model:
  • Verifier follows checklist every time
  • Builder stays within plan boundaries
  • Decision rights clearly defined
Rule: Treat AI agents like team members - give them clear roles, responsibilities, and decision rights.
What we learned: AI can create usage examples as part of implementation, not as afterthoughts.
Traditional approach:
  • Implement feature (2 days)
  • “I’ll add examples later” (never happens)
AI approach:
  • Builder implements feature (4 hours)
  • Builder immediately creates example while context is fresh (1 hour)
  • Example becomes part of verification (Verifier checks example works)
Result: Every feature now has working examples because the marginal cost dropped to near-zero.
Rule: Make examples part of the implementation task, not a separate documentation task.

The Mistake I Made (And What It Taught Me)

After implementation: Builder finished implementing the partner cost matrix. Tests passed. I requested verification. Verifier reviewed and said: PASSED ✅. I merged to main.
During integration testing: I tried to use the partner cost matrix in integration tests. It failed with a cryptic error:
Error: PartnerCostMatrix query failed: Access denied
What happened? The partner cost matrix uses OAuth scope-based authorization. Builder implemented the feature and the tests passed - but the tests used a mock auth context that bypassed scope checks.
Why Verifier missed it: The test suite had 100% coverage of the feature logic, but 0% coverage of the authorization integration.
The root cause: I didn’t specify “test authorization” in the requirements, so Builder tested the business logic (correctly) but not the integration with the auth system.
The deeper issue: As tasks became more systematic, I stopped thinking about implicit requirements. I assumed AI would “figure out” that authorization needs testing.
The fix: I updated the organization model with a new checklist for Verifier:

Cross-Cutting Concerns Checklist

For every feature, verify tests cover:
  1. Authorization
    • Feature-level permission checks tested
    • Tenant isolation verified (can’t access other tenant data)
    • OAuth scope requirements documented
  2. Multi-tenancy
    • Tenant context properly scoped
    • Queries include tenant filter
    • Cross-tenant negative tests exist
  3. Event Sourcing
    • Events emitted for state changes
    • Event payload includes required fields
    • Event ordering tested
  4. Error Handling
    • Expected errors return proper status codes
    • Unexpected errors logged with context
    • Partial failure scenarios tested
  5. Observability
    • Metrics emitted for key operations
    • Logs include correlation IDs
    • Traces capture end-to-end flow
If any cross-cutting concern is untested, flag as CONDITIONAL (not FAILED). Provide specific test scenarios to add.
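The authorization item in particular is cheap to automate once it’s explicit: an integration test that hits the endpoint with a real (non-mocked) auth context and the wrong scope, and asserts rejection. A hedged sketch using Vitest for illustration - `requestAs` is a hypothetical test helper that issues scoped tokens per tenant, and the endpoint path and scope names are made up:

```typescript
// Sketch of the missing authorization coverage, using Vitest for illustration.
// requestAs is a hypothetical helper that issues real scoped tokens per tenant;
// the endpoint path and scope names are invented for the example.
import { describe, it, expect } from "vitest";
import { requestAs } from "./test-helpers";

describe("partner cost matrix authorization", () => {
  it("rejects callers missing the required scope", async () => {
    const res = await requestAs({ tenant: "tenant-a", scopes: ["catalog:read"] })
      .get("/api/partner-cost-matrix");
    expect(res.status).toBe(403);
  });

  it("rejects cross-tenant access even with the right scope", async () => {
    const res = await requestAs({ tenant: "tenant-b", scopes: ["partner:costs"] })
      .get("/api/partner-cost-matrix?tenantId=tenant-a");
    expect(res.status).toBe(403);
  });

  it("allows callers with the correct scope and tenant", async () => {
    const res = await requestAs({ tenant: "tenant-a", scopes: ["partner:costs"] })
      .get("/api/partner-cost-matrix");
    expect(res.status).toBe(200);
  });
});
```

With tests like these in the suite, the “Access denied” surprise would have surfaced before merge, not during integration testing.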
Lesson learned: AI excels at explicit requirements but struggles with implicit “you should know” requirements. The solution isn’t better prompts - it’s better checklists.

Metrics

Work completed:
  • Organization model: 35 pages
  • Documentation templates: 7 types
  • API visibility: 127 routes tagged
  • State machine: 7 states, 12 transitions
  • Usage metering: complete pipeline
  • Test scenarios: 21 E2E tests
Speedup: ~8-10x faster than manual
Key insight: Speedup increased from previous weeks (4-6x) because the work was more systematic.
Value insight: Most of this work wouldn’t have been done manually due to the poor effort-to-value ratio. AI changed the economics.