The Surprising Realization
After weeks of building with AI, I discovered something unexpected: AI doesn’t just accelerate existing workflows - it makes previously impossible work suddenly feasible. The work I tackled:
- Complete end-to-end test coverage for event flows (21 test scenarios)
- Organization model documentation (agent responsibilities, workflows, decision frameworks)
- API visibility architecture for SDK filtering
- Bundle/unbundle workflow state machine with examples
- Usage metering infrastructure with aggregation pipelines
Organization Model Documentation (The Work Nobody Does)
The surprise of the week: The most valuable output wasn’t code - it was organizational documentation.

The context: I had built significant functionality with the multi-agent workflow. But I noticed problems:
- Evaluator sometimes made decisions outside its scope
- Builder occasionally asked questions Evaluator should answer
- Verification reports varied in quality
- No clear escalation path for conflicts
The Planning Prompt
Agent Personas
Evaluator (Principal Architect)
- Owns: Architecture decisions, technical strategy
- Decides: Technology choices, design patterns, trade-offs
- Cannot: Implement code, override verification failures
- Escalates to: Human (for business decisions)
Builder
- Owns: Implementation quality, test coverage
- Decides: Code structure, algorithm choice (within plan)
- Cannot: Change architecture, skip verification
- Escalates to: Evaluator (for plan changes)
Verifier
- Owns: Quality standards, requirement coverage
- Decides: Pass/conditional/fail, required fixes
- Cannot: Implement fixes, change requirements
- Escalates to: Human (for requirement ambiguity)
Decision Framework
Type 1 Decisions (Reversible):
- Variable naming, code formatting
- Test data values
- Comment wording
- Owner: Builder makes decision, no approval needed
Type 2 Decisions:
- Algorithm choice (within performance constraints)
- Error message wording
- Test organization structure
- Owner: Builder proposes, Verifier validates
Type 3 Decisions:
- Database schema changes
- API contract changes
- Event structure changes
- Owner: Evaluator decides, human approves
Type 4 Decisions:
- Multi-tenant isolation strategy
- Compliance approach
- Core technology choices
- Owner: Human decides, Evaluator advises
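To make these decision rights easy for the agents to consult, the framework above can be encoded as data rather than prose. Below is a minimal sketch in TypeScript; the type names, agent labels, and `whoDecides` helper are illustrative assumptions, not the project’s actual code.

```typescript
// Hypothetical sketch: decision rights from the framework above encoded as data,
// so an agent can look up who owns a decision before acting.
type Agent = "Builder" | "Verifier" | "Evaluator" | "Human";

interface DecisionRule {
  examples: string[]; // kinds of decisions covered by this type
  decider: Agent;     // who makes the call
  approver?: Agent;   // who must sign off, if anyone
}

const decisionFramework: Record<"type1" | "type2" | "type3" | "type4", DecisionRule> = {
  type1: {
    examples: ["variable naming", "code formatting", "test data values", "comment wording"],
    decider: "Builder", // no approval needed
  },
  type2: {
    examples: ["algorithm choice", "error message wording", "test organization"],
    decider: "Builder",
    approver: "Verifier", // Builder proposes, Verifier validates
  },
  type3: {
    examples: ["database schema changes", "API contract changes", "event structure changes"],
    decider: "Evaluator",
    approver: "Human",
  },
  type4: {
    examples: ["multi-tenant isolation strategy", "compliance approach", "core technology choices"],
    decider: "Human", // Evaluator advises
  },
};

// An agent prompt (or a pre-merge script) can resolve ownership with a simple lookup:
function whoDecides(type: keyof typeof decisionFramework): DecisionRule {
  return decisionFramework[type];
}
```

Kept as data, the same table can be pasted into each agent’s system prompt, so all three agents read one source of truth for who decides what.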
Quality Gates
Gate 1: Planning Complete
- ✅ Requirements understood
- ✅ Design options evaluated
- ✅ Recommended approach justified
- ✅ Human approval received
- Then: Builder can start implementation
Gate 2: Implementation Complete
- ✅ All plan requirements implemented
- ✅ Four-level tests passing
- ✅ No compiler warnings
- ✅ Builder self-review done
- Then: Verifier can review
Gate 3: Verification Passed
- ✅ Requirements coverage verified
- ✅ Test adequacy confirmed
- ✅ Edge cases identified
- ✅ No blocking issues
- Then: Human final review
Gate 4: Human Review
- ✅ Spot-check implementation
- ✅ Verify AI didn’t hallucinate features
- ✅ Confirm alignment with business goals
- Then: Merge to main
Conflict Resolution
Scenario 1: Verifier rejects implementation
- Builder must fix issues (no debate)
- If Builder believes rejection is wrong:
- Builder escalates to Evaluator
- Evaluator reviews both perspectives
- Evaluator makes final call
- Human can override if needed
Scenario 2: Builder disagrees with the plan
- Builder documents concern in PR comment
- Builder proposes alternative approach
- Evaluator reviews:
- If minor: Evaluator approves change
- If major: Human decides
Scenario 3: Requirements are ambiguous
- Verifier flags ambiguity (doesn’t fail review)
- Verifier suggests clarifying questions
- Human clarifies requirement
- Builder updates implementation
- Re-verification required
Before the organization model:
- Verifier would sometimes say “looks good” without checking edge cases
- Builder occasionally added features not in the plan (“seemed useful”)
- Unclear when to involve human vs. let agents decide
After the organization model:
- Verifier follows checklist every time
- Builder stays strictly within plan boundaries
- Clear decision rights eliminate back-and-forth
Documentation Strategy (The Work That Never Gets Done)
With the organization model defined, AI suggested something I didn’t ask for:

“The organization model references several documents that don’t exist yet (PRD templates, ADR process, verification report format). Should we create a documentation strategy to ensure consistency?”

My first reaction: “That’s overkill for a solo project.” My second thought: “But what if it’s not? Let AI try.”
- The Prompt
- AI's Strategy
- Templates Created
- The Surprise
API Visibility Architecture (Creative + Systematic)
The problem: Need to generate different SDK versions:
- Customer SDK (only public-facing routes)
- Platform SDK (all routes including internal)
- Partner SDK (partner portal routes only)
The work:
- Design visibility tagging system
- Choose implementation approach (compile-time vs runtime)
- Define SDK filtering rules
- Tag 127 existing API routes with visibility
- Update OpenAPI generation to filter by visibility
- Create SDK generation scripts for each audience
- Write migration guide for future routes
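To make the tagging concrete, here is a minimal sketch of one way the visibility filter could work, written in TypeScript. The `x-visibility` extension field, the audience names, and the `filterSpec` function are assumptions for illustration, not the project’s actual implementation.

```typescript
// Hypothetical sketch: route-level visibility tags plus an OpenAPI filter per SDK audience.
type Visibility = "public" | "internal" | "partner";

interface OpenApiOperation {
  operationId: string;
  "x-visibility"?: Visibility; // vendor extension carrying the route's visibility tag
  [key: string]: unknown;
}

interface OpenApiDoc {
  paths: Record<string, Record<string, OpenApiOperation>>;
  [key: string]: unknown;
}

// Which visibility levels each SDK audience is allowed to see.
const audienceFilter: Record<"customer" | "platform" | "partner", Visibility[]> = {
  customer: ["public"],
  platform: ["public", "internal", "partner"],
  partner: ["partner"],
};

// Produce a filtered spec for one audience; untagged routes default to "internal"
// so a forgotten tag can never leak into the customer SDK.
function filterSpec(doc: OpenApiDoc, audience: keyof typeof audienceFilter): OpenApiDoc {
  const allowed = new Set(audienceFilter[audience]);
  const paths: OpenApiDoc["paths"] = {};
  for (const [path, ops] of Object.entries(doc.paths)) {
    const kept = Object.fromEntries(
      Object.entries(ops).filter(([, op]) => allowed.has(op["x-visibility"] ?? "internal"))
    );
    if (Object.keys(kept).length > 0) paths[path] = kept;
  }
  return { ...doc, paths };
}
```

Each filtered spec can then be fed to the same SDK generator, producing the customer, platform, and partner SDKs from a single source of truth.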
What We Learned: The Taxonomy of AI-Suitable Work
After extensive experience, a pattern emerged: Not all work benefits equally from AI.
- Where AI Excels
- Where AI Struggles
- The Decision Tree
1. Systematic Implementation
- Applying patterns repeatedly (tag 127 API routes)
- Comprehensive test coverage (21 E2E scenarios)
- Documentation from templates (35-page org model)
- State machine implementation from spec
2. Systematic Verification
- Cross-reference requirements across documents
- Identify edge cases systematically
- Verify consistency across related components
- Generate examples covering all scenarios
3. Structured Documentation
- ADRs, verification reports, planning docs
- Following templates consistently
- Cross-linking related documents
- Generating examples from specs
Principles Established
Principle 1: AI Excels at Systematic Thoroughness
What we learned: Work that requires consistent application of rules across many cases is perfectly suited for AI.

Examples:
- Tag 127 API routes with visibility levels
- Write 21 E2E test scenarios following the same pattern
- Create documentation from templates for 7 different types
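In practice, “21 E2E test scenarios following the same pattern” usually collapses into a small table-driven test harness. A minimal sketch is below, assuming a Vitest-style runner and hypothetical `postEvent`/`getProjection` helpers; the event names and scenario shapes are placeholders, not the project’s real ones.

```typescript
// Hypothetical sketch: a data-driven harness where each row is one E2E scenario.
import { describe, it, expect } from "vitest";
import { postEvent, getProjection } from "./testHarness"; // hypothetical helpers

interface Scenario {
  name: string;
  events: Array<{ type: string; payload: { orderId: string; [key: string]: unknown } }>;
  expectedState: Record<string, unknown>;
}

// A table like this could hold all 21 scenarios; two placeholders are shown.
const scenarios: Scenario[] = [
  {
    name: "single event produces initial projection",
    events: [{ type: "order.created", payload: { orderId: "o-1" } }],
    expectedState: { status: "created" },
  },
  {
    name: "follow-up event advances the projection",
    events: [
      { type: "order.created", payload: { orderId: "o-2" } },
      { type: "order.confirmed", payload: { orderId: "o-2" } },
    ],
    expectedState: { status: "confirmed" },
  },
];

describe("event flow E2E scenarios", () => {
  for (const s of scenarios) {
    it(s.name, async () => {
      for (const event of s.events) {
        await postEvent(event); // publish through the real API surface, not a mock
      }
      const state = await getProjection(s.events[0].payload.orderId);
      expect(state).toMatchObject(s.expectedState);
    });
  }
});
```

Adding scenario 22 is one more table entry, which is exactly the kind of repetitive extension AI handles reliably.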
Principle 2: Documentation Quality Pays Compound Interest
What we learned: Good documentation makes future AI work more effective.

The cycle:
- AI creates documentation following templates
- Documentation captures decision context
- Future AI agents read documentation to understand requirements
- Better requirements → better implementation → better verification
Principle 3: Organization Models Scale AI Workflows
What we learned: Defining agent roles and responsibilities improves output quality more than prompt engineering.

Before the organization model:
- Verifier inconsistently applied quality checks
- Builder occasionally hallucinated features
- Unclear escalation for conflicts
After the organization model:
- Verifier follows checklist every time
- Builder stays within plan boundaries
- Decision rights clearly defined
Principle 4: Examples are Implementation Artifacts
What we learned: AI can create usage examples as part of implementation, not as afterthoughts.

Traditional approach:
- Implement feature (2 days)
- “I’ll add examples later” (never happens)
With the AI workflow:
- Builder implements feature (4 hours)
- Builder immediately creates example while context is fresh (1 hour)
- Example becomes part of verification (Verifier checks example works)
The Mistake I Made (And What It Taught Me)
After implementation: Builder finished implementing the partner cost matrix. Tests passed. Builder requested verification. Verifier reviewed and said: PASSED ✅. I merged to main.

During integration testing: I tried to use the partner cost matrix in integration tests. It failed with a cryptic error.

Updated Verification Checklist: Integration Requirements
Cross-Cutting Concerns Checklist
For every feature, verify tests cover:

- Authorization
  - Feature-level permission checks tested
  - Tenant isolation verified (can’t access other tenant data)
  - OAuth scope requirements documented
- Multi-tenancy
  - Tenant context properly scoped
  - Queries include tenant filter
  - Cross-tenant negative tests exist
- Event Sourcing
  - Events emitted for state changes
  - Event payload includes required fields
  - Event ordering tested
- Error Handling
  - Expected errors return proper status codes
  - Unexpected errors logged with context
  - Partial failure scenarios tested
- Observability
  - Metrics emitted for key operations
  - Logs include correlation IDs
  - Traces capture end-to-end flow
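As one illustration of how these checklist items turn into actual tests, here is a minimal sketch in TypeScript. The route paths, scopes, event name, and `apiClientFor`/`readEmittedEvents` helpers are hypothetical, chosen only to show the shape of a cross-tenant negative test and an event-emission check.

```typescript
// Hypothetical sketch: cross-cutting concern tests for the partner cost matrix.
import { describe, it, expect } from "vitest";
import { apiClientFor, readEmittedEvents } from "./testHarness"; // hypothetical helpers

describe("partner cost matrix - cross-cutting concerns", () => {
  it("rejects access to another tenant's cost matrix", async () => {
    const tenantA = await apiClientFor({ tenant: "tenant-a", scopes: ["cost-matrix:read"] });
    // Authenticated as tenant-a, try to read a resource owned by tenant-b.
    const res = await tenantA.get("/partners/cost-matrix/tenant-b-matrix-id");
    expect([403, 404]).toContain(res.status); // must not leak cross-tenant data
  });

  it("emits a cost-matrix.updated event with required fields", async () => {
    const tenantA = await apiClientFor({ tenant: "tenant-a", scopes: ["cost-matrix:write"] });
    await tenantA.put("/partners/cost-matrix/rates", { currency: "USD", rate: 0.12 });

    const events = await readEmittedEvents("cost-matrix.updated");
    expect(events.length).toBeGreaterThan(0);
    expect(events[0].payload).toMatchObject({ tenantId: "tenant-a", currency: "USD" });
  });
});
```

Tests along these lines are what the updated checklist is meant to force, so gaps surface during verification rather than in integration testing.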
Metrics
Work completed:
- Organization model: 35 pages
- Documentation templates: 7 types
- API visibility: 127 routes tagged
- State machine: 7 states, 12 transitions
- Usage metering: complete pipeline
- Test scenarios: 21 E2E tests