
The Surprising Realization

After weeks of building with AI, I discovered something unexpected: AI doesn’t just accelerate existing workflows - it makes previously impractical work suddenly feasible. The work I tackled:
  • Complete end-to-end test coverage for event flows (21 test scenarios)
  • Organization model documentation (agent responsibilities, workflows, decision frameworks)
  • API visibility architecture for SDK filtering
  • Bundle/unbundle workflow state machine with examples
  • Usage metering infrastructure with aggregation pipelines
But here’s what surprised me: I wouldn’t have attempted most of this work without AI. Not because it’s technically hard, but because the effort-to-value ratio seemed wrong. Let me show you what changed.

Organization Model Documentation (The Work Nobody Does)

The surprise of the week: The most valuable output wasn’t code - it was organizational documentation.
The context: I had built significant functionality with the multi-agent workflow, but I noticed problems:
  • Evaluator sometimes made decisions outside its scope
  • Builder occasionally asked questions Evaluator should answer
  • Verification reports varied in quality
  • No clear escalation path for conflicts
The root cause: I never formally defined what each agent was responsible for.
The traditional approach: Write a README explaining the workflow (maybe).
What I tried instead: Create a formal organization model with AI.
Planning session for agent organization model.

Context:
- Using 3 agents: Evaluator (Opus), Builder (Sonnet), Verifier (Sonnet)
- Working well for features, but roles blur on complex issues
- Need clear boundaries, responsibilities, and escalation paths

Design an organization model with:
1. Agent roles and responsibilities (what each agent owns)
2. Decision rights (who decides what)
3. Communication protocols (how agents interact)
4. Conflict resolution (what happens when agents disagree)
5. Quality gates (when work can progress to next agent)

Model after real organizations:
- Evaluator = Principal Architect
- Builder = Senior Engineer
- Verifier = Tech Lead / Reviewer

Include failure modes and handling strategies.
AI produced something remarkable: A 35-page organizational constitution that included:

Agent Personas

Evaluator (Principal Architect)
  • Owns: Architecture decisions, technical strategy
  • Decides: Technology choices, design patterns, trade-offs
  • Cannot: Implement code, override verification failures
  • Escalates to: Human (for business decisions)
Builder (Senior Engineer)
  • Owns: Implementation quality, test coverage
  • Decides: Code structure, algorithm choice (within plan)
  • Cannot: Change architecture, skip verification
  • Escalates to: Evaluator (for plan changes)
Verifier (Tech Lead)
  • Owns: Quality standards, requirement coverage
  • Decides: Pass/conditional/fail, required fixes
  • Cannot: Implement fixes, change requirements
  • Escalates to: Human (for requirement ambiguity)
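One way to make these boundaries operational, rather than prose the agents may or may not follow, is to encode them as structured config and inject them into each agent’s system prompt. A minimal TypeScript sketch of that idea - the names (`AgentRole`, `rolePreamble`) are illustrative, not taken from the actual org model document:

```typescript
// Hypothetical encoding of the agent personas as structured config.
// Field names are illustrative, not taken from the actual org model.
interface AgentRole {
  title: string;
  owns: string[];        // areas this agent is accountable for
  decides: string[];     // decisions it may make without approval
  cannot: string[];      // explicit boundaries
  escalatesTo: "Evaluator" | "Human";
}

const AGENT_ROLES: Record<"evaluator" | "builder" | "verifier", AgentRole> = {
  evaluator: {
    title: "Principal Architect",
    owns: ["architecture decisions", "technical strategy"],
    decides: ["technology choices", "design patterns", "trade-offs"],
    cannot: ["implement code", "override verification failures"],
    escalatesTo: "Human",
  },
  builder: {
    title: "Senior Engineer",
    owns: ["implementation quality", "test coverage"],
    decides: ["code structure", "algorithm choice (within plan)"],
    cannot: ["change architecture", "skip verification"],
    escalatesTo: "Evaluator",
  },
  verifier: {
    title: "Tech Lead",
    owns: ["quality standards", "requirement coverage"],
    decides: ["pass/conditional/fail", "required fixes"],
    cannot: ["implement fixes", "change requirements"],
    escalatesTo: "Human",
  },
};

// Example: prepend the role boundaries to an agent's system prompt.
function rolePreamble(agent: keyof typeof AGENT_ROLES): string {
  const r = AGENT_ROLES[agent];
  return [
    `You are the ${r.title}.`,
    `You own: ${r.owns.join(", ")}.`,
    `You may decide: ${r.decides.join(", ")}.`,
    `You must not: ${r.cannot.join(", ")}.`,
    `Escalate anything outside these boundaries to: ${r.escalatesTo}.`,
  ].join("\n");
}
```

The specific fields don’t matter much; what matters is that “cannot” and “escalates to” become explicit strings the orchestration layer can repeat back to the agent on every run.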

Decision Framework

Type 1 Decisions (Reversible):
  • Variable naming, code formatting
  • Test data values
  • Comment wording
  • Owner: Builder makes decision, no approval needed
Type 2 Decisions (Reversible with effort):
  • Algorithm choice (within performance constraints)
  • Error message wording
  • Test organization structure
  • Owner: Builder proposes, Verifier validates
Type 3 Decisions (Hard to reverse):
  • Database schema changes
  • API contract changes
  • Event structure changes
  • Owner: Evaluator decides, human approves
Type 4 Decisions (Irreversible):
  • Multi-tenant isolation strategy
  • Compliance approach
  • Core technology choices
  • Owner: Human decides, Evaluator advises
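The same framework can be expressed as a lookup from decision type to owner and required approval, so an orchestration script can refuse to let an agent act outside its lane. A sketch under the same illustrative-config assumption as above:

```typescript
// Hypothetical mapping of the Type 1-4 framework to owners and approvers.
type DecisionType = 1 | 2 | 3 | 4;

interface DecisionPolicy {
  reversibility: string;
  owner: "Builder" | "Evaluator" | "Human";
  approval: "none" | "Verifier validates" | "Human approves" | "Evaluator advises";
}

const DECISION_POLICIES: Record<DecisionType, DecisionPolicy> = {
  1: { reversibility: "reversible", owner: "Builder", approval: "none" },
  2: { reversibility: "reversible with effort", owner: "Builder", approval: "Verifier validates" },
  3: { reversibility: "hard to reverse", owner: "Evaluator", approval: "Human approves" },
  4: { reversibility: "irreversible", owner: "Human", approval: "Evaluator advises" },
};

// Example guard an orchestration script could run before an agent acts.
function needsHumanInLoop(type: DecisionType): boolean {
  const { owner, approval } = DECISION_POLICIES[type];
  return owner === "Human" || approval === "Human approves";
}
```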

Quality Gates

Gate 1: Planning Complete
  • ✅ Requirements understood
  • ✅ Design options evaluated
  • ✅ Recommended approach justified
  • ✅ Human approval received
  • Then: Builder can start implementation
Gate 2: Implementation Complete
  • ✅ All plan requirements implemented
  • ✅ Four-level tests passing
  • ✅ No compiler warnings
  • ✅ Builder self-review done
  • Then: Verifier can review
Gate 3: Verification Passed
  • ✅ Requirements coverage verified
  • ✅ Test adequacy confirmed
  • ✅ Edge cases identified
  • ✅ No blocking issues
  • Then: Human final review
Gate 4: Human Approved
  • ✅ Spot-check implementation
  • ✅ Verify AI didn’t hallucinate features
  • ✅ Confirm alignment with business goals
  • Then: Merge to main
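To keep the gates from degrading back into “looks good” reviews, each gate can be encoded as an explicit checklist that must be fully satisfied before work moves to the next agent. A hypothetical sketch - the gate names and checks are paraphrased from the list above, and the enforcement helper is an assumption, not part of the original model:

```typescript
// Hypothetical encoding of the quality gates; a check is a named box the
// responsible agent (or human) must tick before work advances.
interface QualityGate {
  name: string;
  nextStep: string;
  checks: string[];
}

const GATES: QualityGate[] = [
  {
    name: "Planning Complete",
    nextStep: "Builder starts implementation",
    checks: [
      "Requirements understood",
      "Design options evaluated",
      "Recommended approach justified",
      "Human approval received",
    ],
  },
  {
    name: "Implementation Complete",
    nextStep: "Verifier reviews",
    checks: [
      "All plan requirements implemented",
      "Four-level tests passing",
      "No compiler warnings",
      "Builder self-review done",
    ],
  },
  // Gates 3 and 4 ("Verification Passed", "Human Approved") follow the same shape.
];

// Refuse to advance unless every check in the current gate is explicitly ticked.
function canAdvance(gate: QualityGate, ticked: Set<string>): boolean {
  return gate.checks.every((check) => ticked.has(check));
}
```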

Conflict Resolution

Scenario 1: Verifier rejects implementation
  • Builder must fix issues (no debate)
  • If Builder believes rejection is wrong:
    1. Builder escalates to Evaluator
    2. Evaluator reviews both perspectives
    3. Evaluator makes final call
    4. Human can override if needed
Scenario 2: Builder disagrees with plan
  • Builder documents concern in PR comment
  • Builder proposes alternative approach
  • Evaluator reviews:
    • If minor: Evaluator approves change
    • If major: Human decides
Scenario 3: Requirement ambiguity
  • Verifier flags ambiguity (doesn’t fail review)
  • Verifier suggests clarifying questions
  • Human clarifies requirement
  • Builder updates implementation
  • Re-verification required
Why this matters: After implementing this organization model, I saw immediate improvements.
Before:
  • Verifier would sometimes say “looks good” without checking edge cases
  • Builder occasionally added features not in the plan (“seemed useful”)
  • Unclear when to involve human vs. let agents decide
After:
  • Verifier follows checklist every time
  • Builder stays strictly within plan boundaries
  • Clear decision rights eliminate back-and-forth
The meta-insight: AI agents benefit from organizational structure just like human teams do.
Critical Learning: Without clear roles and responsibilities, AI agents optimize for “make the task work” instead of “produce the right outcome.” The organization model changed agent behavior more than prompt engineering ever could.

Documentation Strategy (The Work That Never Gets Done)

With the organization model defined, AI suggested something I didn’t ask for:
“The organization model references several documents that don’t exist yet (PRD templates, ADR process, verification report format). Should we create a documentation strategy to ensure consistency?”
My first reaction: “That’s overkill for a solo project.” My second thought: “But what if it’s not? Let AI try.”
Design a documentation strategy for the multi-agent workflow.

Context:
- Solo developer using AI agents
- Building production SaaS platform
- Need to maintain context across weeks/months
- AI agents need consistent documentation format to work effectively

Requirements:
- Documentation types (what to document)
- Documentation templates (consistent format)
- Documentation lifecycle (when to create/update)
- Documentation storage (where to keep)
- Documentation quality (how to verify)

Optimize for:
- AI agents can find and use docs effectively
- Human can understand project after 3-month break
- Minimal maintenance overhead
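The prompt optimizes for agents being able to find and use docs, which in practice means every document declares its type, owner, and lifecycle state in a consistent header. A speculative sketch of such a schema - the document types and field names here are my assumptions, not the strategy’s actual output:

```typescript
// Speculative front-matter schema so both humans and agents can locate,
// parse, and trust project documentation. Types and fields are assumptions.
type DocType =
  | "PRD"
  | "ADR"
  | "plan"
  | "verification-report"
  | "org-model"
  | "example";

interface DocFrontMatter {
  type: DocType;
  title: string;
  status: "draft" | "active" | "superseded";
  owner: "Evaluator" | "Builder" | "Verifier" | "Human";
  created: string;       // ISO date
  lastReviewed: string;  // ISO date - drives the update lifecycle
  relatedDocs: string[]; // repo-relative paths, for cross-linking
}
```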

API Visibility Architecture (Creative + Systematic)

The problem: Need to generate different SDK versions:
  • Customer SDK (only public-facing routes)
  • Platform SDK (all routes including internal)
  • Partner SDK (partner portal routes only)
The creative part (Evaluator):
  • Design visibility tagging system
  • Choose implementation approach (compile-time vs runtime)
  • Define SDK filtering rules
The systematic part (Builder):
  • Tag 127 existing API routes with visibility
  • Update OpenAPI generation to filter by visibility
  • Create SDK generation scripts for each audience
  • Write migration guide for future routes
With AI: Creative part (2 hours with Evaluator) + systematic part (3 hours with Builder) = 5 hours total.
Why AI excelled: The systematic work (tagging 127 routes) would have been mind-numbing manually. Builder never gets bored, maintains perfect consistency, and actually catches edge cases I would miss.
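Concretely, the pattern looks like this: each route carries a visibility tag, and the OpenAPI document is filtered per audience before it reaches the SDK generator. A simplified TypeScript sketch - the `x-visibility` extension name and the audience map are assumptions about how this could be wired, not the exact implementation:

```typescript
// Simplified sketch of visibility tagging and per-audience SDK filtering.
// Assumes each operation carries an "x-visibility" extension in the OpenAPI doc.
type Visibility = "public" | "internal" | "partner";

// Audience → which visibility tags its SDK may include.
const SDK_AUDIENCES: Record<string, Visibility[]> = {
  customer: ["public"],
  platform: ["public", "internal", "partner"],
  partner: ["partner"],
};

interface OpenApiOperation {
  operationId: string;
  "x-visibility"?: Visibility; // set by each route definition
}

interface OpenApiDoc {
  openapi: string;
  paths: Record<string, Record<string, OpenApiOperation>>;
}

// Produce an audience-specific spec by dropping operations the audience
// is not allowed to see; paths left empty are removed entirely.
function filterSpec(doc: OpenApiDoc, audience: keyof typeof SDK_AUDIENCES): OpenApiDoc {
  const allowed = new Set(SDK_AUDIENCES[audience]);
  const paths: OpenApiDoc["paths"] = {};
  for (const [path, operations] of Object.entries(doc.paths)) {
    const kept = Object.fromEntries(
      Object.entries(operations).filter(
        // Untagged routes default to "internal" so nothing leaks by accident.
        ([, op]) => allowed.has(op["x-visibility"] ?? "internal"),
      ),
    );
    if (Object.keys(kept).length > 0) paths[path] = kept;
  }
  return { ...doc, paths };
}
```

Defaulting untagged routes to internal is the design choice that matters: a new route appears in no customer-facing SDK until someone explicitly classifies it.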

What We Learned: The Taxonomy of AI-Suitable Work

After extensive experience, a pattern emerged: Not all work benefits equally from AI.
1. Systematic Implementation
  • Applying patterns repeatedly (tag 127 API routes)
  • Comprehensive test coverage (21 E2E scenarios)
  • Documentation from templates (35-page org model)
  • State machine implementation from spec
Why: AI never gets bored, maintains perfect consistency
ROI: 5-10x speedup
2. Thorough Analysis
  • Cross-reference requirements across documents
  • Identify edge cases systematically
  • Verify consistency across related components
  • Generate examples covering all scenarios
Why: AI doesn’t “skim” - it reads everything fully
ROI: Catches issues humans would miss
3. Structured Documentation
  • ADRs, verification reports, planning docs
  • Following templates consistently
  • Cross-linking related documents
  • Generating examples from specs
Why: AI excels at structure and completeness
ROI: Documentation actually gets written

Principles Established

What we learned: Work that requires consistent application of rules across many cases is perfectly suited for AI.
Examples:
  • Tag 127 API routes with visibility levels
  • Write 21 E2E test scenarios following same pattern
  • Create documentation from templates for 7 different types
Rule: If work requires “do the same thing many times consistently,” delegate entirely to AI.
Anti-pattern: Using AI for one-off creative tasks (AI defaults to patterns from training).
What we learned: Good documentation makes future AI work more effective.
The cycle:
  1. AI creates documentation following templates
  2. Documentation captures decision context
  3. Future AI agents read documentation to understand requirements
  4. Better requirements → better implementation → better verification
Metric: Time to regain context after a break dropped from 2-3 hours to 15 minutes.
Rule: Invest in documentation templates once, get consistent documentation forever.
What we learned: Defining agent roles and responsibilities improves output quality more than prompt engineering.
Before the organization model:
  • Verifier inconsistently applied quality checks
  • Builder occasionally hallucinated features
  • Unclear escalation for conflicts
After organization model:
  • Verifier follows checklist every time
  • Builder stays within plan boundaries
  • Decision rights clearly defined
Rule: Treat AI agents like team members - give them clear roles, responsibilities, and decision rights.
What we learned: AI can create usage examples as part of implementation, not as afterthoughts.
Traditional approach:
  • Implement feature (2 days)
  • “I’ll add examples later” (never happens)
AI approach:
  • Builder implements feature (4 hours)
  • Builder immediately creates example while context is fresh (1 hour)
  • Example becomes part of verification (Verifier checks example works)
Result: Every feature now has working examples because the marginal cost dropped to near-zero.
Rule: Make examples part of the implementation task, not a separate documentation task.

The Mistake I Made (And What It Taught Me)

After implementation: Builder finished implementing the partner cost matrix. Tests passed. I requested verification. Verifier reviewed and said: PASSED ✅. I merged to main.
During integration testing: I tried to use the partner cost matrix in integration tests. It failed with a cryptic error:
Error: PartnerCostMatrix query failed: Access denied
What happened? The partner cost matrix uses OAuth scope-based authorization. Builder implemented the feature and the tests passed - but the tests used a mock auth context that bypassed scope checks.
Why Verifier missed it: The test suite had 100% coverage of the feature logic, but 0% coverage of the authorization integration.
The root cause: I didn’t specify “test authorization” in the requirements, so Builder tested the business logic (correctly) but not the integration with the auth system.
The deeper issue: As tasks became more systematic, I stopped thinking about implicit requirements. I assumed AI would “figure out” that authorization needs testing.
The fix: I updated the organization model with a new checklist for Verifier:

Cross-Cutting Concerns Checklist

For every feature, verify tests cover:
  1. Authorization
    • Feature-level permission checks tested
    • Tenant isolation verified (can’t access other tenant data)
    • OAuth scope requirements documented
  2. Multi-tenancy
    • Tenant context properly scoped
    • Queries include tenant filter
    • Cross-tenant negative tests exist
  3. Event Sourcing
    • Events emitted for state changes
    • Event payload includes required fields
    • Event ordering tested
  4. Error Handling
    • Expected errors return proper status codes
    • Unexpected errors logged with context
    • Partial failure scenarios tested
  5. Observability
    • Metrics emitted for key operations
    • Logs include correlation IDs
    • Traces capture end-to-end flow
If any cross-cutting concern is untested, flag as CONDITIONAL (not FAILED). Provide specific test scenarios to add.
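The authorization item in particular is cheap to automate once it’s explicit: an integration test that hits the endpoint with a real (non-mocked) auth context and the wrong scope, and asserts rejection. A hedged sketch using Vitest for illustration - `requestAs` is a hypothetical test helper that issues scoped tokens per tenant, and the endpoint path and scope names are made up:

```typescript
// Sketch of the missing authorization coverage, using Vitest for illustration.
// requestAs is a hypothetical helper that issues real scoped tokens per tenant;
// the endpoint path and scope names are invented for the example.
import { describe, it, expect } from "vitest";
import { requestAs } from "./test-helpers";

describe("partner cost matrix authorization", () => {
  it("rejects callers missing the required scope", async () => {
    const res = await requestAs({ tenant: "tenant-a", scopes: ["catalog:read"] })
      .get("/api/partner-cost-matrix");
    expect(res.status).toBe(403);
  });

  it("rejects cross-tenant access even with the right scope", async () => {
    const res = await requestAs({ tenant: "tenant-b", scopes: ["partner:costs"] })
      .get("/api/partner-cost-matrix?tenantId=tenant-a");
    expect(res.status).toBe(403);
  });

  it("allows callers with the correct scope and tenant", async () => {
    const res = await requestAs({ tenant: "tenant-a", scopes: ["partner:costs"] })
      .get("/api/partner-cost-matrix");
    expect(res.status).toBe(200);
  });
});
```

With tests like these in the suite, the “Access denied” surprise would have surfaced before merge, not during integration testing.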
Lesson learned: AI excels at explicit requirements but struggles with implicit “you should know” requirements. The solution isn’t better prompts - it’s better checklists.

Metrics

Work completed:
  • Organization model: 35 pages
  • Documentation templates: 7 types
  • API visibility: 127 routes tagged
  • State machine: 7 states, 12 transitions
  • Usage metering: complete pipeline
  • Test scenarios: 21 E2E tests
Speedup: ~8-10x faster than manual
Key insight: Speedup increased from previous weeks (4-6x) because the work was more systematic.
Value insight: Most of this work wouldn’t have been done manually due to the poor effort-to-value ratio. AI changed the economics.