This is Week 5 of “Building with AI” - a 10-week journey documenting how I use multi-agent AI workflows to build a production-grade SaaS platform.

This week: The flip side of Week 4’s success story. When AI got stuck in cascading errors, I watched error counts increase while the fix-in-session approach spiraled out of control. 31 commits, 24 hours, and the hard lesson that AI excels at preventing problems but struggles to fix complex cascading failures.

Previous: Week 4: When AI Excels
Watch the 60-Second Summary
Week 5: Architecture mistakes and cascading errors
The Ironic Timing
Coming off Week 4’s spectacular success (107 commits, everything working beautifully), I was confident. AI had proven itself with:

- End-to-end test coverage (21 scenarios)
- Organization modeling
- API architecture
- Usage metering pipelines
The Setup: A “Simple” Macro Enhancement
After successfully building 5 derive macros that eliminated 4,700 lines of boilerplate, I wanted to add one more feature: auto-generated CRUD methods. The change seemed straightforward.

What We Built (or Tried To)
The CRUD Macro Enhancement
Goal: Extend the DynamoDbEntity macro to generate repository methods automatically
Breaking changes introduced:
- Method renames: save() → db_save(), get() → db_get()
- Factory pattern change: self.client field → self.client() method
- Error type change: RepositoryError → EventStoreError
- New dependencies: aws-runtime, serde_dynamo
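The shape of those breaking changes can be sketched in a few lines of Rust. Everything below is an illustrative stand-in, not the real macro output or repository types:

```rust
// Illustrative stand-ins only; not the real macro-generated code.

#[allow(dead_code)]
struct Client;

// Before: call sites used the public `client` field and `save()`.
#[allow(dead_code)]
struct OldRepo {
    client: Client,
}

#[allow(dead_code)]
impl OldRepo {
    fn save(&self) -> Result<(), String> {
        Ok(())
    }
}

// After: the macro generates a `client()` factory method and `db_save()`,
// so every `repo.client` and `repo.save()` call site breaks at once.
struct NewRepo;

impl NewRepo {
    fn client(&self) -> Client {
        Client
    }

    fn db_save(&self) -> Result<(), String> {
        let _client = self.client(); // factory call replaces field access
        Ok(())
    }
}

fn main() {
    let repo = NewRepo;
    assert!(repo.db_save().is_ok());
    println!("db_save compiled against the new surface");
}
```

Each of these four changes invalidates call sites in every downstream crate simultaneously, which is what set up the cascade.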
The Cascade: Hour by Hour
Hour 1: “This is fine”
Commit 508f43e landed the macro change. The fixes started immediately:

- cd7c3e1: fix(crm): remove duplicate DynamoDbLegalEntityRepository export
- 1ef9150: fix(crm): fix syntax errors and add missing imports
Hour 2-3: The Pattern Emerges
Starting from commit 23d0baf:

- 9e0815f: replace self.client with client
- 11fe195: add client creation to save_calendar method
- b1a1ff0: batch fix organization_dynamodb.rs methods
- eb5e215: fix pipeline_dynamodb client calls
- a7fbb9b: fix remaining client() method calls
Hour 4-6: Type System Whack-a-Mole
Each fix revealed new errors. Commit 8d6c763 changed the pk_for_id signature:

- Old: pk_for_id(tenant_id, capsule_id, id)
- New: pk_for_id(id) (macro infers tenant/capsule from entity)

The fixes continued:

- 9249626: fix pk_for_id and gsi method signatures
- 7503f9f: add GSI methods and fix syntax errors
- 47fd4ae: fix CrmError variants, field access errors
- 3afc5a0: move contact methods to ContactRepository impl block
- 3862c3e: fix E0425 errors (missing values) and self.client() calls
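The pk_for_id change alone guarantees arity errors at every call site. A stand-in sketch (the key format and field names are guesses, not the real implementation):

```rust
// Stand-in entity; the real types and key format are not shown in the post.

struct Entity {
    tenant_id: String,
    capsule_id: String,
}

// Old signature: the caller supplies tenant and capsule explicitly.
fn pk_for_id_old(tenant_id: &str, capsule_id: &str, id: &str) -> String {
    format!("{tenant_id}#{capsule_id}#{id}")
}

impl Entity {
    // New signature: the macro-generated method infers tenant/capsule
    // from the entity, so all three-argument call sites stop compiling.
    fn pk_for_id(&self, id: &str) -> String {
        format!("{}#{}#{}", self.tenant_id, self.capsule_id, id)
    }
}

fn main() {
    let entity = Entity {
        tenant_id: "t1".into(),
        capsule_id: "c1".into(),
    };
    // Both produce the same key; only the call shape changed.
    assert_eq!(entity.pk_for_id("42"), pk_for_id_old("t1", "c1", "42"));
    println!("old and new signatures produce the same key");
}
```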
The Breaking Point: When I Stopped Trusting AI
Hour 7 (next morning): I found the AI session had made 6 more commits overnight (I left it running). The latest commit was f1ed1e7.

The Fix: Human Intervention
What I did:

1. Stopped the AI session
2. Read the original macro change (commit 508f43e)
3. Made a list of ALL changes the macro introduced:
   - New method names (save → db_save)
   - New factory pattern (client field → client() method)
   - New error types (EventStoreError instead of RepositoryError)
   - New dependency requirements (serde_dynamo)
4. Fixed them systematically in 3 commits instead of 30

The result:

- AI: 30 commits, 24 hours, 63 errors remaining
- Human: 3 commits, 90 minutes, 0 errors
What Went Wrong: AI’s Fix-in-Session Anti-Pattern
The Problem
When AI encounters a compilation error, it follows this pattern:

1. Read the error message
2. Understand the immediate cause
3. Fix that specific error
4. Compile
5. If more errors, repeat from step 1

Why this fails:

- Each fix addresses symptoms, not root cause
- No holistic understanding of “what changed upstream”
- No batching of related fixes
- Partial fixes create more errors
The Example
The macro change set off an error cascade. The systematic way out is to fix by pattern, not by error:

1. Recognize pattern: “All save() calls need to be db_save()”
2. Fix ALL save() calls in one commit
3. Recognize pattern: “All error types changed”
4. Add unified error conversion
5. Compile once → done
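The “unified error conversion” step is a single From impl in Rust: every call site using `?` keeps compiling even though the return type changed. The error types below are simplified stand-ins for the real ones:

```rust
// Stand-in error types; the real RepositoryError/EventStoreError are richer.

#[derive(Debug)]
struct RepositoryError(String);

#[derive(Debug)]
struct EventStoreError(String);

// One conversion covers every call site instead of one fix per error.
impl From<RepositoryError> for EventStoreError {
    fn from(e: RepositoryError) -> Self {
        EventStoreError(e.0)
    }
}

fn legacy_lookup() -> Result<u32, RepositoryError> {
    Ok(42)
}

// The `?` operator applies the From impl automatically, so functions that
// switched to the new error type need no per-line changes.
fn db_get() -> Result<u32, EventStoreError> {
    let value = legacy_lookup()?;
    Ok(value)
}

fn main() {
    assert_eq!(db_get().unwrap(), 42);
    println!("unified error conversion compiled");
}
```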
What I Learned: When to Stop Using AI
Red Flags That AI Is Stuck
Red Flag #1: Diminishing Returns

Track errors fixed per commit:

- Commits 1-5: Average 12 errors fixed each
- Commits 6-15: Average 4 errors fixed each
- Commits 16-30: Average 1 error fixed each (sometimes net increase!)
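Tracking this signal is trivial to automate. A toy sketch with made-up counts (not the actual cascade data) that computes errors fixed per commit and flags the stall:

```rust
// Given the compiler error count after each commit, compute how many
// errors each commit actually fixed; negative means a net increase.
fn errors_fixed_per_commit(error_counts: &[i32]) -> Vec<i32> {
    error_counts.windows(2).map(|w| w[0] - w[1]).collect()
}

// Flag a stalled session when the latest commit fixed fewer errors
// than the threshold.
fn is_stuck(fixed: &[i32], threshold: i32) -> bool {
    fixed.last().map_or(false, |&f| f < threshold)
}

fn main() {
    // Illustrative counts: errors after the macro change, then after
    // each fix commit.
    let counts = [60, 48, 37, 33, 31, 30, 31];
    let fixed = errors_fixed_per_commit(&counts);
    assert_eq!(fixed, vec![12, 11, 4, 2, 1, -1]); // last commit: net increase
    assert!(is_stuck(&fixed, 2));
    println!("stuck: fix rate fell below threshold");
}
```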
Red Flag #2: Repeated Fix Patterns

If you see the same type of fix across multiple commits:
- “fix client() calls” (7 commits)
- “fix pk_for_id signature” (5 commits)
- “fix error variants” (6 commits)
Red Flag #3: Partial Fixes in Commit Messages

Commit messages with “partial fix” or “fix remaining” indicate AI doesn’t have a complete solution. Examples from the cascade:
- “partial fix for organization_dynamodb.rs factory pattern”
- “fix remaining client() method calls”
- “batch fix organization_dynamodb.rs methods”
Principles Established This Week
Principle 1: Breaking Changes Need Migration Plans
What we learned: Macro changes are breaking changes that affect multiple crates simultaneously.

New rule: Before making breaking macro changes:
- List all affected crates
- Identify all breaking changes (method renames, type changes, etc.)
- Create migration checklist
- Fix all affected crates in a single atomic commit
- Never commit broken intermediate states
Principle 2: Monitor AI's Error-Fixing Progress
What we learned: If the error count isn’t decreasing steadily, AI is stuck.

Tracking metric: Errors fixed per commit
- Healthy: 5-10 errors fixed per commit
- Warning: 2-4 errors fixed per commit
- Critical: Less than 2 errors per commit
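Those bands are easy to encode as a check a wrapper script could run after each commit. A minimal sketch (the bands come from the thresholds above; treating anything at or above 5 as healthy is my reading of them):

```rust
// Classify a commit's error-fixing progress using the post's thresholds:
// 5-10 healthy, 2-4 warning, below 2 critical. Values above 10 are also
// treated as healthy here.
fn progress_status(errors_fixed: i32) -> &'static str {
    match errors_fixed {
        n if n >= 5 => "healthy",
        2..=4 => "warning",
        _ => "critical",
    }
}

fn main() {
    assert_eq!(progress_status(12), "healthy");
    assert_eq!(progress_status(3), "warning");
    assert_eq!(progress_status(1), "critical");
    assert_eq!(progress_status(-2), "critical"); // net increase in errors
    println!("thresholds classified");
}
```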
Principle 3: Batch Fixes Beat Incremental Fixes
What we learned: 30 small commits are worse than 1 comprehensive commit for systematic changes.

When to batch:

- Method renames across many files
- Type signature changes
- Dependency updates
- Error handling pattern changes

AI limitation: AI doesn’t use batch tools effectively; it prefers file-by-file fixes.
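As a sketch, here is what “batch” means in code: apply every known rename in one sweep over the source instead of one fix per compiler error. Naive string replacement stands in for what rg/sd or an AST-aware tool would do on a real workspace:

```rust
// Apply every rename in the table in a single pass over the source text.
// Naive string replacement: fine for a sketch, too blunt for real refactors
// (it ignores scoping, comments, and partial identifier matches).
fn batch_rename(source: &str, renames: &[(&str, &str)]) -> String {
    renames
        .iter()
        .fold(source.to_string(), |src, &(old, new)| src.replace(old, new))
}

fn main() {
    let source = "repo.save()?; let item = repo.get(id)?;";
    // Rename table mirroring the macro's method renames.
    let renames = [(".save(", ".db_save("), (".get(", ".db_get(")];
    let fixed = batch_rename(source, &renames);
    assert_eq!(fixed, "repo.db_save()?; let item = repo.db_get(id)?;");
    println!("one sweep, every rename applied");
}
```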
Principle 4: AI Needs Constraints for Error Fixing
What we learned: Unconstrained “fix all errors” leads to hallucination and inefficiency.

Better prompt: constrain the session to first categorize every error by root cause, then fix one category at a time across the whole workspace. This forces AI to think strategically about batching fixes instead of fixing one error at a time.
The Surprising Discovery: AI Can Prevent This
After recovering from the cascade, I asked a fresh Evaluator session how it would have approached the macro change. Its plan:

1. Impact Analysis:
   - Run cargo tree to find all crates using the macro
   - Grep for all usage sites across the workspace
   - Estimate affected lines of code
2. Breaking Change Detection:
   - Compare old vs new generated code (cargo expand)
   - List method signature changes
   - List type changes
   - List new dependencies
3. Migration Plan:
   - Create a checklist of required updates per crate
   - Batch fixes by category (method renames, type updates, etc.)
   - Prepare a workspace-wide test command
4. Atomic Commit Strategy:
   - Update the macro
   - Update ALL downstream crates
   - Verify workspace compilation
   - Commit atomically
Week 5 By The Numbers
The Failure:

- Commits: 31 (1 feature + 30 fixes)
- Time invested: 24 hours
- Final state: 63 errors remaining
- Token usage: 850k tokens ($13)
- Errors fixed per commit (average): 5.1
- Lowest errors fixed in single commit: -2 (net increase!)
- Developer frustration: Maximum

The Recovery:

- Commits: 3
- Time invested: 90 minutes
- Final state: 0 errors

The Lesson: AI excels at preventing problems but struggles to fix complex cascading failures.
The Contrast: Week 4 vs Week 5
Week 4 (AI Excels):

- AI built features from scratch
- Clear requirements, no existing code to conflict with
- 107 commits, all green
- Productivity: 6-8 weeks of work in 5.5 days

Week 5 (AI Struggles):

- AI fixed breaking changes it created
- Complex interdependencies, cascading effects
- 31 commits, increasing errors
- Productivity: 2 hours of work took 24 hours
Takeaways from Week 5
For breaking changes:

- Always create a migration plan first (Week 2 principle applies)
- Fix all affected sites atomically (never commit broken states)
- Use batch tools (rg, sd) for systematic renames
- Verify with cargo check --workspace before committing

For AI error-fixing sessions:

- Monitor error-fixing progress (errors-per-commit metric)
- Intervene when progress stalls (< 3 errors per commit)
- Use AI for planning/prevention, human for reactive fixes
- Use fresh AI sessions for post-mortem analysis (this worked brilliantly)
What’s Next
Week 6 Preview: Taking the lessons from Weeks 4 and 5, we’ll establish a hybrid workflow:

- AI for feature planning and implementation
- Human for migration plans and systematic refactoring
- Clear decision trees for when to use which approach
Discuss This Week
Week 5 was humbling. Share your own “AI got stuck” stories or ask questions about the error-fixing
decision tree we established.
Disclaimer: All examples are from personal projects. No proprietary code or employer-specific
patterns included.