The Honest Assessment
After building a production SaaS platform with AI agents for several months, I've shipped over 2,500 lines of infrastructure code weekly, eliminated 4,700 lines of boilerplate through macros, and maintained a 99.2% test pass rate. AI absolutely excels at certain types of work. But there are clear, consistent boundaries where AI fails. Not "struggles" or "needs help" - it actively fails and makes things worse. Understanding these boundaries isn't AI criticism; it's boundary exploration that makes you better at using AI effectively. The insight: AI's failures cluster into distinct categories. Learn to recognize them, and you'll know when to step in before AI burns 24 hours on a problem you could solve in 90 minutes.

The Seven Failure Categories
1. Novel Problems (No Pattern to Match)
What fails: Problems AI has never seen in training data.

Real example: Designing a scope-based AWS client factory that enforces multi-tenant isolation boundaries through the type system.

The challenge (Week 6): Build an AWS client factory that prevents cross-tenant data access at compile time. Requirements:
- 4 operational scopes (platform, tenant, capsule, operator)
- Automatic table name prefixing per scope
- Type-safe enforcement of isolation
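The article doesn't include the factory's code, but the type-level idea can be sketched with phantom types. Everything below (the `Scope` trait, the prefixes, the `ScopedClient` type) is illustrative, not the actual implementation:

```rust
use std::marker::PhantomData;

// Marker types for the four operational scopes (names are illustrative).
pub struct Platform;
pub struct Tenant;
pub struct Capsule;
pub struct Operator;

// Each scope contributes its own table-name prefix.
pub trait Scope {
    const PREFIX: &'static str;
}
impl Scope for Platform { const PREFIX: &'static str = "platform"; }
impl Scope for Tenant { const PREFIX: &'static str = "tenant"; }
impl Scope for Capsule { const PREFIX: &'static str = "capsule"; }
impl Scope for Operator { const PREFIX: &'static str = "operator"; }

// A client is parameterized by its scope. A `ScopedClient<Tenant>` has no
// way to emit a platform-prefixed table name, so crossing an isolation
// boundary becomes a type error, caught at compile time.
pub struct ScopedClient<S: Scope> {
    scope_id: String,
    _scope: PhantomData<S>,
}

impl<S: Scope> ScopedClient<S> {
    pub fn new(scope_id: &str) -> Self {
        Self { scope_id: scope_id.to_string(), _scope: PhantomData }
    }

    // Automatic table-name prefixing per scope.
    pub fn table_name(&self, base: &str) -> String {
        format!("{}-{}-{}", S::PREFIX, self.scope_id, base)
    }
}
```

Handing a `ScopedClient<Tenant>` to code that expects a `ScopedClient<Platform>` simply doesn't compile, which is the enforcement property the requirements ask for.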
2. Framework Quirks (Middleware Ordering, Lifecycle)
What fails: Framework-specific execution order, lifecycle hooks, initialization sequences.

Real example: Week 7's middleware ordering bug.

Why AI fails: Training data shows middleware registration, not execution order semantics. Framework docs don't always make this explicit.

When to intervene: Any framework feature with hidden ordering dependencies (middleware, hooks, lifecycle).

3. Performance Optimization (Needs Profiling Data)
What fails: Choosing what to optimize without actual performance measurements.

Real example: Week 6's credential caching. AI built an AWS client factory with perfect architecture:
- Type-safe scope enforcement
- Automatic table prefixing
- Clean API design

The performance issue hid behind that clean design: nothing in the requirements said the credential path was hot. The caching layer was a human addition, made after watching the system run.
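The caching fix itself isn't shown in the article; as a hedged sketch, a TTL cache in front of an expensive credential fetch might look like this (the closure stands in for the real STS assume-role call, and all names are illustrative):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Cached credentials with a time-to-live. Real STS credentials carry
// their own expiration; here we just model it as a fixed TTL.
pub struct CredentialCache {
    ttl: Duration,
    entries: HashMap<String, (String, Instant)>,
    pub fetches: u32, // counts how often the slow path actually runs
}

impl CredentialCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new(), fetches: 0 }
    }

    // Return cached credentials if still fresh; otherwise run `fetch`
    // (the stand-in for the assume-role call) and cache its result.
    pub fn get(&mut self, role: &str, fetch: impl Fn(&str) -> String) -> String {
        if let Some((creds, at)) = self.entries.get(role) {
            if at.elapsed() < self.ttl {
                return creds.clone();
            }
        }
        self.fetches += 1;
        let creds = fetch(role);
        self.entries.insert(role.to_string(), (creds.clone(), Instant::now()));
        creds
    }
}
```

The design point is the `fetches` counter: profiling tells you whether the slow path runs once per minute or once per request, and that measurement, not the architecture, is what AI lacked.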
4. Security Hardening (Requires Threat Modeling)
What fails: Identifying attack vectors, validating security boundaries, preventing subtle bypasses.

Real example: Multi-tenant isolation validation (implied from Week 6 work).

What AI builds: Working isolation logic per requirements.

What AI misses:
- Cross-tenant queries via timestamp-based GSI (leaked data)
- Missing tenant validation in batch operations
- Race conditions in tenant-scoped locks
- Token substitution attacks (swap tenant_id in JWT)
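The batch-operations gap is worth making concrete. A minimal sketch of the validation a human would add, with hypothetical types, is:

```rust
// Each item in a batch write carries the tenant it targets.
pub struct BatchItem {
    pub tenant_id: String,
    pub payload: String,
}

// Reject the whole batch if ANY item targets a different tenant.
// Validating only the first item (or none) is exactly the kind of gap
// that lets a cross-tenant write through.
pub fn validate_batch(auth_tenant: &str, items: &[BatchItem]) -> Result<(), String> {
    for (i, item) in items.iter().enumerate() {
        if item.tenant_id != auth_tenant {
            return Err(format!(
                "item {i} targets tenant '{}', but caller is '{auth_tenant}'",
                item.tenant_id
            ));
        }
    }
    Ok(())
}
```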
5. Operational Concerns (Caching, Monitoring, Scaling)
What fails: Production deployment concerns that don't appear in development. Real examples from the journey:

Caching Strategy

AI missed: DynamoDB 400KB item size limit
Impact: Event sourcing worked in dev (small events), failed in prod (aggregated events > 400KB)
Human fix: Added event compression + splitting logic
Circuit Breakers
AI missed: DynamoDB throttling protection
Impact: Bulk operations exhausted write capacity
Human fix: Added exponential backoff + batch size limits
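The article doesn't show that fix; a minimal sketch of its two pieces, capped exponential backoff and batch-size limiting (DynamoDB's BatchWriteItem accepts at most 25 items per request), might look like this, with illustrative numbers:

```rust
use std::time::Duration;

// Delay before retry `attempt` (0-based): double each time, cap at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    base.checked_mul(2u32.saturating_pow(attempt))
        .unwrap_or(max) // overflow means we're past the cap anyway
        .min(max)
}

// Split a bulk write of `total` items into batches no larger than `limit`.
pub fn batch_sizes(total: usize, limit: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut remaining = total;
    while remaining > 0 {
        let n = remaining.min(limit);
        out.push(n);
        remaining -= n;
    }
    out
}
```

In production you would also add jitter to the delay so throttled clients don't retry in lockstep; that's omitted here to keep the sketch deterministic.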
Observability
AI missed: Structured logging with correlation IDs
Impact: Couldn't trace requests across services
Human fix: Added tracing middleware with request_id propagation
Graceful Degradation
AI missed: Fallback when config service unavailable
Impact: All requests failed if config DynamoDB down
Human fix: Added platform-level defaults + circuit breaker
6. Business Logic (Domain-Specific Rules)
What fails: Subtle business rules that aren't explicitly documented.

Example: Billing calculation edge cases.

What AI implements: Straightforward billing from requirements:
- Charge per API call
- Monthly aggregation
- Pro-rated refunds

What AI misses, the undocumented edge cases:
- Don't charge for failed requests (500 errors)
- Don’t charge during maintenance windows
- Don’t charge for health checks
- Cap monthly charge at contract limit
- Handle timezone boundaries for “monthly”
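A hedged sketch of billing with those missed rules folded in. Pro-rating and timezone-aware month boundaries are omitted (they need a real date/time library), and all field names and numbers are illustrative:

```rust
// One API call as the billing logic sees it (hypothetical shape).
pub struct ApiCall {
    pub status: u16,
    pub is_health_check: bool,
    pub during_maintenance: bool,
}

// Monthly charge in cents: filter out the never-billable calls, then
// apply the contract cap.
pub fn monthly_charge(calls: &[ApiCall], price_per_call: u64, contract_cap: u64) -> u64 {
    let billable = calls
        .iter()
        .filter(|c| {
            c.status < 500                // don't charge for server errors
                && !c.is_health_check     // don't charge for health checks
                && !c.during_maintenance  // don't charge in maintenance windows
        })
        .count() as u64;
    (billable * price_per_call).min(contract_cap) // cap at contract limit
}
```

Each `filter` clause is one of the edge cases above; none of them appears in a requirements doc that just says "charge per API call."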
7. Edge Cases (Requires Deep Context)
What fails: Rare but critical scenarios that break assumptions.

Real example from Week 5: The cascading errors.

Change: Updated macro to generate CRUD methods automatically
Expected impact: Update 4 crates, maybe 20 call sites
Actual impact: 214 compilation errors across 30 files
AI's approach: Fix one error at a time
Result: 30 commits over 24 hours, still had errors
Real Examples: What AI Did vs. What AI Missed
Week 6: AWS Client Factory

What AI wrote successfully:

Event sourcing pattern:
- Complete implementation of event store
- Aggregate root pattern
- Event versioning
- Idempotency keys
- 2,100 lines of working code

AWS client factory:
- Type-safe scope enforcement
- Platform/Tenant/Capsule/Operator clients
- Automatic table name prefixing
- 2,364 lines with 39 tests

Configuration resolution:
- Hierarchical resolution (Platform → Tenant → Capsule)
- Automatic injection via middleware
- REST API with preview endpoints
- 5,541 lines with 28 tests

What AI missed, and humans added, is the thread running through the categories above: credential caching, the DynamoDB 400KB item limit, throttling protection, and structured observability.
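Hierarchical resolution is a small idea at its core. A minimal sketch, with the real service's types replaced by plain `HashMap` layers (the function name and layer ordering are assumptions, not the actual API):

```rust
use std::collections::HashMap;

// Resolve a config key through Platform -> Tenant -> Capsule layers:
// the most specific layer that defines the key wins.
// `layers` is ordered most-specific first, e.g. [capsule, tenant, platform].
pub fn resolve(key: &str, layers: &[&HashMap<String, String>]) -> Option<String> {
    layers.iter().find_map(|layer| layer.get(key).cloned())
}
```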
Why These Boundaries Exist
1. Training Data Limitations
AI learns from public code repositories. What's missing:
- Production deployment configs: Not in repos (secrets, scaling params)
- Incident post-mortems: Private docs, not public code
- Performance profiling results: Runtime data, not source code
- Security threat models: Confidential, not open-source
- Business domain knowledge: In people’s heads, not documentation
2. Context Window Limits
Even with a 200K token context, here's what fits:
- Single crate implementation
- Related test files
- Architecture docs

And what doesn't:
- Entire workspace (9 crates, 180 files)
- Cross-crate dependency chain
- Historical evolution (why code changed)
3. Operational Knowledge Gap
AI knows code patterns but not production behavior:
- AWS STS assume-role latency
- DynamoDB 400KB item size limit
- EventBridge PutEvents throttling
- CORS preflight optimization
- HTTP/2 connection pooling
What This Means for Workflows
Where Humans Add Value
Design Review (Before Implementation)
Human role: Review AI's proposed architecture for gaps. Questions to ask:
- Performance: What needs caching?
- Security: What are the attack vectors?
- Operational: How does this fail? How do we debug it?
- Limits: What happens at scale?
Framework Knowledge (During Implementation)
Human role: Catch framework-specific quirks AI misses. Watch for:
- Middleware execution order
- Lifecycle hook timing
- Dependency injection scope
- Transaction boundaries
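Middleware execution order is the easiest of these to make concrete. The toy chain below, not tied to any real framework, records which layer actually runs first: the outermost wrapper, regardless of the order the registration lines read in, which is exactly the semantics frameworks tend to leave implicit:

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Handler = Box<dyn Fn(&str) -> String>;

// Wrap `next` so that `name` is recorded as the request passes through
// on the way in.
fn middleware(name: &'static str, log: Rc<RefCell<Vec<&'static str>>>, next: Handler) -> Handler {
    Box::new(move |req| {
        log.borrow_mut().push(name);
        next(req)
    })
}

// Build a two-layer chain and return the order the layers ran in.
pub fn execution_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let inner: Handler = Box::new(|req| format!("handled {req}"));

    // "auth" wraps "tenant", which wraps the handler. The OUTERMOST
    // layer sees the request first, whatever order you wrote the lines.
    let chain = middleware("auth", Rc::clone(&log), middleware("tenant", Rc::clone(&log), inner));
    chain("GET /");

    let order = log.borrow().clone();
    order
}
```

Whether "registered first" means "outermost" or "innermost" varies by framework, which is why this is a category where a human who knows the specific framework has to check.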
Production Hardening (After Implementation)
Human role: Add operational concerns before deployment. Checklist:
- Monitoring: Metrics, logs, traces
- Error handling: Retries, circuit breakers
- Performance: Caching, connection pooling
- Limits: Rate limiting, batch sizes
Incident Response (When Things Break)
Human role: Debug production issues with full context. Why AI struggles:
- Needs correlation across logs, metrics, traces
- Requires understanding of deployed versions
- Must reason about race conditions, timing
The Decision Tree: When to Use AI vs. Intervene
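The tree itself doesn't survive in this text version. Grounded in the seven categories above, a rough, hypothetical reduction to code might be:

```rust
// Which failure category (if any) a task falls into, per the seven
// categories above. Names are illustrative.
pub enum Task {
    KnownPattern,       // boilerplate, CRUD, standard infrastructure
    NovelDesign,        // no pattern in training data
    PerformanceTuning,  // needs profiling data
    SecurityBoundary,   // needs threat modeling
    OperationalConcern, // caching, monitoring, scaling
    BusinessRule,       // undocumented domain knowledge
    RareEdgeCase,       // needs global context
}

// Hypothetical triage: AI leads only on pattern-matching work; every
// one of the seven boundary categories starts with a human.
pub fn who_leads(task: &Task) -> &'static str {
    match task {
        Task::KnownPattern => "AI implements, human reviews",
        _ => "human designs and decides, AI assists with mechanics",
    }
}
```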
Principles for Working Within Boundaries
Design Before Build
Lesson from Week 6:
ADR-driven design worked. Week 5's reactive fixing failed.

Practice:
- Document constraints (ADR, requirements)
- Let AI propose architecture
- Human reviews for gaps (caching, security)
- Implement with confidence
Atomic Changes
Lesson from Week 5’s cascade:
30 commits of partial fixes created more errors.

Practice:
- Migrate one component completely
- Test thoroughly
- Then migrate next component
- Never commit broken intermediate states
Add Operational Layer
Lesson from production:
AI builds clean patterns, misses caching/monitoring.

Practice:
- Review AI implementation for external calls
- Add caching before deployment
- Add metrics, tracing, structured logging
- Add circuit breakers for downstream services
Monitor AI Progress
Lesson from Week 5:
Error count stopped decreasing = AI stuck.

Practice:
Track errors fixed per commit:
- Healthy: 5-10 errors per commit
- Warning: 2-4 errors per commit
- Critical: less than 2 errors per commit
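Those thresholds amount to a trivial three-way classifier; a sketch, with the messages as assumptions:

```rust
// Classify AI progress from errors fixed in the latest commit, using
// the thresholds above.
pub fn progress_health(errors_fixed_this_commit: u32) -> &'static str {
    match errors_fixed_this_commit {
        0..=1 => "critical: stop and intervene",
        2..=4 => "warning: watch closely",
        _ => "healthy",
    }
}
```

The value isn't the code, it's the discipline of actually recording the number per commit so the "critical" branch fires before commit 30, not after.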
The Meta-Insight
After several months of building with AI: AI isn't "almost there" on these seven categories. They're fundamental boundaries:
- Novel problems: Training data limitation
- Framework quirks: Hidden execution order
- Performance: No profiling data
- Security: Requires adversarial thinking
- Operational: Production experience needed
- Business logic: Domain knowledge gap
- Edge cases: Global context required
The workflow implication:
- ❌ "Let AI do everything, fix what breaks"
- ✅ "Use AI where it excels, human where it doesn't"
Actionable Takeaways
If you're building with AI:
- Document constraints before asking AI to design - ADRs, requirements docs, isolation rules. Let AI design within boundaries.
- Review for the seven gaps - Performance (add caching), security (threat model), operational (monitoring), framework (quirks), novel patterns (human designs first).
- Watch for cascade signals - Diminishing returns (errors per commit dropping), “partial fix” in commit messages, AI making same fix repeatedly.
- Add operational layer before shipping - Metrics, tracing, circuit breakers, caching. AI builds infrastructure, humans harden for production.
- Design prevents debugging - Week 6’s proactive design: 3 days, 0 bugs. Week 5’s reactive fixing: 24 hours, still broken. Design wins.
Discussion
Disclaimer: This content represents personal learning from building with AI on a personal project. It does not represent my employer's views, technologies, or approaches. All code examples are generic patterns for educational purposes.