
The AI Code Review Blind Spots

The Merge That Shouldn’t Have Passed

The PR had 700+ compilation errors when it merged. How? The AI code reviewer said it looked good. The human reviewer trusted the AI. Cost: $454 and 2 hours with the main branch broken. This is the story of 8 weeks discovering exactly what AI code review misses—and why it matters.

Over 8 weeks of AI-assisted platform development, I tracked every bug, every miss, every cascading failure. The results revealed a stark divide: AI accelerates proactive design work 5-7x, but slows reactive debugging by 16x. The difference comes down to seven categories of blind spots that consistently slip past AI code review. This article documents those blind spots with real examples, real metrics, and a framework for knowing when to intervene.

The Seven Categories

Category 1: Security Blind Spots

The Problem: AI misses security issues that require adversarial thinking. Evidence from Platform Development:
  • 217 AWS SDK calls bypassing tenant isolation checks
  • Wildcard IAM policies (Resource: "*") defeating permission boundaries
  • Cross-capsule queries via Global Secondary Index leaking data between tenants
Real Example:
# ❌ AI's code: Catch-all policy defeats the deny rule
PolicyDocument:
  Statement:
    - Sid: DenyDangerousActions
      Effect: Deny
      Action: 
        - iam:*
        - organizations:*
      Resource: "*"
    
    - Sid: AllowOtherActions  # DEFEATS THE PURPOSE!
      Effect: Allow
      Action: "*"
      Resource: "*"
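One way to see exactly what this policy does is to model IAM's evaluation order directly: an explicit deny always overrides any allow, and anything unmatched is denied by default. The sketch below is a toy model of that rule (not the real AWS evaluator): the deny still blocks iam:* and organizations:*, but the catch-all allow quietly grants everything else.

```rust
// Toy model of IAM policy evaluation: an explicit Deny always overrides
// any Allow; with no matching statement, the default result is deny.
#[derive(PartialEq)]
enum Effect { Allow, Deny }

struct Statement {
    effect: Effect,
    actions: Vec<&'static str>,
}

// Action patterns like "iam:*" match by prefix; "*" matches everything.
fn matches(pattern: &str, action: &str) -> bool {
    pattern == "*"
        || pattern == action
        || pattern
            .strip_suffix('*')
            .map_or(false, |prefix| action.starts_with(prefix))
}

fn is_allowed(statements: &[Statement], action: &str) -> bool {
    let hits = |effect: Effect| {
        statements
            .iter()
            .any(|s| s.effect == effect && s.actions.iter().any(|p| matches(p, action)))
    };
    if hits(Effect::Deny) {
        return false; // explicit deny wins, regardless of any allow
    }
    hits(Effect::Allow) // otherwise: allowed only if some statement allows it
}

fn main() {
    let policy = vec![
        Statement { effect: Effect::Deny,  actions: vec!["iam:*", "organizations:*"] },
        Statement { effect: Effect::Allow, actions: vec!["*"] }, // the catch-all
    ];
    // The explicit deny still holds...
    assert!(!is_allowed(&policy, "iam:CreateUser"));
    // ...but the catch-all over-grants every other action in every service.
    assert!(is_allowed(&policy, "dynamodb:DeleteTable"));
    println!("explicit deny holds; catch-all over-grants the rest");
}
```

A real boundary should enumerate the actions it intends to allow rather than fall through to `Action: "*"`.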
The AI generated a security boundary with a gaping hole. The deny statement looks good in isolation, and because an explicit deny overrides any allow in IAM evaluation, iam:* and organizations:* do stay blocked. But the catch-all allow grants every other action on every resource, far beyond anything the permission boundary was meant to permit. Why AI Missed It: Security requires thinking like an attacker. AI optimizes for the happy path, not adversarial scenarios. It saw “deny dangerous actions” and considered the security requirement met. Another Example:
// ❌ AI's code: Direct AWS SDK call bypassing tenant isolation
async fn get_item(table: &str, key: HashMap<String, AttributeValue>) 
    -> Result<GetItemOutput> {
    dynamodb_client
        .get_item()
        .table_name(table)
        .set_key(Some(key))
        .send()
        .await
}
This function accesses DynamoDB directly without tenant context. In a multi-tenant system, this is a critical vulnerability. But AI sees correct AWS SDK usage and approves. The Fix Required:
// ✅ Enforces tenant isolation
async fn get_item(
    tenant_id: &str,
    table: &str, 
    key: HashMap<String, AttributeValue>
) -> Result<GetItemOutput> {
    let config = get_tenant_config(tenant_id).await?;
    config.dynamodb_client  // Tenant-scoped client
        .get_item()
        .table_name(&config.prefix_table(table))
        .set_key(Some(key))
        .send()
        .await
}
Impact: 217 functions needed manual security review and retrofitting. Cost: 3 days of architectural work to establish tenant-scoped AWS clients.
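The retrofit can be pushed one step further: make tenant scoping structurally unavoidable, so the 217 bypasses cannot be written in the first place. Below is a minimal sketch of that idea with hypothetical names (TenantScope and its prefixing scheme are illustrative, not the platform's actual API): the only way to obtain a table name is through a tenant-scoped handle.

```rust
// Sketch (hypothetical names): callers can only reach storage through a
// TenantScope, which namespaces every table name with the tenant id.
// There is deliberately no un-prefixed accessor to bypass.
struct TenantScope {
    tenant_id: String,
}

impl TenantScope {
    fn new(tenant_id: &str) -> Self {
        Self { tenant_id: tenant_id.to_string() }
    }

    // Every table name is tenant-prefixed by construction.
    fn prefix_table(&self, table: &str) -> String {
        format!("{}_{}", self.tenant_id, table)
    }
}

fn main() {
    let scope = TenantScope::new("acme");
    assert_eq!(scope.prefix_table("users"), "acme_users");
    println!("{}", scope.prefix_table("users"));
}
```

The design choice is the same one behind the tenant-scoped client above: move the invariant from "every caller must remember" to "the type system enforces".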

Category 2: Performance Blind Spots

The Problem: AI lacks profiling data and production context. Evidence from Platform:
  • Missing STS credential caching: 200-500ms added to every request
  • Config service hammering DynamoDB: 17,000 reads per 1,000 requests
  • scan() operations: Costing $200-500/month instead of query()
Real Example:
// ❌ AI's code: No caching, 200-500ms per call
async fn assume_role(role_arn: &str) -> Result<Credentials> {
    let resp = sts_client
        .assume_role()
        .role_arn(role_arn)
        .role_session_name("session")
        .send()
        .await?;
    // `credentials` is an Option on the STS response
    resp.credentials.ok_or(Error::MissingCredentials)  // stand-in error variant
}
Every request made an STS call. AI saw correct AWS SDK usage. It didn’t know STS calls are slow or that credentials are cacheable for up to 1 hour. The Fix:
// ✅ Cache credentials with TTL
lazy_static! {
    static ref CRED_CACHE: Mutex<LruCache<String, (Credentials, Instant)>> 
        = Mutex::new(LruCache::new(100));
}

async fn assume_role(role_arn: &str) -> Result<Credentials> {
    // Check the cache first; the scope drops the guard so the
    // lock is not held across the .await below
    {
        let mut cache = CRED_CACHE.lock().unwrap();
        if let Some((creds, timestamp)) = cache.get(role_arn) {
            if timestamp.elapsed() < Duration::from_secs(3000) {  // 50min
                return Ok(creds.clone());
            }
        }
    }

    let creds = sts_client
        .assume_role()
        .role_arn(role_arn)
        .role_session_name("session")
        .send()
        .await?
        .credentials
        .ok_or(Error::MissingCredentials)?;  // stand-in error variant

    CRED_CACHE.lock().unwrap()
        .put(role_arn.to_string(), (creds.clone(), Instant::now()));
    Ok(creds)
}
Impact: Request latency dropped from 800ms to 200ms. AI wouldn’t have caught this without explicit performance requirements and profiling data.
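For readers who want the TTL-cache pattern without the lazy_static crate, the same idea works with only the standard library: OnceLock gives a lazily initialized global, and each entry carries its insertion time so reads can check freshness. The key and value types below are illustrative stand-ins, not the real credentials.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};
use std::time::{Duration, Instant};

// Std-only sketch of a TTL cache: a lazily initialized global map whose
// entries remember when they were inserted.
static CACHE: OnceLock<Mutex<HashMap<String, (String, Instant)>>> = OnceLock::new();

fn cache() -> &'static Mutex<HashMap<String, (String, Instant)>> {
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

// Returns the value only if it is younger than `ttl`.
fn get_cached(key: &str, ttl: Duration) -> Option<String> {
    let map = cache().lock().unwrap();
    map.get(key)
        .filter(|(_, inserted_at)| inserted_at.elapsed() < ttl)
        .map(|(value, _)| value.clone())
}

fn put_cached(key: &str, value: &str) {
    cache().lock().unwrap()
        .insert(key.to_string(), (value.to_string(), Instant::now()));
}

fn main() {
    put_cached("role-arn", "temp-creds");
    assert_eq!(
        get_cached("role-arn", Duration::from_secs(3000)).as_deref(),
        Some("temp-creds")
    );
    // A zero TTL makes every entry stale immediately.
    assert_eq!(get_cached("role-arn", Duration::from_nanos(0)), None);
    println!("cache hit while fresh, miss once expired");
}
```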

Category 3: Architecture Blind Spots

The Problem: AI misses framework-specific execution quirks. Evidence from Platform: Example 1: Actix-web Middleware Order
// ❌ AI's code: Logical order, wrong execution
App::new()
    .wrap(LoggingMiddleware)
    .wrap(TenantMiddleware)    // Sets tenant_id
    .wrap(AuthMiddleware)      // Expects tenant_id in request
This looks correct: log first, identify tenant, then authenticate. But Actix-web executes wraps in REVERSE registration order: the last .wrap() becomes the outermost layer and runs first. So AuthMiddleware runs before TenantMiddleware sets the tenant_id, causing authentication to fail. The Fix:
// ✅ Reverse order for Actix-web's execution model
App::new()
    .wrap(AuthMiddleware)      // Runs last
    .wrap(TenantMiddleware)    // Runs second
    .wrap(LoggingMiddleware)   // Runs first
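The reversal is easier to internalize by modeling .wrap() as function composition: each layer wraps everything registered before it, so the last registered middleware is the outermost and runs first. The self-contained sketch below uses plain closures rather than Actix types, purely to make the execution order observable.

```rust
// Model middleware registration as function composition: folding over the
// registration order wraps each new layer AROUND the previous ones.
fn compose(registration_order: &[&'static str]) -> Box<dyn Fn(&mut Vec<&'static str>)> {
    let handler: Box<dyn Fn(&mut Vec<&'static str>)> =
        Box::new(|trace| trace.push("handler"));
    registration_order.iter().fold(handler, |inner, &name| {
        Box::new(move |trace: &mut Vec<&'static str>| {
            trace.push(name); // this layer runs before everything it wraps
            inner(trace);
        })
    })
}

fn main() {
    // Registered in this order via .wrap(): auth, tenant, logging.
    let app = compose(&["auth", "tenant", "logging"]);
    let mut trace = Vec::new();
    app(&mut trace);
    // Last registered runs first; the handler runs last.
    assert_eq!(trace, ["logging", "tenant", "auth", "handler"]);
    println!("{:?}", trace);
}
```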
Example 2: Git Hook Unreachable Code
# ❌ AI's pre-commit hook
#!/bin/bash
cargo fmt --check
if [ $? -ne 0 ]; then
    exit 0  # BUG: exits with success, so the commit proceeds anyway
fi

cargo clippy -- -D warnings  # Skipped whenever formatting fails
AI generated a hook that exits successfully when formatting fails: the commit goes through unformatted and clippy never runs for it. The exit status should be 1, which would block the commit. Why AI Missed It: Training data shows common patterns. Framework quirks and execution-order edge cases are underrepresented. AI applies logical ordering without understanding runtime behavior.

Category 4: Cascading Errors (The Week 5 Disaster)

The Problem: AI handles cascading errors by fixing in isolation, creating whack-a-mole cycles. What Happened:
  • A macro signature changed, affecting 47+ call sites
  • AI approach: 31 commits, 24 hours, 63 errors remaining
  • Manual intervention: 3 commits, 90 minutes, 0 errors
  • 16x penalty for reactive AI debugging
Why AI Failed: The AI fixed errors one at a time without understanding the holistic pattern:
Commit 1: Fix 12 errors → Introduce 8 new errors (4 net)
Commit 2: Fix 4 errors → Introduce 3 new errors (1 net)
Commit 3: Fix 1 error → Introduce 1 new error (0 net)
Diminishing returns. Each fix broke 2-3 new call sites because AI lacked the global context to apply a consistent pattern. Manual Approach:
// Step 1: Understand the pattern change
// Old: config_value!(key)
// New: config_value!(config, key)

// Step 2: Batch replace with regex
// Found 47 instances, replaced all at once

// Step 3: Handle special cases (3 instances)
Result: 3 commits, 0 compilation errors. The Discovery:
AI for proactive design (Weeks 6-7): 5-7x faster
AI for reactive debugging (Week 5): 16x slower

Strategic implication: Prevention via design is 
80-100x more efficient than reactive debugging
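The batch step can be sketched in a few lines. This is an illustrative string-matching stand-in for the regex pass (it naively assumes no call site already carries the config argument): rewrite every `config_value!(key)` call to `config_value!(config, key)` in one pass, instead of fixing call sites one at a time.

```rust
// Batch migration sketch: insert the new first argument at every
// occurrence of the macro invocation in a single pass over the source.
fn migrate(source: &str) -> String {
    const CALL: &str = "config_value!(";
    let mut out = String::new();
    let mut rest = source;
    while let Some(idx) = rest.find(CALL) {
        let split = idx + CALL.len();
        out.push_str(&rest[..split]); // everything up to and including "config_value!("
        out.push_str("config, ");     // the new leading argument
        rest = &rest[split..];
    }
    out.push_str(rest);
    out
}

fn main() {
    let src = "let a = config_value!(timeout); let b = config_value!(retries);";
    assert_eq!(
        migrate(src),
        "let a = config_value!(config, timeout); let b = config_value!(config, retries);"
    );
    println!("{}", migrate(src));
}
```

The point is not the string munging itself but the workflow: understand the pattern once, apply it everywhere at once, then handle the handful of special cases by hand.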

Category 5: Cross-Entity Consistency

The Problem: AI doesn’t maintain consistency across 54 entities spanning 12 files. Evidence from Platform:
  • Foreign key type mismatches: UUID in one entity, String in another
  • Missing table registrations: Entity defined but not registered in schema
  • Inconsistent field names: user_id in one entity, userId in another
Real Example:
// Entity A
#[derive(Entity)]
struct Organization {
    #[primary_key]
    id: Uuid,  // UUID type
}

// Entity B (generated later)
#[derive(Entity)]
struct User {
    org_id: String,  // ❌ Should be Uuid
}
AI generated Entity B months after Entity A. It didn’t reference the original schema. Result: runtime type conversion errors. Why AI Missed It: Each entity generation was a separate context window. AI doesn’t maintain long-term consistency without explicit cross-referencing. Impact: Manual audit of all 54 entities required, finding 12 inconsistencies.
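One preventive fix is to give the identifier a single shared type, so drift cannot compile in the first place. The sketch below uses a hypothetical OrgId newtype (with a u128 stand-in for Uuid, to stay dependency-free); the article's entities used raw types.

```rust
// Sketch: one shared id type prevents Uuid-vs-String drift between entities.
// OrgId is hypothetical; u128 stands in for a Uuid to stay std-only.
#[derive(Clone, Copy, PartialEq, Debug)]
struct OrgId(u128);

struct Organization {
    id: OrgId,
}

struct User {
    org_id: OrgId, // same type by construction: a mismatch cannot compile
}

fn main() {
    let org = Organization { id: OrgId(42) };
    let user = User { org_id: org.id };
    assert_eq!(user.org_id, org.id);
    println!("foreign key type matches by construction");
}
```

A later entity generated in a fresh context window can still get this wrong, but only by refusing to use the shared type, which is far easier to catch in review than a silent String field.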

Category 6: Edge Cases

The Problem: AI optimizes for the happy path, missing edge cases documented in footnotes. Evidence from Platform: Example 1: Missing GSI Projection Type
// ❌ AI's code: Works for simple queries
GlobalSecondaryIndex::builder()
    .index_name("tenant-index")
    .key_schema(key_schema)
    .build()  // Missing projection!
This works until you query non-key attributes, then DynamoDB returns incomplete data. The fix requires specifying projection type:
// ✅ Explicit projection
GlobalSecondaryIndex::builder()
    .index_name("tenant-index")
    .key_schema(key_schema)
    .projection(
        Projection::builder()
            .projection_type(ProjectionType::All)
            .build()
    )
    .build()
Example 2: PII Attribute on Non-String Fields
// ❌ AI's code: Applies PII to numbers
#[derive(Entity)]
struct User {
    #[pii]
    age: i32,  // Can't encrypt numbers!
}
The PII macro expects String fields for encryption. Applying it to integers causes serialization failures. Why AI Missed It: Edge cases appear in documentation footnotes, not primary examples. AI trains on common patterns, not rare exceptions.
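The same rule can be enforced at compile time instead of being discovered at serialization time. The sketch below is a hypothetical Redactable trait, not the article's PII macro: because it is implemented only for String, the equivalent misuse on an i32 field simply does not compile.

```rust
// Sketch: encode "PII handling applies only to string fields" in the type
// system, so misuse fails at compile time rather than in production.
trait Redactable {
    fn redact(&self) -> String;
}

impl Redactable for String {
    fn redact(&self) -> String {
        "*".repeat(self.len())
    }
}
// Deliberately no impl for i32: `age.redact()` is a compile error.

fn main() {
    let email = String::from("a@b.co");
    assert_eq!(email.redact(), "******");
    println!("{}", email.redact());
}
```

A derive macro can apply the same restriction by emitting a trait bound on each `#[pii]` field, turning the footnote edge case into a type error.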

Category 7: Operational Blind Spots

The Problem: AI lacks production operational experience. Evidence from Platform:
  • No circuit breakers for retry logic (cascading failures)
  • No correlation IDs for distributed tracing
  • No structured logging with context (debugging nightmares)
Real Example:
// ❌ AI's retry logic: No circuit breaker
async fn call_service(url: &str) -> Result<Response> {
    let mut retries = 0;
    loop {
        match http_client.get(url).send().await {
            Ok(resp) => return Ok(resp),
            Err(_) if retries < 5 => {
                retries += 1;
                tokio::time::sleep(Duration::from_secs(1)).await;
            }
            Err(e) => return Err(e),
        }
    }
}
This retry logic will hammer a failing service, potentially causing cascading failures. No circuit breaker, no backoff strategy. The Fix:
// ✅ Circuit breaker pattern
lazy_static! {
    static ref CIRCUIT: Mutex<CircuitBreaker> = 
        Mutex::new(CircuitBreaker::new(5, Duration::from_secs(60)));
}

async fn call_service(url: &str) -> Result<Response> {
    if CIRCUIT.lock().unwrap().is_open() {
        return Err(Error::CircuitOpen);
    }

    // Guard dropped above, so the lock is not held across the .await
    match http_client.get(url).send().await {
        Ok(resp) => {
            CIRCUIT.lock().unwrap().record_success();
            Ok(resp)
        }
        Err(e) => {
            CIRCUIT.lock().unwrap().record_failure();
            Err(e)
        }
    }
}
Why AI Missed It: Production operational patterns require experience with system failures. AI training emphasizes functionality over operational resilience.
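The CircuitBreaker type used above is not shown in the article; a minimal std-only sketch of the same open/closed behavior (trip after N consecutive failures, allow a retry after a cooldown) could look like this. Field names and the half-open rule are assumptions, not the platform's implementation.

```rust
use std::time::{Duration, Instant};

// Minimal circuit breaker: opens after `threshold` consecutive failures,
// and permits another attempt (half-open) once `cooldown` has elapsed.
struct CircuitBreaker {
    threshold: u32,
    cooldown: Duration,
    failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { threshold, cooldown, failures: 0, opened_at: None }
    }

    fn is_open(&self) -> bool {
        match self.opened_at {
            Some(at) => at.elapsed() < self.cooldown, // half-open after cooldown
            None => false,
        }
    }

    fn record_success(&mut self) {
        self.failures = 0;
        self.opened_at = None;
    }

    fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.threshold {
            self.opened_at = Some(Instant::now());
        }
    }
}

fn main() {
    let mut cb = CircuitBreaker::new(3, Duration::from_secs(60));
    assert!(!cb.is_open());
    for _ in 0..3 {
        cb.record_failure();
    }
    assert!(cb.is_open()); // trips after 3 consecutive failures
    cb.record_success();
    assert!(!cb.is_open()); // a success resets the breaker
    println!("breaker trips and resets as expected");
}
```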

Why AI Misses These: Fundamental Boundaries

These aren’t bugs to fix—they’re fundamental limitations of AI code review:
  1. Novel problems: No training data patterns exist yet
  2. Framework quirks: Hidden execution order (Actix middleware)
  3. Performance: No profiling data in context window
  4. Security: Requires adversarial thinking, not happy-path optimization
  5. Operational: Production failure experience needed
  6. Business logic: Domain-specific knowledge gaps
  7. Edge cases: Requires global context across 54 entities, 12 files
AI excels at pattern matching. It fails when patterns don’t exist or when understanding requires context beyond the code itself.

The Architect Review Pattern

Evidence: Human architectural review consistently caught issues AI rationalized. Commit 7920570: Architect review before merge caught 5 issues, including:
  • Missing tenant isolation in 3 functions
  • Performance: unnecessary DynamoDB scan
  • Missing error context for debugging
Commit 7c54906: Caught missing audit logging flag before production deployment. Verification Metrics (Week 2):
  • 18 bugs found by independent Verifier agent
  • Missing edge cases: 8 instances
  • Requirement gaps: 6 instances
  • Cross-entity inconsistencies: 4 instances
The Pattern: Fresh human review with architectural context catches what AI rationalizes as acceptable. AI sees “code compiles and tests pass.” Humans see “this violates our security model.”

When to Intervene: Decision Tree

Intervene manually when:

Error count > 10 and related to single change
→ Manual batch fix (16x faster than AI iteration)

Errors per commit < 3 for 3 consecutive commits
→ AI is stuck in diminishing returns, stop immediately

Security-critical code paths
→ Threat model first, AI implements after review

Performance optimization needed
→ Human profiles with tools, AI optimizes identified hotspots

Framework quirks (middleware, hooks, config)
→ AI implements, human reviews execution behavior

Novel problems (no established pattern)
→ Human designs solution, AI implements from spec
The Key Insight: Stop AI when it’s stuck. Week 5 showed diminishing returns: 12 → 4 → 1 error fixed per commit. That’s the signal to intervene manually.
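The stop signal can even be automated. Below is a sketch of the "errors per commit < 3 for 3 consecutive commits" rule from the tree above; the thresholds are the article's, the function is illustrative.

```rust
// Heuristic from the decision tree: intervene manually once fewer than
// 3 errors were fixed in each of 3 consecutive commits.
fn should_intervene(fixed_per_commit: &[u32]) -> bool {
    fixed_per_commit
        .windows(3)
        .any(|w| w.iter().all(|&fixed| fixed < 3))
}

fn main() {
    // Three commits in a row each fixing fewer than 3 errors: stop the AI.
    assert!(should_intervene(&[2, 1, 0]));
    // Still making real progress per commit: keep going.
    assert!(!should_intervene(&[12, 8, 5]));
    println!("intervention heuristic behaves as expected");
}
```

Wiring such a check into CI is trivial compared to the 20+ hours a stuck AI loop can burn.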

The Prevention Framework

The 8-week research revealed a stark pattern: prevention is 80-100x more efficient than reactive debugging.

Use AI Proactively

When AI Excels (5-7x faster):
✅ Design with clear constraints: ADRs, security boundaries, performance budgets defined upfront
✅ Implementation from specifications: “Build X with constraints Y and Z”
✅ Pattern application across codebase: “Apply this security pattern to all 47 functions”
✅ Test generation: Given clear specifications and edge cases
Example: Weeks 6-7 Success
Task: Implement configuration service with caching, tenant isolation, and audit logging. Approach:
  1. Human defined ADR with security model, performance requirements, edge cases
  2. AI implemented from specification
  3. Human reviewed for architectural compliance
Result: 3 days vs. estimated 2 weeks manual work (5-7x speedup), 0 bugs in production.

Require Human Review

Critical Review Points:
❗ Security-critical paths: Tenant isolation, authentication, authorization
❗ Cross-service integration: Data consistency, error handling, retry logic
❗ Framework configuration: Middleware order, lifecycle hooks, build scripts
❗ Performance-sensitive code: Hot paths, database queries, caching strategies
❗ Breaking changes: API modifications, schema changes, macro signatures

ROI of Prevention

Week 5 (Reactive AI Debugging):
  • Task: Fix cascading errors from macro signature change
  • Time: 24 hours, 31 commits
  • Result: Failure (63 errors remaining, manual fix required)
Weeks 6-7 (Proactive Design):
  • Task: Build configuration service (similar complexity)
  • Time: 3 days vs. 2 weeks estimated (5-7x faster)
  • Result: Success (0 bugs in production)
The Math:
Prevention efficiency: 5-7x faster than manual
Reactive debugging: 16x slower than manual
Prevention vs. Reactive: 80-112x efficiency difference

Practical Takeaways

1. Trust, But Verify

AI code review is a tool, not a replacement for architectural thinking. The 700-error merge happened because humans trusted AI judgment. Use AI for speed, humans for critical thinking.

2. Front-Load the Constraints

The more constraints you define upfront (security model, performance budgets, edge cases), the better AI performs. Week 6-7 success came from comprehensive ADRs, not better AI.

3. Watch for Diminishing Returns

If AI fixes fewer errors with each commit, stop. That’s the signal it’s stuck. Week 5 showed 12 → 4 → 1 error fixed per commit. Manual intervention at that point saves 20+ hours.

4. Architect Review Beats AI Review

Fresh human eyes with architectural context consistently catch what AI misses:
  • Security holes (217 instances)
  • Performance issues (17,000 unnecessary DB calls)
  • Cross-entity consistency (12 mismatches)
  • Framework quirks (middleware order, hook execution)

5. Prevention Over Reaction

AI excels at proactive work (5-7x faster) and fails at reactive debugging (16x slower). The strategic implication: invest time in upfront design—ADRs, threat models, performance budgets. Then let AI implement from that foundation.

The Bottom Line

AI code review has blind spots. Seven categories of them, backed by 8 weeks of data:
  1. Security (adversarial thinking required)
  2. Performance (profiling context needed)
  3. Architecture (framework quirks)
  4. Cascading errors (holistic fixes needed)
  5. Cross-entity consistency (global context)
  6. Edge cases (footnote documentation)
  7. Operational resilience (production experience)
The divide is clear: AI accelerates proactive design 5-7x, but slows reactive debugging 16x. Use it accordingly. Front-load constraints, let AI implement, review architecturally. The 700-error merge taught us: trust the AI to code, but never trust it to think architecturally. That’s still our job.
This article documents real findings from 8 weeks of platform development. All examples are sanitized but represent actual code patterns and metrics encountered during production development.