
Week 3: When AI Discovers Your Bugs Through Events

This is Week 3 of “Building with AI” - a 10-week journey documenting how I use multi-agent AI workflows to build a production-grade SaaS platform. This week: The event sourcing system we built became our debugging tool. 72 commits reveal how AI agents learned to detect architectural violations by analyzing event patterns.

The Setup: Week 2’s Hidden Debt

Coming out of Week 2, I thought we had wrapped up cleanly. Seven CRM domain models, a custom DynamoDB macro, a full API layer - all verified and merged. Then the event logs showed something was wrong.
[WARN] Capsule isolation violation detected
Entity: RoleAssignment
Event: RoleAssignmentCreated
Issue: Missing capsule_id parameter
Impact: Cross-tenant data leak risk
Wait. The verifier approved this code last week. How did a capsule isolation bug make it to main? The answer changed how I think about AI-assisted verification.

The Journey: Following the Event Trail

The Event That Revealed Everything

I opened a fresh Evaluator session to investigate:
Analyze event logs from the last 48 hours.

Context:
- Multi-tenant platform with capsule isolation
- Every entity must be scoped to tenant + capsule
- Event sourcing captures all domain operations

Task:
Find patterns in events that suggest isolation violations.
Evaluator’s analysis (2 hours): The agent didn’t just grep for “error” or “warn”. It did something more interesting - it analyzed event schemas across services.
Pattern analysis:
  1. Scanned 2,847 events across auth, CRM, and catalog services
  2. Extracted event schemas by grouping events by type
  3. Compared field patterns across related events
  4. Flagged inconsistencies where similar operations had different fields
Example finding:
AccountCreated event:
✅ Has: tenant_id, capsule_id, account_id
✅ Pattern: TENANT#{tenant}#CAPSULE#{capsule}#...

RoleAssignmentCreated event:
❌ Has: tenant_id, assignment_id
❌ Missing: capsule_id
❌ Pattern: TENANT#{tenant}#ROLE#{role}#...

INCONSISTENCY: RoleAssignment operations are not capsule-isolated
RISK LEVEL: HIGH (security isolation boundary violation)
This was fascinating. Evaluator didn’t need explicit rules about capsule isolation. It inferred the pattern from seeing other events and spotted the deviation.
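The heavy lifting happened inside the Evaluator session, but the core mechanic - group events by type, collect each type's field set, and flag types missing fields their peers carry - fits in a short Rust sketch. Everything below (function name, the 80% threshold) is illustrative, not the agent's actual code:
use std::collections::{BTreeSet, HashMap};
use serde_json::Value;

// Input: (event_type, event detail) pairs pulled from the logs.
// Output: event types together with the "expected" fields they are missing.
fn find_schema_outliers(events: &[(String, Value)]) -> Vec<(String, BTreeSet<String>)> {
    // 1. Group events by type and collect the union of top-level field names.
    let mut schemas: HashMap<String, BTreeSet<String>> = HashMap::new();
    for (event_type, detail) in events {
        if let Value::Object(map) = detail {
            schemas
                .entry(event_type.clone())
                .or_default()
                .extend(map.keys().cloned());
        }
    }

    // 2. Treat fields that appear in most event types as expected.
    let total = schemas.len().max(1);
    let mut counts: HashMap<&String, usize> = HashMap::new();
    for fields in schemas.values() {
        for field in fields {
            *counts.entry(field).or_insert(0) += 1;
        }
    }
    let common: BTreeSet<String> = counts
        .iter()
        .filter(|&(_, seen)| *seen * 10 >= total * 8) // present in >= 80% of types
        .map(|(field, _)| field.to_string())
        .collect();

    // 3. Report each event type together with the expected fields it lacks.
    schemas
        .into_iter()
        .filter_map(|(event_type, fields)| {
            let missing: BTreeSet<String> = common.difference(&fields).cloned().collect();
            (!missing.is_empty()).then_some((event_type, missing))
        })
        .collect()
}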

The Macro Compliance Sweep

With the event analysis showing 6 entities missing capsule isolation, I realized: our DynamoDB macro from Week 2 didn’t enforce this pattern. The macro we built:
#[derive(DynamoDbEntity)]
#[table_name = "platform_data"]
#[pk = "TENANT#{tenant_id}#ACCOUNT#{id}"]
pub struct AccountEntity {
    pub tenant_id: TenantId,
    pub capsule_id: CapsuleId,  // ← Present but not enforced
    // ...
}
The problem: Macro generated correct code for entities that included capsule_id, but didn’t require it. Builder sessions for RoleAssignment and SecurityGroup entities simply… forgot to include it. Next task: Make capsule isolation mandatory at compile time.

Compile-Time Enforcement

Builder session:
Enhance DynamoDbEntity macro to enforce capsule isolation.

Requirements:
1. Every entity MUST have capsule_id field
2. Partition key pattern MUST include CAPSULE#{capsule_id}
3. Compilation fails if either is missing
4. Add helpful compiler messages

Reference: .plans/348-capsule-isolation.md
Builder's implementation (4 hours):
// Updated macro with compile-time checks
#[proc_macro_derive(DynamoDbEntity, attributes(table_name, pk, sk, capsule_isolated))]
pub fn derive_dynamodb_entity(input: TokenStream) -> TokenStream {
    let ast = parse_macro_input!(input as DeriveInput);

    // NEW: Check for capsule_isolated attribute
    let requires_capsule = has_capsule_isolated_attr(&ast);

    if requires_capsule {
        // NEW: Validate capsule_id field exists
        if !has_field(&ast, "capsule_id") {
            return syn::Error::new_spanned(
                &ast.ident,
                "Entity marked #[capsule_isolated] must have a capsule_id field",
            )
            .to_compile_error()
            .into();
        }

        // NEW: Validate PK pattern includes CAPSULE#
        let pk_pattern = get_pk_pattern(&ast);
        if !pk_pattern.contains("CAPSULE#{capsule_id}") {
            return syn::Error::new_spanned(
                &ast.ident,
                "Entity marked #[capsule_isolated] must include CAPSULE#{capsule_id} in the partition key",
            )
            .to_compile_error()
            .into();
        }
    }

    // Generate implementation...
}
Usage:
// ✅ CORRECT - will compile
#[derive(DynamoDbEntity)]
#[capsule_isolated]
#[pk = "TENANT#{tenant_id}#CAPSULE#{capsule_id}#ROLE#{id}"]
pub struct RoleAssignmentEntity {
    pub tenant_id: TenantId,
    pub capsule_id: CapsuleId,  // Required
    pub id: RoleId,
    // ...
}

// ❌ WRONG - won't compile
#[derive(DynamoDbEntity)]
#[capsule_isolated]
#[pk = "TENANT#{tenant_id}#ROLE#{id}"]  // Missing CAPSULE#
pub struct BadEntity {
    pub tenant_id: TenantId,
    // capsule_id missing - compile error
    pub id: RoleId,
}
Verification session:
Verify capsule isolation enforcement in DynamoDB macro.

Test plan:
1. Create test entities with/without capsule_id
2. Verify compile errors for violations
3. Check error messages are helpful
4. Ensure existing entities still compile
Verifier found an edge case:
Issue: Nested Capsule References. Builder’s macro checked for “capsule_id field exists” but didn’t handle nested structs:
#[derive(DynamoDbEntity)]
#[capsule_isolated]
pub struct ComplexEntity {
    pub tenant_id: TenantId,
    pub context: EntityContext,  // Contains capsule_id internally
    pub id: EntityId,
}

pub struct EntityContext {
    pub capsule_id: CapsuleId,  // Macro doesn't see this
}
Impact: Macro would reject valid nested patterns. Fix: Enhanced field checking to traverse nested structs one level deep.
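A derive macro only sees the struct it is applied to, not the definitions of its field types, so “one level deep” in practice means telling the macro which nested field carries the capsule context. The sketch below shows one way that could look with syn 2.x; the #[capsule_context] marker is hypothetical and the real fix may have worked differently:
use syn::{Data, DeriveInput, Field, Fields};

// True if the struct has `capsule_id` directly, or a field explicitly marked
// as a nested context that carries it (hypothetical #[capsule_context] marker).
fn provides_capsule_id(ast: &DeriveInput) -> bool {
    let Data::Struct(data) = &ast.data else { return false };
    let Fields::Named(fields) = &data.fields else { return false };
    fields
        .named
        .iter()
        .any(|field| is_capsule_id(field) || is_capsule_context(field))
}

fn is_capsule_id(field: &Field) -> bool {
    field.ident.as_ref().map_or(false, |ident| ident == "capsule_id")
}

fn is_capsule_context(field: &Field) -> bool {
    field
        .attrs
        .iter()
        .any(|attr| attr.path().is_ident("capsule_context"))
}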
Rework cycle: 2 hours to handle nested fields, re-verify, approved.

The API Macro Surprise

With DynamoDB entities fixed, I moved to the API layer. Problem discovered: The API macro automatically inferred which handler arguments were path parameters - and sometimes got it wrong. Example bug:
// API definition
#[api_endpoint(
    path = "/tenants/{tenant_id}/accounts/{account_id}",
    method = GET,
    permission = "crm.account.read"
)]
async fn get_account(
    tenant_id: TenantId,
    account_id: AccountId,
) -> Result<AccountResponse> {
    // Implementation...
}
What the macro generated:
// ✅ Correctly extracted path parameters: tenant_id, account_id
#[utoipa::path(
    get,
    path = "/tenants/{tenant_id}/accounts/{account_id}",
    params(
        ("tenant_id" = TenantId, Path, description = "Tenant identifier"),
        ("account_id" = AccountId, Path, description = "Account identifier"),
    ),
    // ...
)]
Seemed fine. But then Builder implemented a nested resource route:
#[api_endpoint(
    path = "/accounts/{account_id}/contacts/{contact_id}/activities",
    method = GET,
    permission = "crm.activity.read"
)]
async fn list_contact_activities(
    account_id: AccountId,
    contact_id: ContactId,
    query: ActivityQuery,  // Query parameter, NOT path parameter
) -> Result<Vec<ActivityResponse>> {
    // ...
}
Macro generated:
// ❌ WRONG - included query in path params
params(
    ("account_id" = AccountId, Path, ...),
    ("contact_id" = ContactId, Path, ...),
    ("query" = ActivityQuery, Path, ...),  // Should be Query, not Path
)
Why did this happen? The macro tried to be “helpful” by inferring: “If parameter name matches a word in the path, it’s a path param. Otherwise, it’s also a path param.” Wrong heuristic.

The Fix: Explicit Over Clever

Builder’s revised approach:
// Remove automatic inference entirely
#[api_endpoint(
    path = "/accounts/{account_id}/contacts/{contact_id}/activities",
    method = GET,
    permission = "crm.activity.read",
    path_params(account_id, contact_id),  // NEW: Explicit declaration
    query_params(query)                    // NEW: Explicit declaration
)]
async fn list_contact_activities(
    account_id: AccountId,
    contact_id: ContactId,
    query: ActivityQuery,
) -> Result<Vec<ActivityResponse>> {
    // ...
}
Principle established: Macros should be explicit and boring, not clever and inference-heavy. Impact: Refactored 23 API routes across 3 services (auth, CRM, catalog). Verifier’s role: Caught 5 routes where Builder incorrectly categorized parameters during the refactor.

Documentation That Makes AI Smarter

After fixing 6 isolation bugs, enhancing 2 macros, and refactoring 23 routes, a pattern emerged: Verifier kept asking the same questions:
  • “Should this entity be capsule-isolated?”
  • “What’s the standard partition key pattern?”
  • “How do we handle nested resources in API paths?”
Idea: What if we documented patterns proactively so Verifier could reference them? Next task: Update CLAUDE.md with architectural patterns.

The CLAUDE.md Breakthrough

I created a new section in the project’s CLAUDE.md (AI agent guidance document):
Section: DynamoDB Entity Patterns

Capsule Isolation Requirement

ALL entities that store user data MUST be capsule-isolated. Enforcement:
  • Use #[capsule_isolated] attribute on entity structs
  • Include capsule_id: CapsuleId field
  • Partition key MUST include CAPSULE#{capsule_id} segment
Pattern:
#[derive(DynamoDbEntity)]
#[capsule_isolated]
#[pk = "TENANT#{tenant_id}#CAPSULE#{capsule_id}#ENTITY#{id}"]
pub struct MyEntity {
    pub tenant_id: TenantId,
    pub capsule_id: CapsuleId,
    pub id: EntityId,
    // ...
}
Non-Isolated Entities (Rare): Only platform-level entities (tenants, system configs) are NOT capsule-isolated. When in doubt: Isolate it.

Section: API Route Patterns

API Route Parameter Patterns

Path Parameters

Explicitly declare which parameters come from URL path:
#[api_endpoint(
    path = "/tenants/{tenant_id}/resources/{resource_id}",
    path_params(tenant_id, resource_id),
    // ...
)]

Query Parameters

Explicitly declare query parameters:
#[api_endpoint(
    path = "/resources",
    query_params(filters, pagination),
    // ...
)]
Rule: NEVER rely on automatic inference. Be explicit.
Hypothesis: If Builder and Verifier read these patterns before starting, they’ll make fewer mistakes.

What We Built This Week

Week 3 wasn’t about building new features - it was about hardening the foundation.

Macro Enhancements

DynamoDB Macro:
  • Compile-time capsule isolation enforcement
  • Nested struct field validation
  • Clear error messages
API Macro:
  • Removed automatic parameter inference
  • Explicit path/query parameter declarations
  • OpenAPI spec correctness guarantees

Capsule Isolation Fixes

Entities updated:
  • RoleAssignment
  • SecurityGroup
  • Entitlement
  • Session (auth)
  • Federal compliance data (CRM)
Pattern: All user-scoped data now has tenant + capsule isolation

Documentation

CLAUDE.md improvements:
  • 12 architectural patterns documented
  • DynamoDB entity guidelines
  • API route conventions
  • Repository patterns
  • Event naming conventions
Impact: Agents reference these during planning

Infrastructure Principles

New ADR added:
  • ADR-0010: Capsule Isolation Enforcement
  • Infrastructure Principles §2.1: API Capsule Context Resolution
Governance: Architecture decisions now codified
Metrics:
  • 72 commits
  • 6 entities fixed for capsule isolation
  • 23 API routes refactored
  • 2 macros enhanced
  • 5 documentation files improved
  • 0 new features (all refinement)

What We Learned: Event Sourcing as a Verification Tool

Learning 1: Events Reveal Inconsistencies Humans Miss

The traditional approach:
  • Write code
  • Write tests
  • Code review
  • Merge
The problem: Tests verify “does this work?” not “is this consistent with everything else?” Event sourcing changes this: Every operation emits an event. Events have schemas. Schema inconsistencies reveal architectural violations. Example from Week 3:
// AccountCreated event schema:
{
  "tenant_id": "...",
  "capsule_id": "...",  // Present
  "account_id": "...",
  "timestamp": "..."
}

// RoleAssignmentCreated event schema:
{
  "tenant_id": "...",
  "capsule_id": null,  // Missing!
  "assignment_id": "...",
  "timestamp": "..."
}

// Inconsistency detected: RoleAssignment operations are not capsule-isolated
AI advantage: Evaluator can analyze thousands of events, spot patterns, and flag deviations in minutes. Human disadvantage: We review code files, not event logs. Cross-service consistency is hard to verify manually.

Learning 2: Macros Are Force Multipliers - For Good and Bad

Week 2 (before enforcement): DynamoDB macro made it easy to create entities. Also made it easy to forget capsule isolation. Impact:
  • 6 entities created without proper isolation
  • Each one a potential data leak
  • All passed code review (including AI verification)
Week 3 (after enforcement): Enhanced macro makes it impossible to forget.
// This code won't compile anymore
#[derive(DynamoDbEntity)]
#[capsule_isolated]
pub struct MyEntity {
    pub tenant_id: TenantId,
    // ERROR: Missing required field 'capsule_id' for capsule-isolated entity
    pub id: EntityId,
}
Principle: Macros should encode invariants, not just reduce boilerplate. Good macro:
  • Reduces repetition
  • Enforces correctness
  • Fails at compile time (not runtime)
  • Generates helpful error messages
Bad macro:
  • Just saves typing
  • Allows incorrect patterns
  • “Helpful” inference that’s sometimes wrong

Learning 3: Documentation Changes AI Behavior

Before CLAUDE.md patterns (Week 2): Builder would ask: “Should this entity be capsule-isolated?” Verifier would check: “Does the implementation match the plan?” Problem: Neither had a reference for “what’s the standard pattern?”
After CLAUDE.md patterns (Week 3): Builder reads patterns first and applies them by default. Verifier checks: “Does this match the documented standard pattern?”
Concrete example: Before (Week 2 - EntitlementEntity):
// Builder's first attempt (no guidance)
#[derive(DynamoDbEntity)]
#[pk = "TENANT#{tenant_id}#ENTITLEMENT#{id}"]  // Missing CAPSULE#
pub struct EntitlementEntity {
    pub tenant_id: TenantId,
    // capsule_id missing
    pub id: EntitlementId,
}
Verifier didn’t catch this because the plan didn’t specify capsule isolation explicitly. After (Week 3 - SecurityGroupEntity):
// Builder's first attempt (with CLAUDE.md guidance)
#[derive(DynamoDbEntity)]
#[capsule_isolated]  // Applied pattern automatically
#[pk = "TENANT#{tenant_id}#CAPSULE#{capsule_id}#GROUP#{id}"]
pub struct SecurityGroupEntity {
    pub tenant_id: TenantId,
    pub capsule_id: CapsuleId,
    pub id: GroupId,
}
Difference: Builder referenced CLAUDE.md patterns during planning. Got it right the first time. Insight: AI doesn’t just need instructions - it needs reference patterns to apply consistently.

Learning 4: Verification Is Not Just “Does It Work?”

Builder’s verification focus:
  • Does it compile? ✅
  • Do tests pass? ✅
  • Does it meet requirements? ✅
Verifier’s additional checks:
  • Is this consistent with other entities?
  • Does this follow documented patterns?
  • What are the edge cases?
  • What could go wrong in production?
Example from API macro verification: Builder implemented explicit path/query parameter declarations. Tests passed. All routes worked. Verifier asked: “What happens if a developer accidentally declares the same parameter as both path and query?”
// Potential bug
#[api_endpoint(
    path = "/accounts/{account_id}",
    path_params(account_id),
    query_params(account_id),  // ← Same parameter in both!
)]
Builder’s response: “The macro would generate invalid OpenAPI spec. Runtime error.” Verifier’s recommendation: “Add compile-time check to prevent duplicate parameter declarations.” Builder added the check. This bug never existed in real code - Verifier prevented it hypothetically. Lesson: Good verification asks “what COULD go wrong?” not just “what IS wrong?”
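For reference, the duplicate-declaration guard Verifier asked for is only a few lines inside the macro. A minimal sketch, assuming the attribute parser has already produced lists of syn::Idents (names are illustrative):
use std::collections::HashSet;
use syn::Ident;

// Reject any parameter declared in both path_params(...) and query_params(...).
// Returning syn::Error surfaces the message as a normal compile error,
// pointed at the offending identifier.
fn check_no_duplicate_params(
    path_params: &[Ident],
    query_params: &[Ident],
) -> Result<(), syn::Error> {
    let path: HashSet<String> = path_params.iter().map(|p| p.to_string()).collect();
    for param in query_params {
        if path.contains(&param.to_string()) {
            return Err(syn::Error::new(
                param.span(),
                format!("parameter `{param}` is declared as both a path and a query parameter"),
            ));
        }
    }
    Ok(())
}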

Principles We Established This Week

What we learned: Capsule isolation bugs made it to main because enforcement was runtime-only (tests). New rule: Architectural invariants MUST be enforced at compile time via macros or the type system. Example:
// ❌ Runtime check (can be forgotten)
fn create_entity(entity: Entity) -> Result<()> {
    if entity.capsule_id.is_none() {
        return Err("Missing capsule_id");
    }
    // ...
}

// ✅ Compile-time enforcement (can't be forgotten)
#[derive(DynamoDbEntity)]
#[capsule_isolated]  // Compiler enforces capsule_id field exists
pub struct Entity {
    pub capsule_id: CapsuleId,  // Required by macro
    // ...
}
What we learned: Event logs revealed bugs that state-based testing missed. New rule: When debugging data consistency issues, always start with event logs, not database state. Practice:
  • Every operation emits events
  • Event schemas are validated
  • Schema inconsistencies = architectural violations
  • AI agents analyze event patterns during verification
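One way to act on “event logs first” is to replay an entity’s events into the state they imply and diff that against what the table actually holds. A minimal sketch, assuming events carry flat JSON field updates (shapes and names are illustrative):
use serde_json::Value;

// Fold an entity's events into the state they imply.
fn replay_expected_state(events: &[Value]) -> Value {
    let mut state = Value::Object(Default::default());
    for event in events {
        if let (Value::Object(acc), Value::Object(delta)) = (&mut state, event) {
            for (key, value) in delta {
                acc.insert(key.clone(), value.clone());
            }
        }
    }
    state
}

// Fields where the replayed state and the stored item disagree.
fn diff_fields(expected: &Value, actual: &Value) -> Vec<String> {
    let (Some(expected), Some(actual)) = (expected.as_object(), actual.as_object()) else {
        return Vec::new();
    };
    expected
        .iter()
        .filter(|(key, value)| actual.get(key.as_str()) != Some(*value))
        .map(|(key, _)| key.clone())
        .collect()
}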
What we learned: CLAUDE.md patterns changed how Builder and Verifier approached tasks. New rule: Document architectural patterns proactively. AI will reference them. Format:
  • Pattern name
  • When to use it
  • Code example
  • Anti-patterns (what NOT to do)
  • Rationale (why this pattern?)
What we learned: Macros that silently accept incorrect patterns are worse than no macros. New rule: Macros should validate ALL invariants and produce clear compile errors. Checklist for new macros:
  • Validates all required fields exist
  • Validates field types are correct
  • Validates attribute patterns (e.g., PK format)
  • Produces helpful error messages
  • Has tests for invalid inputs (should not compile)
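For that last item, the usual Rust tool is the trybuild crate: it compiles small fixture files and asserts that the invalid ones fail with the expected error output. A minimal sketch (fixture paths are illustrative):
// tests/macro_ui.rs
#[test]
fn capsule_isolation_is_enforced_at_compile_time() {
    let cases = trybuild::TestCases::new();
    // Entities that violate the invariants must fail to compile;
    // error output is matched against accompanying *.stderr files.
    cases.compile_fail("tests/ui/missing_capsule_id.rs");
    cases.compile_fail("tests/ui/pk_without_capsule_segment.rs");
    // Valid entities must keep compiling.
    cases.pass("tests/ui/valid_entity.rs");
}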

The Mistake I Made (And Fixed)

After fixing the API macro, I merged it and started using it immediately for new routes. Then CI failed: 8 existing routes no longer compiled against the updated macro. What happened? I updated the macro to require explicit path_params/query_params attributes, but I didn’t update existing routes to use the new syntax. The failed routes:
// Old syntax (no longer valid)
#[api_endpoint(
    path = "/accounts/{account_id}",
    method = GET,
)]
async fn get_account(account_id: AccountId) -> Result<AccountResponse> {
    // ...
}

// New syntax (required)
#[api_endpoint(
    path = "/accounts/{account_id}",
    method = GET,
    path_params(account_id),  // ← Now required
)]
async fn get_account(account_id: AccountId) -> Result<AccountResponse> {
    // ...
}
Why did this happen? I tested the macro with new routes (worked fine). I verified the macro logic (correct). But I didn’t check backwards compatibility with existing routes. Classic mistake: Changed the API without migration plan. The fix (2 hours):
  1. Found all existing routes using old syntax (23 routes across 3 services)
  2. Updated each to new explicit syntax
  3. Added deprecation warnings to old syntax (instead of breaking immediately)
  4. Created migration guide in CLAUDE.md
Lesson: API changes (even internal macros) need backwards compatibility checks. Should’ve been part of verification.

Metrics: Week 3 by the Numbers

Planned work: Fix capsule isolation gaps
Actual work:
  • 6 entities fixed for isolation ✅
  • DynamoDB macro enhanced ✅
  • API macro refactored ✅
  • 23 routes updated ✅
  • CLAUDE.md patterns added ✅
  • Infrastructure ADR written ✅
Commits: 72 (3x more than Week 2, but smaller and more focused)
Time estimate (manual): 2 weeks
Actual time (AI workflow): 4.5 days
Speedup: 3.1x

The Event That Changed Everything

Remember the event log warning I showed at the start?
[WARN] Capsule isolation violation detected
Entity: RoleAssignment
That event wasn’t from production. It was from our test environment’s event analysis system. Here’s what I built (quietly, over the weekend before Week 3): Event Schema Validator (Lambda function):
// Runs on every event published to EventBridge
async fn validate_event_schema(event: EventBridgeEvent) -> Result<()> {
    // 1. Extract event type
    let event_type = event.detail_type;

    // 2. Load expected schema for this event type
    let expected_schema = load_schema(&event_type)?;

    // 3. Compare actual event fields to expected schema
    let actual_fields = extract_fields(&event.detail);

    // 4. Check for required fields
    for required_field in expected_schema.required_fields {
        if !actual_fields.contains(&required_field) {
            log_violation(MissingRequiredField {
                event_type,
                missing_field: required_field,
                severity: Critical,
            });
        }
    }

    // 5. Check for consistency across related events
    if let Some(related_events) = get_related_events(&event_type) {
        for related in related_events {
            check_field_consistency(&event, &related)?;
        }
    }

    Ok(())
}
This Lambda caught 6 isolation violations in the test environment BEFORE any code reached production. Why this matters: Traditional testing asks: “Does this code work?” Event validation asks: “Is this code consistent with our architectural patterns?” Different questions. Different bugs caught.

What’s Next: Week 4 Preview

Week 3 was about hardening patterns. Week 4 will be about scaling patterns. The challenge: Our CRM domain now has:
  • 15 entity types
  • 37 API routes
  • 8 background workers
  • 3 event processing pipelines
The questions:
  • Can our macro patterns scale to 100+ entities?
  • Can event validation handle 10,000 events/second?
  • How do we verify cross-service consistency at scale?
Week 4 focus:
  • Multi-service integration testing
  • Event replay for debugging
  • AI-assisted performance optimization
  • Distributed tracing patterns
The big experiment: Can AI help us find performance bottlenecks BEFORE load testing? We’ll use the event logs to simulate production traffic patterns and see what breaks.

Week 4: AI-Driven Performance Testing

Next week: Using event patterns to predict production bottlenecks

Code Examples (Sanitized)

Here’s the final capsule-isolated entity pattern we established:
// Generic pattern for all user-scoped entities
#[derive(DynamoDbEntity, Debug, Clone, Serialize, Deserialize)]
#[capsule_isolated]
#[table_name = "platform_data"]
#[pk = "TENANT#{tenant_id}#CAPSULE#{capsule_id}#ENTITY#{entity_type}#{id}"]
#[sk = "METADATA#v{version}"]
pub struct GenericEntity {
    // Required isolation fields (enforced by macro)
    pub tenant_id: TenantId,
    pub capsule_id: CapsuleId,

    // Entity identification
    pub id: EntityId,
    pub entity_type: EntityType,

    // Event sourcing metadata
    pub version: u64,
    pub last_event_id: Option<EventId>,

    // Audit fields
    pub created_at: DateTime<Utc>,
    pub created_by: UserId,
    pub updated_at: DateTime<Utc>,
    pub updated_by: UserId,

    // Entity-specific data
    pub data: serde_json::Value,
}

// Macro enforces:
// 1. tenant_id and capsule_id fields must exist
// 2. PK must include both TENANT# and CAPSULE# segments
// 3. Compile error if either is missing
// 4. Generated code validates isolation at query time
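For the query-time side of that last point, the generated key builders look roughly like this - a sketch of the shape, not the actual macro output, and it assumes the ID newtypes implement Display:
// Illustrative shape of generated code (not the real macro output).
impl GenericEntity {
    pub fn partition_key(&self) -> String {
        format!(
            "TENANT#{}#CAPSULE#{}#ENTITY#{}#{}",
            self.tenant_id, self.capsule_id, self.entity_type, self.id
        )
    }

    pub fn sort_key(&self) -> String {
        format!("METADATA#v{}", self.version)
    }
}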
What this pattern gave us:
  • Zero capsule isolation bugs after Week 3
  • Consistent PK/SK patterns across all entities
  • Compile-time guarantees for security boundaries
  • Clear audit trail via event sourcing metadata

Discussion: How Do You Validate Architectural Patterns?

Pattern Validation with AI

Do you use AI to validate architectural consistency? How do you catch deviations? I’d love to hear:
  • Do you use event sourcing or similar patterns?
  • How do you validate cross-service consistency?
  • Any compile-time enforcement techniques?
Share your experience:

Disclaimer: This content documents my personal AI workflow experiments. All examples are from personal projects and have been sanitized to remove proprietary information. Code snippets represent generic patterns for educational purposes. This does not represent my employer’s technologies or approaches.