
Event Flow E2E Testing (The Work Nobody Wants to Do)

The Context: After building a CRM domain layer with event sourcing, we had unit tests, integration tests, even some event flow tests. But we didn’t have comprehensive end-to-end verification that events actually flow correctly through the entire system.
Why this matters: In event-sourced systems, bugs in event flow are catastrophic:
  • Miss an event → audit trail broken
  • Wrong event order → state corruption
  • Cross-tenant event leak → compliance violation
The traditional problem: Writing E2E tests for event flows is tedious:
  1. Set up test infrastructure (EventBridge, SQS, DynamoDB Streams)
  2. Create test data for each entity type
  3. Trigger actions and wait for async event delivery
  4. Verify event payload, ordering, and side effects
  5. Clean up test resources
  6. Repeat for every entity and workflow combination
Estimated manual effort: 2-3 weeks for comprehensive coverage.
What I tried: Give the entire task to AI.

The Planning Session

Planning session for comprehensive E2E event flow testing.

Context:
- Event-sourced CRM with 7 entities (Account, Contact, Lead,
  Opportunity, Activity, Product, Address)
- Events flow: DynamoDB Streams → EventBridge → SQS → Consumers
- Need to verify event delivery, ordering, and payload correctness
- Must test cross-entity workflows (e.g., Lead conversion)

Requirements:
1. Test each entity's event flow independently
2. Test multi-entity workflows (create Account → add Contact → convert Lead)
3. Test failure scenarios (event delivery failure, consumer errors)
4. All tests must run against LocalStack (no AWS resources)

Design comprehensive test suite with:
- Test data fixtures
- Event verification utilities
- Async event waiting helpers
- Clear assertion patterns
The result: Comprehensive E2E event testing that would have taken 2-3 weeks manually was done in 1.5 days with AI. But more importantly: I actually have confidence in the event system now. Without AI, I would’ve written 3-4 “smoke tests” and hoped for the best.
Key Insight: AI makes thorough testing economically viable. The marginal cost of going from “some tests” to “comprehensive coverage” dropped from weeks to hours.

The Event Collector Implementation

Here’s the event collector utility we built for E2E testing:
use std::time::{Duration, Instant};

// `Event`, `Error`, and `Result` are crate-local types; minimal sketches
// of plausible definitions follow this block.

/// Async event collector with timeout and filtering
pub struct EventCollector {
    queue_url: String,
    sqs_client: aws_sdk_sqs::Client,
    timeout: Duration,
}

impl EventCollector {
    /// Collect events matching predicate within timeout
    pub async fn collect_events<F>(
        &self,
        predicate: F,
        expected_count: usize,
    ) -> Result<Vec<Event>>
    where
        F: Fn(&Event) -> bool,
    {
        let start = Instant::now();
        let mut collected = Vec::new();

        // Poll interval: starts short, backs off exponentially below
        let mut delay = Duration::from_millis(100);

        while start.elapsed() < self.timeout {
            // Poll SQS queue
            let messages = self.sqs_client
                .receive_message()
                .queue_url(&self.queue_url)
                .max_number_of_messages(10)
                .wait_time_seconds(1)
                .send()
                .await?
                .messages
                .unwrap_or_default();

            // Note: received messages are not deleted, so each test should
            // use its own dedicated queue.
            for msg in messages {
                // SQS message bodies are optional in the SDK; skip empty ones
                let Some(body) = msg.body.as_deref() else { continue };
                let event: Event = serde_json::from_str(body)?;

                if predicate(&event) {
                    collected.push(event);

                    if collected.len() >= expected_count {
                        return Ok(collected);
                    }
                }
            }

            // Exponential backoff between polls, capped at 5s
            tokio::time::sleep(delay).await;
            delay = (delay * 2).min(Duration::from_secs(5));
        }

        Err(Error::EventCollectionTimeout {
            expected: expected_count,
            received: collected.len(),
            elapsed: start.elapsed(),
        })
    }
}
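The collector references a few crate-local items that aren’t shown above: the Event envelope, the Error type, and the new constructor used in the tests below. Here is a minimal sketch of plausible definitions, assuming thiserror for the error type and LocalStack’s default endpoint; the tenant_id field, the endpoint URL, the region, and the dummy credentials are assumptions, not the project’s code:
/// Sketch of the event envelope, based on the fields the tests touch.
/// `tenant_id` is an assumption (the system is multi-tenant).
#[derive(Debug, serde::Deserialize)]
pub struct Event {
    pub entity_type: String,
    pub event_type: String,
    pub tenant_id: String,
    pub payload: serde_json::Value,
}

/// Sketch of the collector's error type, assuming `thiserror`.
#[derive(Debug, thiserror::Error)]
pub enum Error {
    #[error("timed out: expected {expected} events, got {received} after {elapsed:?}")]
    EventCollectionTimeout {
        expected: usize,
        received: usize,
        elapsed: Duration,
    },
    #[error("failed to deserialize event payload")]
    Deserialize(#[from] serde_json::Error),
    #[error("SQS receive_message failed")]
    Sqs(#[from]
        aws_sdk_sqs::error::SdkError<
            aws_sdk_sqs::operation::receive_message::ReceiveMessageError,
        >),
}

pub type Result<T> = std::result::Result<T, Error>;

impl EventCollector {
    /// Build a collector against LocalStack; endpoint, region, and the
    /// dummy credentials below are assumptions for local testing.
    pub fn new(queue_url: impl Into<String>, timeout: Duration) -> Self {
        let conf = aws_sdk_sqs::Config::builder()
            .behavior_version(aws_sdk_sqs::config::BehaviorVersion::latest())
            .endpoint_url("http://localhost:4566") // LocalStack default
            .region(aws_sdk_sqs::config::Region::new("us-east-1"))
            .credentials_provider(aws_sdk_sqs::config::Credentials::new(
                "test", "test", None, None, "localstack",
            ))
            .build();
        Self {
            queue_url: queue_url.into(),
            sqs_client: aws_sdk_sqs::Client::from_conf(conf),
            timeout,
        }
    }
}
A synchronous constructor keeps test setup terse; swapping in aws_config::load_from_env() would pick up the endpoint and credentials from the environment instead.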
Usage in tests:
#[tokio::test]
async fn test_lead_conversion_emits_events() -> Result<()> {
    let collector = EventCollector::new("test-queue-url", Duration::from_secs(10));

    // Trigger lead conversion (creation of the test lead elided)
    convert_lead_to_opportunity(lead_id).await?;

    // Collect events
    let events = collector
        .collect_events(
            |e| e.entity_type == "Lead" || e.entity_type == "Opportunity",
            2, // Expect: LeadConverted + OpportunityCreated
        )
        .await?;

    // Verify event ordering and payload
    assert_eq!(events[0].event_type, "LeadConverted");
    assert_eq!(events[1].event_type, "OpportunityCreated");
    assert_eq!(events[1].payload["lead_id"], lead_id);

    Ok(())
}
What made this work:
  • Exponential backoff handles EventBridge → SQS delays
  • Predicate filtering allows flexible event matching
  • Helpful error messages on timeout (shows expected vs received)
  • Reusable across all 21 test scenarios

Testing Principles Established

What we learned: Work that requires consistent application of testing patterns across many cases is perfectly suited for AI.
Example: 21 E2E test scenarios following the same pattern in 1.5 days.
Rule: If testing requires “do the same validation many times consistently,” delegate it entirely to AI.
Anti-pattern: Using AI for exploratory testing (AI needs clear success criteria).
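To make that rule concrete, the scenarios shared one skeleton: trigger an action, collect the expected events, assert ordering. A hypothetical sketch of the repeated shape (the macro and helpers such as create_test_account are illustrative, not the project’s code):
/// Hypothetical macro capturing the shared shape of the entity
/// event-flow tests.
macro_rules! event_flow_test {
    ($name:ident, $entity:literal, $trigger:expr, $expected:expr) => {
        #[tokio::test]
        async fn $name() -> Result<()> {
            let collector =
                EventCollector::new("test-queue-url", Duration::from_secs(10));

            // Trigger the entity-specific action under test.
            $trigger.await?;

            // Collect exactly as many events as this entity should emit.
            let expected: Vec<&str> = $expected;
            let events = collector
                .collect_events(|e| e.entity_type == $entity, expected.len())
                .await?;

            // Verify delivery order matches the expected sequence.
            for (event, expected_type) in events.iter().zip(&expected) {
                assert_eq!(event.event_type.as_str(), *expected_type);
            }
            Ok(())
        }
    };
}

// One of the 21 instantiations might look like:
event_flow_test!(
    test_account_event_flow,
    "Account",
    create_test_account(),
    vec!["AccountCreated"]
);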
What we learned: Comprehensive testing that was “too expensive” manually becomes “obviously worth it” with AI.
Example: E2E event flow testing
  • Manual estimate: 2-3 weeks (not worth it)
  • With AI: 1.5 days (absolutely worth it)
Result: Testing quality improved not because AI writes better tests, but because comprehensive testing became economically viable.
Rule: Re-evaluate “not worth the effort” decisions when AI changes the effort equation.
What we learned: Builder wrote tests that covered code paths but didn’t map to requirements.
Practice: Every test must reference a requirement in the test name.
Example:
#[test]
fn test_account_name_validation_per_prd_section_2_1() {
    // Test specific requirement from PRD
}

Metrics

Test coverage created:
  • Event flow tests: 21 scenarios
  • Integration tests: 47 scenarios
  • Unit tests: 156 tests
  • Overall coverage: 89%
Critical issues prevented by comprehensive testing:
  1. Race condition in event collection (async timing)
  2. Cross-tenant event leak in negative tests (see the sketch below)
  3. Incomplete cleanup between tests
Value: Prevented at least 2 weeks of production debugging and customer issues.
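The cross-tenant leak above was caught by a negative test along these lines — a hypothetical sketch reusing the collector, the sketched Error type, and the assumed tenant_id field (create_account_for_tenant and the queue URL are invented placeholders):
#[tokio::test]
async fn test_events_never_leak_across_tenants() -> Result<()> {
    // Listen on tenant A's queue while acting as tenant B.
    let collector =
        EventCollector::new("tenant-a-queue-url", Duration::from_secs(10));

    create_account_for_tenant("tenant-b").await?;

    // Waiting for a tenant-b event on tenant A's queue must time out
    // with zero events received.
    let leaked = collector
        .collect_events(|e| e.tenant_id == "tenant-b", 1)
        .await;

    assert!(matches!(
        leaked,
        Err(Error::EventCollectionTimeout { received: 0, .. })
    ));
    Ok(())
}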