
The 2 AM Wake-Up Call

The alert came through at 2:17 AM: “API response times exceeding 5 seconds.” I opened the dashboard. What I saw made my stomach drop:
  • Session queries: 500ms average (should be less than 50ms)
  • Contact lookups: Full table scans on a 100k record table
  • Webhook execution: 200ms per call (half of that just establishing connections)
  • Memory usage: Spiking to 50MB for simple queries
We had the classic multi-tenant SaaS performance problem: code that works fine in development falls apart at scale. The platform was built on DynamoDB with event sourcing. Good architectural choices, but we'd made critical mistakes in how we queried data, managed connections, and structured our code. Over the next month, we systematically addressed five major bottlenecks. The results were dramatic: 10x to 1000x improvements across the board. Here's what we learned about performance optimization at scale.

Pattern #1: Query Scoping - Filter at the Database, Not in Memory

The Problem

Our session query looked innocent enough:
// Load ALL sessions for the tenant, then filter in memory
pub async fn get_sessions_for_capsule(
    &self,
    tenant_id: &str,
    capsule_id: &str,
) -> Result<Vec<Session>> {
    // Query DynamoDB for ALL tenant sessions
    let all_sessions = self.repository
        .query_by_tenant(tenant_id)
        .await?;

    // Filter in memory to find capsule sessions
    let filtered: Vec<Session> = all_sessions
        .into_iter()
        .filter(|s| s.capsule_id == capsule_id)
        .collect();

    Ok(filtered)
}
This worked fine in development with 10 sessions per tenant. In production with 1,000 sessions per tenant, it was a disaster:
  • Latency: 500ms (loading 1,000 records, filtering to 50)
  • Memory: 50MB allocated for the full dataset
  • Cost: Reading 1,000 DynamoDB items when we needed 50

The Solution

Push the filtering down to the database level:
// Filter at the database level using DynamoDB expressions
pub async fn get_sessions_for_capsule(
    &self,
    tenant_id: &str,
    capsule_id: &str,
) -> Result<Vec<Session>> {
    // Query with filter expression - DynamoDB does the filtering
    let sessions = self.repository
        .query_by_tenant(tenant_id)
        .filter_expression("capsule_id = :capsule_id")
        .expression_values(hashmap! {  // hashmap! is from the maplit crate
            ":capsule_id" => AttributeValue::S(capsule_id.to_string())
        })
        .await?;

    Ok(sessions)
}

The Results

  • Latency: 500ms → 50ms (10x improvement)
  • Memory: 50MB → 5MB (90% reduction)
  • DynamoDB reads: 1,000 items → 50 items (20x fewer reads)
  • Monthly cost savings: $240 in reduced read capacity

When to Apply

Use database-level filtering when:
  1. High cardinality filters - Selecting small subset from large dataset
  2. Repeated queries - Same filter pattern used frequently
  3. Large result sets - Base query returns more than 100 records
Warning: DynamoDB filter expressions reduce data transfer but still consume read capacity for scanned items. For true O(1) lookups, you need a GSI (see Pattern #2).
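To make the warning concrete, here is a rough cost model (a hypothetical helper with simplified billing: eventually consistent reads cost 0.5 RCU per 4KB scanned). Note that the returned item count never appears in the formula: 1,000 scanned items cost the same whether the filter keeps 50 of them or all 1,000.

```rust
/// Rough model of DynamoDB read cost for a filtered query: capacity is
/// billed on the data *scanned* before the filter runs, not on what is
/// returned. Simplified (real billing rounds per 4KB chunk).
pub fn read_units_consumed(items_scanned: u32, avg_item_kb: f64) -> f64 {
    // Eventually consistent reads: 0.5 RCU per 4KB of data read
    (items_scanned as f64 * avg_item_kb / 4.0) * 0.5
}
```

Scanning 1,000 one-kilobyte items costs 125 RCUs in this model regardless of how many survive the filter, which is why Pattern #2 reaches for a GSI.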

Pattern #2: Strategic Indexing - Global Secondary Indices for Fast Lookups

The Problem

Contact queries were our second major bottleneck:
// Find contact by account_id - requires full table scan
pub async fn get_contact_by_account(
    &self,
    account_id: &str,
) -> Result<Option<Contact>> {
    let all_contacts = self.repository
        .scan()  // ⚠️ FULL TABLE SCAN
        .await?;

    all_contacts
        .into_iter()
        .find(|c| c.account_id == account_id)
        .ok_or(Error::NotFound)
}
With 100,000 contacts in the table, every lookup was scanning the entire table. Performance:
  • Best case: 800ms (table scan of 100k items)
  • Worst case: 2,000ms (when contact is at the end)
  • Complexity: O(n) - gets slower as data grows

The Solution

Add Global Secondary Indices (GSI) for common query patterns:
#[derive(DynamoDbEntity)]
#[dynamodb(table = "contacts")]
pub struct ContactEntity {
    #[dynamodb(partition_key)]
    pub id: String,

    #[dynamodb(sort_key)]
    pub tenant_id: String,

    // GSI5: PrimaryContactIndex for account lookups
    #[dynamodb(gsi5_partition_key)]
    pub account_id: String,

    // GSI6: ExecutiveIndex for role-based queries
    #[dynamodb(gsi6_partition_key)]
    pub role: String,

    #[dynamodb(gsi6_sort_key)]
    pub department: String,
}

// Generated query method (from derive macro)
pub async fn query_by_account(
    &self,
    account_id: &str,
) -> Result<Vec<Contact>> {
    self.repository
        .query_gsi5(account_id)  // Direct GSI lookup
        .await
}

The Results

  • Latency: 800ms → 15ms (53x improvement)
  • Complexity: O(n) → O(1) (constant time lookups)
  • Cost: Full table scan → Single partition read

Index Design Strategy

We created GSIs for our top 3 query patterns:

Query Pattern   | Index                       | Use Case
----------------|-----------------------------|------------------------
Account lookups | GSI5 (account_id)           | Find contact by account
Role queries    | GSI6 (role + department)    | Executive dashboards
Status filters  | GSI7 (status + created_at)  | Active contacts list

When to Apply

Create a GSI when:
  1. Non-key queries - Querying on attributes other than primary key
  2. High query frequency - Pattern used more than 100 times per day
  3. Large tables - Table has more than 10,000 items
Cost consideration: Each GSI consumes additional storage and write capacity. We limit to 3-4 GSIs per table to balance performance and cost.
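The write-side cost can be sketched with a back-of-the-envelope helper (illustrative only; real DynamoDB billing also depends on item size and the GSI's projection type): every GSI that projects an item adds roughly one extra write per base-table write.

```rust
/// Rough write amplification: base table write plus one write per GSI.
/// Illustrative; actual cost varies with item size and projection type.
pub fn monthly_write_units(writes_per_day: f64, gsi_count: u32) -> f64 {
    writes_per_day * (1.0 + gsi_count as f64) * 30.0
}
```

At 1,000 writes/day, three GSIs roughly quadruple write volume, which is the reasoning behind the 3-4 GSI ceiling above.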

Pattern #3: Smart Caching - In-Memory Cache with TTL

The Problem

Our webhook system needed to load hook configuration for every webhook execution:
// Every webhook call hits DynamoDB
pub async fn execute_webhook(
    &self,
    hook_id: &str,
    payload: &str,
) -> Result<()> {
    // Load hook config from DynamoDB - 100ms
    let hook = self.repository
        .get_hook(hook_id)
        .await?;

    // Execute webhook - 150ms
    self.http_client
        .post(&hook.url)
        .body(payload)
        .send()
        .await?;

    Ok(())
}
Problem: Hook configuration rarely changes (maybe once a week), but we were hitting DynamoDB on every execution. For a high-volume tenant:
  • 10,000 webhook calls per day
  • 10,000 DynamoDB reads per day
  • 100ms × 10,000 = 16 minutes of cumulative latency

The Solution

In-memory cache with 5-minute TTL:
use std::sync::Arc;
use tokio::sync::RwLock;
use std::time::{Duration, Instant};

pub struct CachedHookRepository {
    repository: Arc<DynamoDbRepository>,
    cache: Arc<RwLock<HashMap<String, CachedHook>>>,
    ttl: Duration,
}

struct CachedHook {
    hook: Hook,
    expires_at: Instant,
}

impl CachedHookRepository {
    pub fn new(repository: DynamoDbRepository) -> Self {
        Self {
            repository: Arc::new(repository),
            cache: Arc::new(RwLock::new(HashMap::new())),
            ttl: Duration::from_secs(300), // 5 minutes
        }
    }

    pub async fn get_hook(&self, hook_id: &str) -> Result<Hook> {
        // Check cache first
        {
            let cache = self.cache.read().await;
            if let Some(cached) = cache.get(hook_id) {
                if cached.expires_at > Instant::now() {
                    return Ok(cached.hook.clone()); // Cache hit: 0.1ms
                }
            }
        }

        // Cache miss or expired - fetch from DynamoDB
        let hook = self.repository.get_hook(hook_id).await?;

        // Update cache
        {
            let mut cache = self.cache.write().await;
            cache.insert(hook_id.to_string(), CachedHook {
                hook: hook.clone(),
                expires_at: Instant::now() + self.ttl,
            });
        }

        Ok(hook)
    }
}

The Results

  • Cache hit latency: 100ms → 0.1ms (1000x improvement)
  • DynamoDB reads: 10,000/day → 288/day (97% reduction)
  • Monthly savings: $180 in DynamoDB read costs
Cache hit rate: 96% (only refresh every 5 minutes)

Cache Design Decisions

Why 5-minute TTL?
  • Hook configs change infrequently (weekly at most)
  • 5 minutes is acceptable staleness for this use case
  • Shorter TTL (1 min) → 83% cache hit rate
  • Longer TTL (15 min) → 98% cache hit rate but unacceptable staleness
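These hit rates follow from a simple model: under a steady request stream, a TTL cache misses roughly once per TTL window and hits the rest of the time. A sketch (hypothetical helper; ignores traffic burstiness and key distribution):

```rust
/// Rough hit-rate model for a TTL cache under a steady request stream:
/// about one miss (refresh) per TTL window, everything else is a hit.
/// Estimation only; real hit rates also depend on traffic shape.
pub fn estimated_hit_rate(requests_per_day: f64, ttl_secs: f64) -> f64 {
    let misses_per_day = (86_400.0 / ttl_secs).min(requests_per_day);
    1.0 - misses_per_day / requests_per_day
}
```

For 10,000 calls/day and a 300-second TTL this predicts about 288 refreshes/day and a ~97% hit rate, close to the measured 96%.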
Why in-memory instead of Redis?
  • Single-instance application (not distributed)
  • Cache size less than 10MB (not memory-constrained)
  • No need for cross-instance consistency
  • Simpler architecture, zero infrastructure cost

When to Apply

Use in-memory caching when:
  1. Read-heavy workload - more than 10:1 read-to-write ratio
  2. Acceptable staleness - Data doesn’t need real-time consistency
  3. Small dataset - Cached data less than 100MB
  4. Single instance - Or use Redis for distributed cache
When NOT to cache:
  • User session data (security risk)
  • Financial transactions (consistency critical)
  • Real-time analytics (staleness unacceptable)

Pattern #4: Connection Pooling - Reuse HTTP Connections

The Problem

Our webhook execution was establishing a new HTTP connection for every call:
// Creating new HTTP client per request
pub async fn execute_webhook(
    &self,
    url: &str,
    payload: &str,
) -> Result<()> {
    // New connection + TLS handshake: ~50ms
    let client = reqwest::Client::new();

    let response = client
        .post(url)
        .body(payload)
        .send()  // Actual request: ~100ms
        .await?;

    Ok(())
}
Performance breakdown per call:
  • TLS handshake: 30-50ms
  • DNS lookup: 10-20ms
  • Actual HTTP request: 100ms
  • Total: 150ms (1/3 of time spent on connection setup)
With 10,000 webhook calls per day, that's roughly 8 minutes per day wasted on connection setup.
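The arithmetic behind that figure, as a throwaway helper (hypothetical name):

```rust
/// Daily time spent on connection setup: per-call overhead x call count.
/// With 50ms of TLS+DNS overhead and 10,000 calls/day, this comes out
/// to about 8.3 minutes per day.
pub fn daily_overhead_minutes(overhead_ms: f64, calls_per_day: f64) -> f64 {
    overhead_ms * calls_per_day / 1000.0 / 60.0
}
```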

The Solution

HTTP connection pool with keepalive:
use reqwest::Client;
use std::time::Duration;

pub struct WebhookExecutor {
    // Shared HTTP client with connection pool
    client: Client,
}

impl WebhookExecutor {
    pub fn new() -> Self {
        let client = Client::builder()
            .pool_max_idle_per_host(32)  // Keep 32 idle connections per host
            .pool_idle_timeout(Duration::from_secs(90))
            .timeout(Duration::from_secs(30))
            .build()
            .expect("Failed to build HTTP client");

        Self { client }
    }

    pub async fn execute_webhook(
        &self,
        url: &str,
        payload: &str,
    ) -> Result<()> {
        // Reuses connection from pool - no TLS handshake
        let response = self.client
            .post(url)
            .body(payload)
            .send()  // Only ~100ms (50ms saved)
            .await?;

        Ok(())
    }
}

The Results

  • Latency: 150ms → 100ms (33% improvement)
  • Connection overhead: 50ms → less than 1ms (on pool hit)
  • Daily connection-setup time: roughly 8 minutes → under a minute

Pool Configuration

Why 32 idle connections per host?
  • Typical webhook workload: 5-10 concurrent calls
  • 32 gives headroom for bursts
  • Idle timeout (90s) prevents stale connections
Pool hit rate: 92% (only 8% of calls need new connection)

When to Apply

Use connection pooling for:
  1. HTTP clients - Any service making frequent HTTP requests
  2. Database connections - Connection pools are critical
  3. External API calls - Especially with TLS overhead
Most HTTP libraries (like reqwest in Rust) have built-in pooling. Just enable it and configure appropriately.

Pattern #5: Code Generation - Eliminate Boilerplate with Macros

The Problem

This isn’t a runtime performance issue, but a developer velocity bottleneck that led to performance bugs. Every DynamoDB entity required 100-200 lines of boilerplate:
// Manual implementation - 150 lines per entity
pub struct ContactEntity {
    pub id: String,
    pub tenant_id: String,
    pub account_id: String,
    pub role: String,
}

impl ContactEntity {
    // Manual attribute conversion - error-prone
    pub fn from_item(item: HashMap<String, AttributeValue>) -> Result<Self> {
        Ok(Self {
            id: item.get("id")
                .and_then(|v| v.as_s().ok())
                .ok_or(Error::MissingField("id"))?
                .clone(),
            tenant_id: item.get("tenant_id")
                .and_then(|v| v.as_s().ok())
                .ok_or(Error::MissingField("tenant_id"))?
                .clone(),
            // ... 20 more fields
        })
    }

    pub fn to_item(&self) -> HashMap<String, AttributeValue> {
        let mut item = HashMap::new();
        item.insert("id".to_string(), AttributeValue::S(self.id.clone()));
        item.insert("tenant_id".to_string(), AttributeValue::S(self.tenant_id.clone()));
        // ... 20 more fields
        item
    }

    // Manual GSI query methods
    pub async fn query_gsi5(&self, account_id: &str) -> Result<Vec<Self>> {
        // 30 lines of query logic
    }
}
Problems:
  1. 150 lines × 7 entities = 1,050 lines of boilerplate
  2. Copy-paste errors (wrong field mappings)
  3. Missing GSI query methods (led to full table scans)
  4. No type safety (typos in field names caught at runtime)

The Solution

Derive macro for automatic code generation:
// Macro-driven implementation - 15 lines total
#[derive(DynamoDbEntity)]
#[dynamodb(table = "contacts")]
pub struct ContactEntity {
    #[dynamodb(partition_key)]
    pub id: String,

    #[dynamodb(sort_key)]
    pub tenant_id: String,

    #[dynamodb(gsi5_partition_key)]
    pub account_id: String,

    #[dynamodb(gsi6_partition_key)]
    pub role: String,
}

// Macro generates:
// - from_item() / to_item() methods
// - query_gsi5() / query_gsi6() methods
// - Type-safe field accessors
// - Compile-time validation

The Results

Code reduction:
  • Before: 150 lines per entity × 7 entities = 1,050 lines
  • After: 15 lines per entity × 7 entities = 105 lines
  • Reduction: 90% (945 lines eliminated)
Developer velocity:
  • Adding new entity: 30 minutes → 3 minutes (10x faster)
  • Zero copy-paste errors (compile-time validation)
  • All GSI queries auto-generated (no more accidental table scans)
Runtime performance impact:
  • Generated code is identical to hand-written (zero overhead)
  • Bonus: Caught 3 inefficient queries at compile time (missing GSI annotations)

When to Apply

Use code generation (macros, code gen tools) when:
  1. Repetitive patterns - Same structure across multiple entities
  2. Error-prone boilerplate - Manual code leads to bugs
  3. Compile-time validation - Type safety prevents runtime errors
Languages with macro support:
  • Rust: derive macros
  • Java: Annotation processors
  • Python: Decorators + code generation
  • TypeScript: Decorators + transformers

Lessons Learned

What Worked: The 80/20 Rule

Small changes, massive impact:
  1. Session query scoping (20 lines changed) → 10x improvement
  2. Hook caching (50 lines added) → 1000x improvement on cache hits
  3. Connection pooling (5 lines changed) → 33% latency reduction
Total code changes: less than 500 lines. Total performance improvement: 10x to 1000x across the board.

The lesson: Look for the highest-leverage changes first. Don't optimize everything—optimize the bottlenecks.

What Surprised Us

Caching isn't always the answer: We tried caching session queries (Pattern #1). Result: minimal improvement. Why? Sessions change frequently (every user action), so the cache hit rate was only 12% and we were paying cache overhead for little benefit. Better solution: fix the root cause (query scoping) instead of papering over it with caching.

The right tool for the job:
  • Frequently changing data → Query optimization
  • Rarely changing data → Caching
  • Repeated patterns → Code generation

What We’d Do Differently

1. Measure first, optimize second

We wasted time optimizing the wrong queries. Our initial guess was "contact queries are the problem" (they were visible in logs). The real culprit was session queries (10x more frequent but hidden in background jobs). Lesson: Use profiling and monitoring to identify real bottlenecks, not gut feel.

2. Document the "why" in ADRs

Six months later, a new developer asked: "Why do we have a 5-minute cache TTL for hooks?" No one remembered. Was it arbitrary? Load testing? A customer requirement? Lesson: Write Architecture Decision Records (ADRs) explaining performance choices and trade-offs.

Implementation Guide

Step 1: Identify Bottlenecks

Don’t guess. Measure. Tools we used:
  1. Application metrics - Track latency by operation
  2. Database slow query logs - Identify expensive queries
  3. Profiling - Find CPU/memory hotspots
  4. Distributed tracing - Track request flow across services
Key metrics to track:
  • P50, P95, P99 latency (not just average)
  • Database query counts and latency
  • Memory allocation per request
  • Cache hit rates
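For the latency percentiles, a minimal nearest-rank implementation is enough for offline analysis (a sketch; production telemetry usually uses HDR histograms or t-digest instead):

```rust
/// Nearest-rank percentile over raw latency samples (sorts in place).
/// Minimal sketch; fine for offline analysis of modest sample counts.
pub fn percentile(samples: &mut [f64], p: f64) -> f64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Nearest-rank: ceil(p/100 * n), clamped to at least the first sample
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.max(1) - 1]
}
```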

Step 2: Prioritize by Impact

Not all slow queries matter equally. We prioritized using:

Impact = Frequency × Latency × Business Value
Query              | Frequency  | Latency | Business Impact     | Priority
-------------------|------------|---------|---------------------|---------
Session by capsule | 10,000/day | 500ms   | High (API)          | 1
Contact by account | 5,000/day  | 800ms   | High (Dashboard)    | 2
Hook config load   | 10,000/day | 100ms   | Medium (Background) | 3
Admin reports      | 10/day     | 2,000ms | Low (Internal)      | 4
Fix high-frequency, high-latency queries first.
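The prioritization formula above can be sketched as a scoring function. The business weights here (High = 3, Medium = 2, Low = 1) are assumptions for illustration, not values from the original analysis:

```rust
/// Impact = frequency x latency x business weight. Higher = fix first.
/// Weights are illustrative: High = 3.0, Medium = 2.0, Low = 1.0.
pub fn impact_score(calls_per_day: f64, latency_ms: f64, weight: f64) -> f64 {
    calls_per_day * latency_ms * weight
}
```

Scored this way, session queries (10,000/day × 500ms × High) outrank contact lookups, which in turn dwarf admin reports (10/day × 2,000ms × Low), matching the table's priority order.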

Step 3: Optimize and Measure

Before optimizing:
  1. Write benchmark test
  2. Record baseline metrics
  3. Set target improvement (e.g., “reduce P95 to less than 100ms”)
After optimizing:
  1. Run benchmark again
  2. Verify improvement
  3. Deploy to staging
  4. Monitor production for regressions
Example benchmark:
// Nightly-only #[bench] (needs #![feature(test)]); criterion is the stable alternative
#[bench]
fn bench_session_query(b: &mut Bencher) {
    let repo = setup_test_repo();

    b.iter(|| {
        repo.get_sessions_for_capsule("tenant_1", "capsule_42")
    });
}

// Before: 485ms ± 25ms
// After:  48ms ± 3ms (10x improvement ✅)

Step 4: Document with ADRs

Create an Architecture Decision Record for significant optimizations:
# ADR-015: Session Query Scoping with DynamoDB Filter Expressions

## Context
Session queries were taking 500ms and consuming 50MB memory per request
by loading all tenant sessions and filtering in memory.

## Decision
Use DynamoDB filter expressions to push filtering to the database level.

## Consequences
**Positive:**
- 10x latency improvement (500ms → 50ms)
- 90% memory reduction (50MB → 5MB)
- 20x fewer DynamoDB reads

**Negative:**
- Filter expressions still consume read capacity for scanned items
- For true O(1) lookups, need GSI (see ADR-016)

**Trade-offs:**
- Chose filter expressions over GSI due to lower complexity
- GSI would add storage cost and write amplification
- Current solution adequate for less than 1000 sessions per tenant

## Metrics
- Benchmark: 485ms → 48ms
- Production P95: 520ms → 52ms
- Monthly cost savings: $240

## References
- Related: ADR-016 (Contact GSI indices)
- AWS docs: DynamoDB Filter Expressions

Results Summary

Here’s the complete impact across all optimizations:
Optimization       | Metric              | Before      | After    | Improvement
-------------------|---------------------|-------------|----------|--------------
Query Scoping      | Latency             | 500ms       | 50ms     | 10x
Query Scoping      | Memory              | 50MB        | 5MB      | 10x
Query Scoping      | DynamoDB reads      | 1,000 items | 50 items | 20x
Contact GSI        | Latency             | 800ms       | 15ms     | 53x
Contact GSI        | Complexity          | O(n)        | O(1)     | -
Hook Caching       | Cache hit latency   | 100ms       | 0.1ms    | 1000x
Hook Caching       | Daily DB reads      | 10,000      | 288      | 97% reduction
Connection Pooling | Latency             | 150ms       | 100ms    | 1.5x
Connection Pooling | Connection overhead | 50ms        | <1ms     | 50x
Code Generation    | Lines of code       | 1,050       | 105      | 90% reduction
Code Generation    | Dev time per entity | 30 min      | 3 min    | 10x
Cost Savings:
  • DynamoDB reads: $420/month saved
  • Developer time: 27 hours/month saved (new entities, maintenance)
Development Velocity:
  • Faster iteration (less boilerplate)
  • Fewer bugs (compile-time validation)
  • Better performance by default (generated GSI queries)

Actionable Takeaways

If you’re facing similar performance challenges:
  1. Measure before optimizing - Use profiling and monitoring to find real bottlenecks, not guesses
  2. Start with database optimization - Query scoping and indices often give 10x-100x improvements for minimal code changes
  3. Cache judiciously - Only cache data with high read-to-write ratios and acceptable staleness
  4. Reuse connections - Enable connection pooling for HTTP clients and database connections (often just configuration)
  5. Automate repetitive code - Macros and code generation reduce errors and make performance optimizations consistent
  6. Document performance decisions - Write ADRs explaining why you chose specific optimizations and their trade-offs
  7. Track the right metrics:
    • P95/P99 latency (not just averages)
    • Cache hit rates
    • Database query patterns
    • Cost per request
Pro tip: The fastest query is the one you don't make. Before adding caching or indices, ask: "Can we eliminate this query entirely?" In our case, we eliminated 60% of hook config queries by passing configuration in the event payload instead of looking it up.
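A sketch of that elimination with hypothetical types (the real event schema is not shown in this article): resolve the hook config once at publish time and carry it inside the event, so consumers skip the lookup entirely. The trade-off is larger payloads and a config snapshot that can go stale between publish and delivery.

```rust
/// Hypothetical event shape: carrying the hook config inline means the
/// consumer never performs a per-event DynamoDB lookup.
#[derive(Clone, Debug, PartialEq)]
pub struct HookConfig {
    pub url: String,
}

#[derive(Debug)]
pub struct WebhookEvent {
    pub payload: String,
    pub hook: HookConfig, // resolved once, at publish time
}

pub fn build_event(payload: &str, hook: &HookConfig) -> WebhookEvent {
    WebhookEvent {
        payload: payload.to_string(),
        hook: hook.clone(),
    }
}
```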


Discussion

Share Your Experience

What performance optimizations have you implemented? What patterns worked (or didn't)? Connect on LinkedIn or comment on the YouTube Short.

Disclaimer: This content represents my personal learning journey using AI for a personal project. It does not represent my employer's views, technologies, or approaches. All code examples are generic patterns or pseudocode for educational purposes. Performance numbers are from real implementations but have been sanitized and rounded for clarity.