This article is part of the “Building with AI” series documenting my journey using multi-agent AI workflows to build production systems. All examples are from personal projects and do not represent employer technologies.

The Promise

Everyone says single-table DynamoDB is elegant. One table to rule them all. But after 12 months building a production SaaS platform, here’s what actually happened.

I started with the vision: one DynamoDB table for the entire application. Clean. Simple. The way Amazon does it. Rick Houlihan’s re:Invent talks made it look effortless. Twelve months later, the platform has 19 domain-specific tables, 56+ entity types, and 23,820 lines of DynamoDB code. The thesis failed, but not in the way you’d expect.

The real lesson: single-table design works brilliantly WITHIN bounded contexts, but breaks down when applied to entire applications.

AI Focus: Could AI analyze production DynamoDB usage patterns and identify architectural violations?

System Example: An automated audit revealing 217 SDK bypasses, scan operations costing $200-500/month, and $20K in technical debt.

The Journey: One Table to 19

Start with a single table. Use composite keys. GSIs for access patterns. One table to rule them all.
PK: TENANT#abc123
SK: LEAD#123

GSI1PK: OWNER#user456
GSI1SK: CREATED#2025-01-15
Clean. Elegant. Rick Houlihan approved.

When It Worked (The Successes)

The single-table approach delivered measurable wins in specific areas. Here’s where it excelled:

1. Tenant Isolation: Zero Cross-Tenant Leaks

Achievement: 12 months, zero cross-tenant data leaks. Every query starts with partition key TENANT#<tenant_id>. DynamoDB’s query semantics guarantee isolation—there’s no way to accidentally fetch data from another tenant.
// platform-core/src/repository/tenant_repository.rs

// Every query is scoped to tenant
pub async fn find_lead(&self, tenant_id: &str, lead_id: &str) -> Result<Lead> {
    let result = self.client
        .query()
        .table_name(&self.table_name)
        .key_condition_expression("PK = :pk AND SK = :sk")
        .expression_attribute_values(":pk", AttributeValue::S(format!("TENANT#{}", tenant_id)))
        .expression_attribute_values(":sk", AttributeValue::S(format!("LEAD#{}", lead_id)))
        .send()
        .await?;

    // Deserialize the single matching item, or surface a not-found error
    parse_items(result.items)
        .into_iter()
        .next()
        .ok_or_else(|| Error::NotFound(lead_id.to_string()))
}

// Impossible to accidentally query across tenants
// No SQL injection, no WHERE clause bugs
Why it works: Partition keys enforce boundaries at the database level, not application logic.

2. Hierarchical Queries: Less Than 100ms Performance

Achievement: Tenant → Capsule → Account queries under 100ms. Single-table design shines for hierarchical relationships. The sort key enables range queries without joins.
// Fetch all accounts for a tenant's capsule
PK: TENANT#abc123
SK begins_with: CAPSULE#xyz#ACCOUNT#

// One query, no joins, <100ms
pub async fn list_accounts_for_capsule(
    &self, 
    tenant_id: &str, 
    capsule_id: &str
) -> Result<Vec<Account>> {
    let result = self.client
        .query()
        .table_name(&self.table_name)
        .key_condition_expression("PK = :pk AND begins_with(SK, :sk_prefix)")
        .expression_attribute_values(
            ":pk", 
            AttributeValue::S(format!("TENANT#{}", tenant_id))
        )
        .expression_attribute_values(
            ":sk_prefix", 
            AttributeValue::S(format!("CAPSULE#{}#ACCOUNT#", capsule_id))
        )
        .send()
        .await?;

    // Deserialize the returned items into Account structs
    Ok(parse_items(result.items))
}
Performance: Tested with 10,000 accounts per tenant. 99th percentile: 87ms. Why it works: Sort key ordering enables efficient range scans without secondary indexes.
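Why the prefix query is cheap is easy to see in plain Rust: items within a partition are stored in sort-key order, so every key sharing a prefix forms one contiguous range. This sketch (hypothetical helper names, not the production repository) filters an in-memory sorted list the same way DynamoDB walks a partition:

```rust
/// Build the composite sort key used for accounts under a capsule.
fn account_sk(capsule_id: &str, account_id: &str) -> String {
    format!("CAPSULE#{}#ACCOUNT#{}", capsule_id, account_id)
}

/// Emulate `begins_with(SK, prefix)` over a sorted partition: matching
/// keys are contiguous, so DynamoDB never touches non-matching items.
fn query_prefix<'a>(sorted_keys: &'a [String], prefix: &str) -> Vec<&'a String> {
    sorted_keys
        .iter()
        .skip_while(|k| !k.starts_with(prefix)) // seek to start of range
        .take_while(|k| k.starts_with(prefix))  // stop at end of range
        .collect()
}

fn main() {
    let mut keys = vec![
        account_sk("xyz", "a1"),
        account_sk("xyz", "a2"),
        account_sk("other", "b1"),
        "LEAD#123".to_string(),
    ];
    keys.sort(); // DynamoDB keeps sort keys in lexicographic order

    let matches = query_prefix(&keys, "CAPSULE#xyz#ACCOUNT#");
    println!("{matches:?}"); // only the two accounts under capsule xyz
}
```

The same contiguity argument is why adding more children never slows down unrelated prefixes in the partition.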

3. Macro-Driven Repositories: 70% Boilerplate Reduction

Achievement: #[derive(DynamoRepository)] eliminates 300 lines per entity. Single-table patterns are repetitive—perfect for code generation. The platform uses proc macros to generate repository implementations.
// crm/src/domain/lead.rs

#[derive(DynamoRepository)]
#[dynamo(
    table = "crm-domain",
    pk = "TENANT#{tenant_id}",
    sk = "LEAD#{id}",
    gsi1 = "OWNER#{owner_id}",
    gsi2 = "CREATED#{created_at}"
)]
pub struct Lead {
    pub id: LeadId,
    pub tenant_id: String,
    pub owner_id: String,
    pub name: String,
    pub status: LeadStatus,
    pub created_at: DateTime<Utc>,
}

// Macro generates:
// - save(&self) -> Result<()>
// - find_by_id(tenant_id, lead_id) -> Result<Lead>
// - list_by_owner(tenant_id, owner_id) -> Result<Vec<Lead>>
// - list_by_created_date(tenant_id, start, end) -> Result<Vec<Lead>>
// - delete(tenant_id, lead_id) -> Result<()>
Impact: 56 entities × 300 lines = 16,800 lines eliminated. Why it works: Single-table patterns are uniform enough to automate, but flexible enough to customize via attributes. See The Macro That Wrote 80% of Our Repositories for implementation details.

4. Geographic Data Residency: GDPR Compliance Built-In

Achievement: EU tenants’ data stays in eu-west-1, US tenants in us-east-1. Each tenant has a data_region attribute. The repository routes queries to the correct regional table.
// platform-core/src/repository/multi_region_repository.rs

pub struct MultiRegionRepository {
    us_client: DynamoDbClient,
    eu_client: DynamoDbClient,
    tenant_cache: Arc<moka::future::Cache<String, DataRegion>>,
}

impl MultiRegionRepository {
    pub async fn save_lead(&self, lead: &Lead) -> Result<()> {
        let region = self.get_tenant_region(&lead.tenant_id).await?;
        
        let client = match region {
            DataRegion::US => &self.us_client,
            DataRegion::EU => &self.eu_client,
        };
        
        client.put_item()
            .table_name(format!("crm-domain-{}", region.table_suffix()))
            .item("PK", AttributeValue::S(format!("TENANT#{}", lead.tenant_id)))
            .item("SK", AttributeValue::S(format!("LEAD#{}", lead.id)))
            // ... other attributes
            .send()
            .await?;
        
        Ok(())
    }
}
GDPR Compliance: EU data never crosses region boundaries. Audited by third-party security firm. Why it works: DynamoDB’s regional isolation + application-level routing = compliance by design.

When It Didn’t Work (The Problems)

The AI audit revealed four critical issues that cost real money and engineering time. Here’s what went wrong:

Problem 1: 217 Direct AWS SDK Calls

Discovery: AI code analysis found 217 direct aws-sdk-dynamodb calls bypassing the repository layer. Risk: No tenant isolation, no audit logging, no PII encryption, no retry logic. Example of the problem:
// integration/src/sync/salesforce_sync.rs:342
// ❌ WRONG: Direct SDK call bypassing safety mechanisms

use aws_sdk_dynamodb::Client;

async fn sync_lead_from_salesforce(&self, lead_data: SalesforceResponse) -> Result<()> {
    // Direct SDK call - bypasses tenant isolation!
    self.dynamo_client
        .put_item()
        .table_name("crm-domain")
        .item("PK", AttributeValue::S(format!("TENANT#{}", self.tenant_id)))
        .item("SK", AttributeValue::S(format!("LEAD#{}", lead_data.id)))
        .item("name", AttributeValue::S(lead_data.name))
        .item("email", AttributeValue::S(lead_data.email))  // ❌ PII not encrypted!
        .send()
        .await?;
    
    Ok(())
}
Why this is dangerous:
  1. No tenant isolation: Typo in tenant_id → cross-tenant data leak
  2. No PII encryption: Email stored in plaintext → GDPR violation
  3. No audit logging: No trace of who/when/why data changed
  4. No retry logic: Throttling errors fail immediately
The fix:
// ✅ CORRECT: Use repository layer

async fn sync_lead_from_salesforce(&self, lead_data: SalesforceResponse) -> Result<()> {
    let lead = Lead {
        id: LeadId::new(),
        tenant_id: self.tenant_id.clone(),
        name: lead_data.name,
        email: lead_data.email,  // ✅ Repository encrypts PII automatically
        // ...
    };
    
    // Repository handles:
    // - Tenant isolation (validated)
    // - PII encryption (automatic)
    // - Audit logging (event sourced)
    // - Retries (exponential backoff)
    self.lead_repository.save(&lead).await?;
    
    Ok(())
}
Cost to fix: 217 violations × 20-30 minutes per fix = 80-120 hours ($16,000-24,000 in engineering time at $200/hour). Issue tracked: GitHub Issue #496, “AWS SDK usage bypassing repository layer”. AI Role: The Evaluator agent scanned 23,820 lines of DynamoDB code and identified every violation with file/line references. Human review confirmed 217 legitimate issues.

Problem 2: Scan Operations Instead of Query

Discovery: 7 repositories using scan() instead of query() for list operations. Cost: $200-500/month in wasted Read Capacity Units (RCU). Example of the problem:
// catalog/src/repository/product_repository.rs:156
// ❌ WRONG: Scanning entire table

pub async fn list_products_by_category(
    &self, 
    tenant_id: &str, 
    category_id: &str
) -> Result<Vec<Product>> {
    let result = self.client
        .scan()  // ❌ Reads EVERY item in table
        .table_name(&self.table_name)
        .filter_expression("tenant_id = :tenant AND category_id = :category")
        .expression_attribute_values(":tenant", AttributeValue::S(tenant_id.to_string()))
        .expression_attribute_values(":category", AttributeValue::S(category_id.to_string()))
        .send()
        .await?;
    
    // Scan reads 100,000 items, filters to 50 matches
    // Cost: 100,000 RCU
    Ok(parse_items(result.items))
}
Why this is expensive:
  • Table has 100,000 products across all tenants
  • Scan reads all 100,000 items (costs 100,000 RCU)
  • Filter expression runs AFTER reading (no cost savings)
  • Query should use GSI to read only 50 items (50 RCU)
The fix:
// ✅ CORRECT: Query with GSI

#[derive(DynamoRepository)]
#[dynamo(
    table = "catalog-domain",
    pk = "TENANT#{tenant_id}",
    sk = "PRODUCT#{id}",
    gsi1_pk = "TENANT#{tenant_id}#CATEGORY#{category_id}",  // ✅ Add GSI
    gsi1_sk = "PRODUCT#{id}"
)]
pub struct Product { /* ... */ }

pub async fn list_products_by_category(
    &self, 
    tenant_id: &str, 
    category_id: &str
) -> Result<Vec<Product>> {
    let result = self.client
        .query()  // ✅ Reads only matching items
        .table_name(&self.table_name)
        .index_name("GSI1")
        .key_condition_expression("GSI1PK = :pk")
        .expression_attribute_values(
            ":pk", 
            AttributeValue::S(format!("TENANT#{}#CATEGORY#{}", tenant_id, category_id))
        )
        .send()
        .await?;
    
    // Query reads 50 items directly
    // Cost: 50 RCU (2000x improvement)
    Ok(parse_items(result.items))
}
Impact:
  • Before: 100,000 RCU per query × 5,000 queries/month = 500M RCU/month = $500/month
  • After: 50 RCU per query × 5,000 queries/month = 250K RCU/month = $0.25/month
Savings: ~$210/month after fixing all 7 repositories. AI Role: The Evaluator agent found the scan operations and suggested GSI-based alternatives.
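The before/after figures follow from straightforward arithmetic. A throwaway sketch of the calculation (the $1-per-million-RCU rate is an assumption chosen to match this article’s figures, not current AWS pricing; substitute your actual rate):

```rust
/// Estimate monthly read cost from items read per query.
/// PRICE_PER_MILLION_RCU is an assumed rate matching the article's
/// numbers; check your own on-demand or provisioned pricing.
fn monthly_read_cost(rcu_per_query: u64, queries_per_month: u64) -> f64 {
    const PRICE_PER_MILLION_RCU: f64 = 1.0; // assumed rate
    (rcu_per_query * queries_per_month) as f64 / 1_000_000.0 * PRICE_PER_MILLION_RCU
}

fn main() {
    let scan_cost = monthly_read_cost(100_000, 5_000); // full-table scan
    let query_cost = monthly_read_cost(50, 5_000);     // targeted GSI query
    println!("scan: ${scan_cost:.2}/month, query: ${query_cost:.2}/month");
    // scan: $500.00/month, query: $0.25/month
}
```

The 2000× ratio comes entirely from items read, which is why the fix is a schema change (add a GSI), not a tuning knob.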

Problem 3: GSI Proliferation

Discovery: Product entity requires 4 GSIs to support basic queries. Cost: Each GSI doubles storage costs and adds 8-12 hours of development time per access pattern. The problem:
// catalog/src/domain/product.rs

#[derive(DynamoRepository)]
#[dynamo(
    table = "catalog-domain",
    pk = "TENANT#{tenant_id}",
    sk = "PRODUCT#{id}",
    
    // GSI1: Query by category
    gsi1_pk = "TENANT#{tenant_id}#CATEGORY#{category_id}",
    gsi1_sk = "PRODUCT#{id}",
    
    // GSI2: Query by brand
    gsi2_pk = "TENANT#{tenant_id}#BRAND#{brand_id}",
    gsi2_sk = "PRODUCT#{id}",
    
    // GSI3: Query by price range
    gsi3_pk = "TENANT#{tenant_id}#PRICETIER#{price_tier}",
    gsi3_sk = "PRICE#{price}",
    
    // GSI4: Query by SKU
    gsi4_pk = "TENANT#{tenant_id}#SKU#{sku}",
    gsi4_sk = "PRODUCT#{id}"
)]
pub struct Product {
    pub id: ProductId,
    pub tenant_id: String,
    pub category_id: String,
    pub brand_id: String,
    pub sku: String,
    pub price: Decimal,
    pub price_tier: PriceTier,  // Computed: LOW, MEDIUM, HIGH
}
Cost breakdown:
  • Storage: 4 GSIs = 5× base table size (base + 4 GSIs) = 5× storage cost
  • Development: Each GSI requires:
    • Schema design: 2-3 hours
    • Implementation: 3-4 hours
    • Testing: 2-3 hours
    • Total: 8-12 hours per GSI
When this happens:
  • Product catalog: 4 GSIs
  • Order management: 5 GSIs (status, customer, date range, fulfillment center, tracking number)
  • Invoice system: 3 GSIs (customer, date range, payment status)
Cumulative cost: 12 GSIs × 10 hours = 120 hours ($24,000 in engineering time). The alternative: For products with unpredictable access patterns, OpenSearch or Elasticsearch is more cost-effective:
// catalog/src/repository/product_search_repository.rs

// ✅ Use OpenSearch for flexible queries

pub async fn search_products(&self, query: ProductQuery) -> Result<Vec<Product>> {
    let mut search = SearchQuery::new();
    
    // Tenant isolation (mandatory)
    search.filter("tenant_id", query.tenant_id);
    
    // Flexible filters (no GSI needed)
    if let Some(category) = query.category {
        search.filter("category_id", category);
    }
    if let Some(brand) = query.brand {
        search.filter("brand_id", brand);
    }
    if let Some((min, max)) = query.price_range {
        search.range("price", min, max);
    }
    if let Some(sku) = query.sku {
        search.match_field("sku", sku);
    }
    
    self.opensearch_client.search(search).await
}
Cost comparison:
  • DynamoDB with 4 GSIs: $180/month (storage) + $24K (dev time)
  • OpenSearch t3.small: $45/month + 20 hours dev time ($4K one-time)
Decision rule:
  • DynamoDB: Known access patterns (2-3 GSIs max), transactional workloads
  • OpenSearch: Flexible search, text queries, >3 access patterns
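The decision rule above is mechanical enough to encode. A hypothetical helper (the names and the >3-pattern threshold are mine, lifted directly from the rule, not from the production codebase):

```rust
#[derive(Debug, PartialEq)]
enum Store {
    DynamoDb,
    OpenSearch,
}

/// Encode the decision rule: DynamoDB for known access patterns
/// (2-3 GSIs max) and transactional workloads; OpenSearch once you
/// need text search or the pattern count keeps growing.
fn choose_store(known_access_patterns: usize, needs_text_search: bool) -> Store {
    if needs_text_search || known_access_patterns > 3 {
        Store::OpenSearch
    } else {
        Store::DynamoDb
    }
}

fn main() {
    // CRM leads: query by owner and by created date -> 2 patterns
    println!("{:?}", choose_store(2, false)); // DynamoDb
    // Product catalog: category, brand, price tier, SKU -> 4 patterns
    println!("{:?}", choose_store(4, false)); // OpenSearch
}
```

Putting the rule in code also makes it reviewable: when someone proposes GSI number four, the function fails the check and forces the conversation.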

Problem 4: Time-Series Data in DynamoDB

Discovery: Analytics metrics stored in DynamoDB costing $255/month; Timestream would cost $45/month. Cost: 5-6× more expensive than a purpose-built time-series database. The problem:
// analytics/src/repository/metrics_repository.rs

#[derive(DynamoRepository)]
#[dynamo(
    table = "analytics-domain",
    pk = "TENANT#{tenant_id}#METRIC#{metric_name}",
    sk = "TIMESTAMP#{timestamp}",
    ttl = "expiration_time"  // Auto-delete after 90 days
)]
pub struct MetricDataPoint {
    pub tenant_id: String,
    pub metric_name: String,
    pub timestamp: DateTime<Utc>,
    pub value: f64,
    pub dimensions: HashMap<String, String>,
    pub expiration_time: u64,  // TTL for automatic deletion
}

// Query: Get hourly aggregates for last 30 days
pub async fn get_hourly_metrics(
    &self,
    tenant_id: &str,
    metric_name: &str,
    start: DateTime<Utc>,
    end: DateTime<Utc>
) -> Result<Vec<AggregatedMetric>> {
    // 1. Query raw data points (30 days × 24 hours × 60 minutes = 43,200 points)
    let raw_points = self.client
        .query()
        .table_name(&self.table_name)
        .key_condition_expression("PK = :pk AND SK BETWEEN :start AND :end")
        .expression_attribute_values(
            ":pk",
            AttributeValue::S(format!("TENANT#{}#METRIC#{}", tenant_id, metric_name))
        )
        .expression_attribute_values(
            ":start",
            AttributeValue::S(format!("TIMESTAMP#{}", start.to_rfc3339()))
        )
        .expression_attribute_values(
            ":end",
            AttributeValue::S(format!("TIMESTAMP#{}", end.to_rfc3339()))
        )
        .send()
        .await?;
    
    // 2. Aggregate in Lambda (expensive compute)
    let aggregated = self.aggregate_by_hour(raw_points)?;
    
    Ok(aggregated)
}
Cost breakdown:
  • Storage: 43,200 points × 1KB = 43MB per tenant per metric
    • 100 tenants × 10 metrics = 43GB = $10.75/month
  • RCU: 43,200 reads × 5,000 queries/month = 216M RCU = $216/month
  • Lambda compute: Aggregation logic = $28/month
  • Total: $255/month
The fix: Use Timestream
// analytics/src/repository/timestream_metrics_repository.rs

pub struct TimestreamMetricsRepository {
    client: TimestreamQueryClient,
    database: String,
    table: String,
}

impl TimestreamMetricsRepository {
    pub async fn get_hourly_metrics(
        &self,
        tenant_id: &str,
        metric_name: &str,
        start: DateTime<Utc>,
        end: DateTime<Utc>
    ) -> Result<Vec<AggregatedMetric>> {
        let query = format!(
            "SELECT 
                bin(time, 1h) as hour,
                avg(measure_value::double) as avg_value,
                max(measure_value::double) as max_value,
                min(measure_value::double) as min_value
            FROM \"{}\".\"{}\"
            WHERE tenant_id = '{}'
              AND measure_name = '{}'
              AND time BETWEEN '{}' AND '{}'
            GROUP BY bin(time, 1h)
            ORDER BY hour",
            self.database, self.table, tenant_id, metric_name,
            start.to_rfc3339(), end.to_rfc3339()
        );
        
        // Timestream does aggregation natively (no Lambda needed)
        let result = self.client.query().query_string(query).send().await?;
        
        Ok(parse_timestream_result(result))
    }
}
Cost comparison:
  • DynamoDB: $255/month (storage + RCU + Lambda compute)
  • Timestream: $45/month (storage + query pricing)
  • Savings: $210/month (5.6× cheaper)
Additional benefits:
  • Native time-series aggregation (no custom Lambda)
  • Automatic downsampling (1-minute → hourly → daily)
  • Magnetic storage for old data (10× cheaper)
  • SQL queries (easier to maintain)
Decision rule:
  • DynamoDB: Transactional data with occasional time-based queries
  • Timestream: Time-series data requiring aggregation/downsampling

The Cost Analysis (Real Numbers)

Let’s quantify the technical debt discovered by the AI audit:
Engineering Time:
  • 217 SDK violations × 30 min = 108 hours
  • 7 scan operations → GSI migration = 12 hours
  • Total: 120 hours = $24,000 at $200/hour
Monthly Recurring Costs:
  • Scan operations: $210/month wasted RCU
  • Time-series in DynamoDB: $210/month vs Timestream
  • Total: $420/month = $5,040/year
Combined: $29,040 in first year (debt + ongoing costs)
The lesson: Architectural mistakes compound. The AI audit caught these issues before they became unfixable.
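For the record, the first-year figure is just the one-time remediation plus twelve months of recurring waste. A trivial sketch (the $200/hour rate and fix-time estimates are the article’s, not universal):

```rust
/// First-year cost of the audited debt: one-time fixes plus a year
/// of recurring waste. Rates are the article's own estimates.
fn first_year_cost(one_time_hours: f64, hourly_rate: f64, monthly_recurring: f64) -> f64 {
    one_time_hours * hourly_rate + monthly_recurring * 12.0
}

fn main() {
    // 217 fixes x ~30 min = ~108h, plus 12h of scan-to-GSI migration
    let fix_hours = 217.0 * 0.5 + 12.0;
    let total = first_year_cost(120.0, 200.0, 420.0);
    println!("~{fix_hours:.0} hours of fixes, ${total:.0} first-year cost");
    // ~121 hours of fixes, $29040 first-year cost
}
```

Note that the recurring term grows every month the debt is left alone, which is the compounding the lesson refers to.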

Decision Framework: When to Use Single-Table

After 12 months and 56 entity types, here’s the practical decision framework:

✅ Use Single-Table When:

1. Within a Bounded Context Single-table works when entities share a domain boundary. Good: CRM domain (Leads, Accounts, Opportunities, Contacts)
  • Shared tenant isolation
  • Hierarchical relationships (Account → Opportunities → Line Items)
  • Consistent access patterns (query by tenant, owner, date)
Bad: Entire application (CRM + Billing + Analytics + Integrations)
  • Different access patterns
  • Different scaling characteristics
  • Different cost profiles
2. Hierarchical Data Single-table excels at parent→child relationships.
// One query, no joins
PK: TENANT#abc123
SK: ACCOUNT#123                  // Parent
SK: ACCOUNT#123#OPPORTUNITY#456  // Child
SK: ACCOUNT#123#OPPORTUNITY#456#LINE_ITEM#789  // Grandchild

// Query: Get account + all opportunities + all line items
query(
    PK = "TENANT#abc123",
    SK begins_with "ACCOUNT#123"
)
3. Tenant Isolation Required Single-table guarantees zero cross-tenant leaks via partition keys. 4. Known Access Patterns Upfront GSIs work when you know queries in advance. If access patterns change monthly, use OpenSearch instead.

❌ Use Separate Tables (or Services) When:

1. Cross-Domain Analytics Querying across domains (CRM + Billing + Usage) requires joins. DynamoDB doesn’t do joins. Solution: Data warehouse (Snowflake, Redshift) or OLAP database. 2. Time-Series Data DynamoDB charges for storage + reads. Time-series databases optimize for compression + aggregation. Solution: Timestream (5-6× cheaper), InfluxDB, or TimescaleDB. 3. Unpredictable Access Patterns If you add a GSI every month, storage costs multiply 5-10×. Solution: OpenSearch (flexible queries, no GSI needed). 4. Different Retention Policies Audit logs (7 years) vs session data (24 hours) shouldn’t share a table. Solution: Separate tables with different TTL/backup policies.

The Rule of Thumb

Single-table per bounded context, not per application. My platform:
  • 19 tables aligned with 19 bounded contexts
  • Each table uses single-table design within that domain
  • Tables don’t share data (loose coupling)
Example: CRM domain has one table with 8 entity types (Lead, Account, Opportunity, Contact, Task, Note, Activity, Attachment).

Key Learnings

After 12 months, 56 entity types, and 23,820 lines of DynamoDB code:

1. Single-Table ≠ One Table for Entire Application

Wrong: One table for CRM + Billing + Analytics + Platform. Right: One table per bounded context. 19 contexts = 19 tables. Why: Different domains have different access patterns, scaling needs, and cost profiles.

2. DynamoDB Excels at Transactional Workloads, Not Analytics

Good use cases:
  • CRUD operations (Create, Read, Update, Delete)
  • Hierarchical queries (Tenant → Account → Opportunity)
  • Tenant isolation (partition key = TENANT#)
  • Known access patterns (2-3 GSIs)
Bad use cases:
  • Cross-domain joins (CRM + Billing)
  • Time-series aggregation (use Timestream)
  • Flexible search (use OpenSearch)
  • Data warehouse queries (use Snowflake)

3. Managed Services Often Beat Custom Solutions

Lesson: Don’t build what you can buy.
  • Time-series in DynamoDB: $255/month + 40 hours custom aggregation
  • Timestream: $45/month + 8 hours integration
  • Savings: $210/month + 32 hours
Other examples:
  • Full-text search: OpenSearch ($45/month) vs DynamoDB GSIs ($180/month + 100 hours)
  • Audit logs: S3 + Athena ($5/month) vs DynamoDB ($90/month)

4. AI Can Audit Architectural Violations at Scale

Manual audit: 23,820 lines = 120 hours for senior engineer to review. AI audit: 15 minutes to scan entire codebase, identify 217 SDK violations, suggest fixes. AI role:
  • Pattern detection (scan vs query, GSI opportunities)
  • Cost estimation ($200-500/month in wasted RCU)
  • Prioritization (fix SDK violations first, migrate time-series second)
Human role:
  • Validate AI findings (false positives)
  • Make architectural decisions (DynamoDB vs Timestream)
  • Review generated code (macro expansions)

5. Technical Debt Compounds Without Monitoring

Year 1: 217 SDK violations = $24K to fix. Year 2: If unchecked, violations double (434) = $48K to fix. Prevention:
  • Pre-commit hooks (block direct SDK calls)
  • Integration tests (detect scan operations)
  • Cost alerts (flag expensive queries)
  • Quarterly audits (AI-powered reviews)
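A minimal sketch of the first prevention step: a check that flags direct aws_sdk_dynamodb usage outside the repository layer. This is a hypothetical lint (the path convention and allowlist are illustrative, not the real project layout), written as a pure function so a pre-commit hook or CI job can drive it:

```rust
/// Flag a source line that imports or calls aws_sdk_dynamodb directly,
/// unless the file lives in the allowed repository layer.
/// The "/repository/" path convention is illustrative.
fn is_sdk_violation(file_path: &str, line: &str) -> bool {
    let allowed = file_path.contains("/repository/");
    let touches_sdk = line.contains("aws_sdk_dynamodb");
    touches_sdk && !allowed
}

fn main() {
    // The Salesforce sync from Problem 1 would be flagged:
    let flagged = is_sdk_violation(
        "integration/src/sync/salesforce_sync.rs",
        "use aws_sdk_dynamodb::Client;",
    );
    println!("salesforce_sync flagged: {flagged}"); // true

    // The repository layer itself may use the SDK:
    let allowed = !is_sdk_violation(
        "platform-core/src/repository/tenant_repository.rs",
        "use aws_sdk_dynamodb::Client;",
    );
    println!("tenant_repository allowed: {allowed}"); // true
}
```

Wired into a pre-commit hook, a check like this would have stopped the 217 violations at commit number one instead of audit number one.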

Conclusion

The single-table DynamoDB thesis didn’t fail—it evolved. What I learned:
  1. Single-table works WITHIN bounded contexts, not across entire applications
  2. 19 tables aligned with 19 domains = clean architecture
  3. DynamoDB excels at transactional workloads, not analytics
  4. Managed services (Timestream, OpenSearch) often beat custom solutions
  5. AI audits catch architectural violations humans miss
What I’d do differently:
  • Start with domain boundaries first, then choose databases
  • Use Timestream for time-series from day 1 (save $210/month)
  • Implement pre-commit hooks to block SDK bypasses (prevent $24K debt)
  • Set cost alerts for scan operations (catch $500/month waste early)
Final thought: The platform runs at less than 100ms latency, zero cross-tenant leaks, and $400/month DynamoDB costs for 100 tenants. That’s success. But the real lesson? Architecture isn’t about following best practices—it’s about knowing when to break them. Single-table DynamoDB is elegant. Within the right context.

Resources


Further reading: The Macro That Wrote 80% of Our Repositories — How Rust macros automated the repository layer boilerplate that single-table design demands.
Have questions about single-table design or DynamoDB architecture? I’m @sreedhar on Twitter.