The Problem: Discovered in Production
During testing week 3: Event logs showed a warning.
The Scope: 6 Entities to Migrate
Analysis revealed the problem was widespread:

| Entity | Scope | Issue | Risk |
|---|---|---|---|
| FinancialConfig | Tenant | Test configs in prod queries | High |
| Contract | Tenant | Test contracts in revenue reports | Critical |
| ContractLineItem | Tenant | Test line items in billing | Critical |
| ContractAmendment | Tenant | Test amendments in audit trail | High |
| RevenueSchedule | Tenant | Test revenue in financial reporting | Critical |
| AccessEntity | Tenant | Test access grants in security queries | Medium |
- 6 entities
- 21 repository methods per entity
- 47 API handlers
- 1,003 tests
- Estimated migration effort: 4-6 weeks manually
The Decision: Systematic vs. Incremental
Option A: Incremental Migration (Rejected)
- Migrate one entity at a time
- Deploy after each entity
- Gradual rollout over 6 weeks
Drawbacks:
- Mixed isolation states during migration
- Complex conditional queries (is this entity capsule-scoped yet?)
- 6 deployment cycles, 6 risk windows
Option B: Systematic Migration (Chosen)
- Plan all 6 entities comprehensively
- Implement in parallel
- Single coordinated deployment
- One-time breaking change
The Planning Session
Evaluator session (Opus, 6 hours):
Key Architecture Decisions from Migration Plan
Decision 1: Dual-Write Migration Strategy
Phase 1: Add capsule_id, maintain old PK pattern
- Add capsule_id field to entities
- Still use old PK: TENANT#{tenant_id}#CONTRACT#{id}
- Dual-write: Write to both old and new patterns
- Queries use old pattern (no behavior change)
Phase 2: Switch queries to new PK pattern
- Start querying new PK: TENANT#{tenant_id}#CAPSULE#{capsule_id}#CONTRACT#{id}
- Still dual-write to both patterns
- Monitor for issues
Phase 3: Remove old PK pattern
- Stop writing to old pattern
- Clean up old data
- Remove dual-write code
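The dual-write phases above can be sketched roughly as follows. This is a minimal illustration, not the project's actual repository code: the Phase enum, ContractStore, and the in-memory HashMap standing in for the DynamoDB table are all hypothetical names introduced here. Only the two PK patterns come from the plan.

```rust
use std::collections::HashMap;

/// Hypothetical phase switch mirroring the three phases of the plan.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Phase {
    DualWriteReadOld, // Phase 1: write both patterns, query the old one
    DualWriteReadNew, // Phase 2: write both patterns, query the new one
    NewOnly,          // Phase 3: old pattern no longer written
}

/// Old PK pattern: TENANT#{tenant_id}#CONTRACT#{id}
pub fn old_pk(tenant_id: &str, id: &str) -> String {
    format!("TENANT#{}#CONTRACT#{}", tenant_id, id)
}

/// New PK pattern: TENANT#{tenant_id}#CAPSULE#{capsule_id}#CONTRACT#{id}
pub fn new_pk(tenant_id: &str, capsule_id: &str, id: &str) -> String {
    format!("TENANT#{}#CAPSULE#{}#CONTRACT#{}", tenant_id, capsule_id, id)
}

/// In-memory stand-in for the table (assumption: one string body per key).
#[derive(Default)]
pub struct ContractStore {
    items: HashMap<String, String>,
}

impl ContractStore {
    /// Always writes the new key; also writes the old key until Phase 3.
    pub fn put(&mut self, phase: Phase, tenant_id: &str, capsule_id: &str, id: &str, body: &str) {
        self.items
            .insert(new_pk(tenant_id, capsule_id, id), body.to_string());
        if phase != Phase::NewOnly {
            self.items.insert(old_pk(tenant_id, id), body.to_string());
        }
    }

    /// Reads via the old key in Phase 1 and via the new key afterwards.
    pub fn get(&self, phase: Phase, tenant_id: &str, capsule_id: &str, id: &str) -> Option<&String> {
        let key = match phase {
            Phase::DualWriteReadOld => old_pk(tenant_id, id),
            Phase::DualWriteReadNew | Phase::NewOnly => new_pk(tenant_id, capsule_id, id),
        };
        self.items.get(&key)
    }
}
```

Keeping the phase as an explicit value makes the read path a single switch, so moving from Phase 1 to Phase 2 changes which key is queried without touching the write path.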
Decision 2: GSI Pattern Update
Old pattern:
Decision 3: Test Data Migration
Challenge: 1,003 tests use hard-coded tenant_id, no capsule_id.
Options:
- Update all tests to include capsule_id (manual)
- Create default capsule for tests (automated)
- Generate migration script for test data
Chosen approach:
- Test helper: test_capsule() returns default CapsuleId for all tests
- Critical tests (cross-capsule scenarios): Explicit capsule_id values
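A test helper along these lines might look like the sketch below. The CapsuleId newtype shown here is a hypothetical stand-in for the real domain type, and the default value "DEVUS" is an assumption chosen only because that capsule name appears elsewhere in this post.

```rust
/// Hypothetical newtype; the real CapsuleId is assumed to live in the domain layer.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct CapsuleId(String);

impl CapsuleId {
    pub fn new(value: &str) -> Self {
        CapsuleId(value.to_string())
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Default capsule for tests that do not exercise isolation.
/// The concrete value "DEVUS" is an assumption for illustration.
pub fn test_capsule() -> CapsuleId {
    CapsuleId::new("DEVUS")
}
```

Tests that do care about isolation would skip the helper and construct two distinct CapsuleId values explicitly, so the intent is visible in the test body.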
The Implementation: Parallel Migration
Instead of migrating entities sequentially, we migrated in parallel using AI:
Entity #1: FinancialConfig (Template Migration)
Builder session:
- Entity migrated ✅
- 21 tests passing ✅
- API handlers updated ✅
Entities #2-6: Parallel Migration
With the pattern established from FinancialConfig, I launched 5 parallel Builder sessions (one per entity):
Contract entities (Session 1):
- ContractEntity
- ContractLineItemEntity
- ContractAmendmentEntity
- RevenueScheduleEntryEntity
Access entities:
- AccessEntity
Merge order:
- FinancialConfig first (establishes pattern)
- Contract entities second (largest change)
- Access entities last (smallest change)
- Resolve merge conflicts in shared files
The Cascade We Prevented
What could have gone wrong without systematic planning:
Scenario: Ad-Hoc Migration
Day 1: Developer migrates FinancialConfig
- Adds capsule_id field
- Updates PK
- Forgets to update GSI
- Merges
- Follows FinancialConfig pattern
- Copies the broken GSI pattern
- Merges
- GSI queries bypass tenant isolation
- All 6 entities have the same bug
- Must fix all entities again
What Actually Happened: Planned Migration
Day 1: Evaluator creates comprehensive plan
- Identifies GSI pattern issue upfront
- Documents correct pattern for ALL entities
- Creates migration checklist
- First entity (FinancialConfig) validates pattern
- Remaining 5 entities copy validated pattern
- All entities migrated correctly first time
- Run full test suite (1,003 tests)
- All passing
- Security audit confirms isolation
What AI Excelled At
1. Systematic Call Site Updates
The task: Update 127 call sites to pass capsule_id parameter.
- Find all calls to gsi1pk_for_account (and 6 similar functions)
- Update each call to include tenant_id
- Verify tenant_id is available in scope (add parameter if needed)
- Compile and check
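To make the kind of change concrete, here is a hedged sketch of a GSI key builder before and after tenant scoping. Only the function name gsi1pk_for_account comes from the text above. The signatures, the ACCOUNT# segment, and the _unscoped variant are assumptions for illustration.

```rust
/// Before (assumed shape): the GSI key carries no tenant, so two tenants
/// with the same account id would share one GSI partition.
pub fn gsi1pk_for_account_unscoped(account_id: &str) -> String {
    format!("ACCOUNT#{}", account_id)
}

/// After (assumed shape): the tenant is a required parameter, so every
/// call site has to supply it and the compiler flags the ones that do not.
pub fn gsi1pk_for_account(tenant_id: &str, account_id: &str) -> String {
    format!("TENANT#{}#ACCOUNT#{}", tenant_id, account_id)
}
```

Changing the signature rather than adding an optional argument is what turns "compile and check" into a reliable way to find missed call sites.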
2. Test Data Updates
The task: Update 1,003 tests to include capsule_id. AI’s approach:
What AI Struggled With
1. Understanding Migration Order
The issue: Contract entities have foreign key relationships:
- ContractEntity (parent)
- ContractLineItemEntity (child)
- ContractAmendmentEntity + RevenueScheduleEntryEntity (grandchildren)
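One way to derive a migration order from relationships like these is a topological sort over (child, parent) pairs. The sketch below uses Kahn's algorithm. The function name migration_order is hypothetical, and the exact edges (for example, that both grandchildren hang off ContractLineItemEntity) are assumptions, not something the plan states.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Orders entities so that each one appears after the entity it depends on.
/// `deps` holds (child, parent) pairs. Returns None if the graph has a cycle.
pub fn migration_order<'a>(deps: &[(&'a str, &'a str)]) -> Option<Vec<&'a str>> {
    let mut indegree: BTreeMap<&'a str, usize> = BTreeMap::new();
    let mut children: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();

    for &(child, parent) in deps {
        indegree.entry(parent).or_insert(0);
        *indegree.entry(child).or_insert(0) += 1;
        children.entry(parent).or_default().push(child);
    }

    // Start with entities that depend on nothing.
    let mut ready: VecDeque<&'a str> = indegree
        .iter()
        .filter(|(_, d)| **d == 0)
        .map(|(n, _)| *n)
        .collect();

    let mut order = Vec::new();
    while let Some(node) = ready.pop_front() {
        order.push(node);
        for &child in children.get(node).into_iter().flatten() {
            let d = indegree.get_mut(child)?;
            *d -= 1;
            if *d == 0 {
                ready.push_back(child);
            }
        }
    }

    // Any entity left unvisited means there was a cycle.
    if order.len() == indegree.len() {
        Some(order)
    } else {
        None
    }
}
```

Writing the dependencies down as data, even in a throwaway script, gives the human reviewer something concrete to check before parallel sessions start.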
2. Data Migration Strategy
The task: Migrate existing DynamoDB data from old PK pattern to new. AI’s plan:
Principles Established
Principle 1: Plan Breaking Changes Comprehensively
What we learned: Breaking changes affecting multiple entities need coordinated planning.
Checklist for breaking changes:
- List ALL affected entities
- Identify all call sites (use rg/grep)
- Define migration order (dependency graph)
- Create rollback plan
- Plan data migration strategy
- Update all affected tests
- Document breaking changes
Principle 2: Template Entity Validates Pattern
What we learned: Migrating one entity first (FinancialConfig) caught pattern issues before they propagated to other entities.
Practice:
- Choose simplest entity as template
- Migrate template entity completely
- Verify thoroughly (Verifier catches pattern issues)
- Fix pattern issues in template
- Use validated template for remaining entities
Principle 3: Parallel Migration Requires Merge Discipline
What we learned: 5 parallel Builder sessions created merge conflicts in shared files.
Strategy:
- Identify shared files upfront (error types, API common)
- Define merge order (simplest → most complex)
- First entity establishes patterns in shared files
- Subsequent entities adapt to established patterns
Principle 4: Test Migration Is Part of Feature Migration
What we learned: Updated 1,003 tests as part of entity migration, not afterward.
Why: Tests validate the migration worked correctly.
Practice:
- Migrate entity domain model
- Update repositories
- Update tests (verify new behavior)
- Update API handlers
- Integration tests confirm end-to-end
Principle 5: Breaking Changes Document Migration Path
What we learned: Future developers need to understand why entities are capsule-scoped.
Documentation created:
- ADR-0010: Capsule Isolation Enforcement
- Migration plan for each entity (preserved in .plans/)
- Updated CLAUDE.md with capsule isolation patterns
- Context: Why was this change needed?
- Decision: What pattern did we choose?
- Consequences: What broke? How did we fix it?
- Lessons: What would we do differently?
The Migration Process
Step 1: Entity Schema Update
Before:
- Added #[capsule_isolated] attribute
- Added capsule_id field
- Updated PK pattern to include CAPSULE#{capsule_id}
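As a rough illustration of the schema changes listed above, a capsule-scoped entity could look like the sketch below. The field names other than capsule_id, the FINANCIAL_CONFIG key segment, and the pk() method are assumptions. The #[capsule_isolated] attribute is left out because its definition is not shown here.

```rust
/// Sketch of a capsule-scoped entity. The real struct is described as also
/// carrying a #[capsule_isolated] attribute, omitted here because the macro
/// itself is not part of this example.
#[derive(Debug, Clone, PartialEq)]
pub struct FinancialConfig {
    pub tenant_id: String,
    /// New, required field: a plain String rather than Option<String>,
    /// so an entity cannot be constructed without a capsule.
    pub capsule_id: String,
    pub id: String,
}

impl FinancialConfig {
    /// PK includes both TENANT# and CAPSULE#.
    /// The FINANCIAL_CONFIG segment name is an assumption.
    pub fn pk(&self) -> String {
        format!(
            "TENANT#{}#CAPSULE#{}#FINANCIAL_CONFIG#{}",
            self.tenant_id, self.capsule_id, self.id
        )
    }
}
```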
Step 2: Repository Method Updates
Before:
Step 3: Call Site Updates
The systematic work: 21 call sites for FinancialConfig alone. Before:
Step 4: Test Updates
1,003 tests × 6 entities: Most were straightforward. Before:
What Went Right
1. The Template Entity Approach
FinancialConfig as the template:
- Simplest entity (single config per tenant/capsule)
- No foreign keys to other entities
- Fewest call sites (21 vs. 40+ for Contract)
- Caught GSI pattern issue early
- Established test update pattern
- Created reusable helper functions (test_capsule, extract_capsule_from_context)
2. Comprehensive Verification
After each entity migration, Verifier checked:
- Schema compliance:
  - PK includes both TENANT# and CAPSULE#
  - GSI patterns consistent across entities
  - capsule_id field present and required
- Isolation verification:
  - Tests include cross-capsule negative tests (DEVUS data shouldn’t appear in PRODUS queries)
  - Repository methods enforce capsule parameter
  - API handlers extract capsule from request context
- Data migration:
  - Old data accessible during migration
  - New writes use new pattern
  - No data loss
Issues caught:
- Missing TENANT# in GSI patterns (2 entities)
- Cross-capsule test missing (3 entities)
- Incomplete API handler updates (1 entity)
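A cross-capsule negative test of the kind described in the checks above could be built on a small in-memory repository. This is a sketch under stated assumptions: AccessGrant and InMemoryAccessRepo are hypothetical names, and the real repositories talk to DynamoDB rather than a Vec.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct AccessGrant {
    pub tenant_id: String,
    pub capsule_id: String,
    pub id: String,
}

/// Hypothetical in-memory repository used only to illustrate the check.
#[derive(Default)]
pub struct InMemoryAccessRepo {
    grants: Vec<AccessGrant>,
}

impl InMemoryAccessRepo {
    pub fn save(&mut self, grant: AccessGrant) {
        self.grants.push(grant);
    }

    /// capsule_id is a required parameter: there is deliberately no
    /// tenant-only variant of this query.
    pub fn list(&self, tenant_id: &str, capsule_id: &str) -> Vec<&AccessGrant> {
        self.grants
            .iter()
            .filter(|g| g.tenant_id == tenant_id && g.capsule_id == capsule_id)
            .collect()
    }
}

/// Negative check: data written under DEVUS must not appear in PRODUS queries.
pub fn devus_data_is_hidden_from_produs() -> bool {
    let mut repo = InMemoryAccessRepo::default();
    repo.save(AccessGrant {
        tenant_id: "t1".to_string(),
        capsule_id: "DEVUS".to_string(),
        id: "g1".to_string(),
    });
    repo.list("t1", "PRODUS").is_empty() && repo.list("t1", "DEVUS").len() == 1
}
```

The positive half of the check (the DEVUS query still finds the grant) matters as much as the negative half, since an overly strict filter would also return an empty PRODUS result.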
Metrics: The Migration by Numbers
Scope:
Entities migrated: 6
Files modified: 47
Lines changed:
- Added: 1,247 lines
- Removed: 721 lines
- Net: +526 lines (additional isolation code)
The Mistake I Made
After Contract migration completed: I was confident in the pattern. I skipped detailed verification for the last 2 entities (AccessEntity and one other).
Impact: Deployed to staging environment.
What broke: Access control queries returned inconsistent results.
Root cause: AccessEntity migration updated PK correctly but forgot to update the query method:
Code Example: The Migration Pattern
Here’s the final pattern we established for capsule isolation migration:
Takeaways
For large-scale breaking changes:
- Plan comprehensively - List ALL affected entities upfront
- Migrate template first - Validate pattern with simplest entity
- Parallel implementation - Speed up with multiple AI sessions
- Verify every entity - Don’t skip verification, even for “obvious” migrations
- Document patterns - Future migrations reuse validated patterns
- AI excels: Systematic call site updates, test data migrations
- AI struggles: Dependency ordering, concurrency reasoning, migration strategies
- Human needed: Define migration order, review data migration plan, coordinate parallel work