Technical Debt Triage: What to Fix, What to Live With
The code was incomplete. The list_by_tenant() method just called unimplemented!(), so it would panic in production. But we shipped it anyway. Six months later, it’s still there. And it was the right decision.
Here’s why.
The Problem with Technical Debt Advice
Most technical debt advice falls into two camps: “Never ship with known issues” or “Move fast and break things.” Neither works in practice. The real question isn’t whether to take on debt; it’s which debt to pay down, which to live with, and how to tell the difference. Over six months building a multi-tenant SaaS platform, we accumulated technical debt. Some we fixed immediately. Some we deferred indefinitely. And some we wish we’d fixed sooner. This article breaks down those decisions with real ROI calculations and decision frameworks you can use.
Debt Paid Down: Four Success Stories
Story 1: AWS Runtime Migration
The Debt: Each crate managed AWS clients independently: six different implementations, all slightly different. Some cached connections, some didn’t. Some handled retries, some failed fast.
Cost to Fix: 16 hours
Benefits:
- Eliminated 200+ lines of duplicated code
- Prevented 3 potential isolation bugs (wrong client configuration)
- Reduced onboarding confusion (one pattern instead of six)
ROI Calculation:
- Time saved on future changes: ~2 hours/month (consistent patterns)
- Bug prevention value: ~4 hours/incident × 3 incidents = 12 hours
- Payback period: 8 months (16 hours ÷ ~2 hours/month, counting time savings alone)
- Verdict: Positive ROI
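One way to picture the consolidation is a single, lazily initialised runtime that every crate shares, plus one retry policy. The sketch below is a minimal illustration under that assumption: `AwsRuntime`, `aws_runtime`, `with_retries`, the `us-east-1` fallback, and the retry numbers are all hypothetical, and a real version would build the actual AWS SDK clients from the shared configuration rather than wrap arbitrary closures.

```rust
use std::sync::OnceLock;
use std::thread;
use std::time::Duration;

/// Hypothetical settings that every crate shares instead of
/// building its own AWS client configuration.
#[derive(Debug)]
pub struct AwsRuntime {
    pub region: String,
    pub max_attempts: u32,
    pub base_backoff: Duration,
}

static RUNTIME: OnceLock<AwsRuntime> = OnceLock::new();

/// Returns the process-wide runtime, initialising it exactly once.
pub fn aws_runtime() -> &'static AwsRuntime {
    RUNTIME.get_or_init(|| AwsRuntime {
        region: std::env::var("AWS_REGION").unwrap_or_else(|_| "us-east-1".to_string()),
        max_attempts: 3,
        base_backoff: Duration::from_millis(50),
    })
}

/// One retry policy for every call site: retry up to `max_attempts`
/// times with linearly growing backoff, then return the last error.
pub fn with_retries<T, E>(mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    let rt = aws_runtime();
    let mut attempt: u32 = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= rt.max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(rt.base_backoff * attempt);
                attempt += 1;
            }
        }
    }
}
```

Because every crate goes through the same `with_retries`, the "some retried, some failed fast" split disappears: changing `max_attempts` in one place changes it everywhere.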
Story 2: LegalEntityId Consolidation
The Debt: Two definitions of LegalEntityId existed, one using UUID v4 and one using UUID v7: different files, different semantics, same name.
Cost to Fix: 30 minutes
Benefits:
- Single source of truth
- Prevented future confusion about which to use
- Made code review conversations simpler
ROI Calculation:
- Prevention value: ~2 hours/confusion incident
- Expected frequency: 1-2 times per quarter
- Verdict: Immediate positive ROI
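A consolidated type can also make the old ambiguity impossible to reintroduce. This sketch assumes, purely for illustration, that the UUID v7 definition was the one kept, and it stores the UUID as a `u128` to avoid depending on an external crate; `LegalEntityId::from_u128` and `WrongUuidVersion` are hypothetical names.

```rust
/// The one canonical LegalEntityId. Assumption for illustration:
/// the UUID v7 definition is the one that survived consolidation.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct LegalEntityId(u128);

/// Returned when a raw UUID carries a version other than 7.
#[derive(Debug, PartialEq, Eq)]
pub struct WrongUuidVersion(pub u8);

impl LegalEntityId {
    /// Accepts a UUID held as a big-endian u128. The version is the
    /// 13th hex digit, i.e. bits 76..80 of the integer, so a v4 value
    /// from the old definition is rejected instead of silently mixed in.
    pub fn from_u128(raw: u128) -> Result<Self, WrongUuidVersion> {
        let version = ((raw >> 76) & 0xF) as u8;
        if version == 7 {
            Ok(Self(raw))
        } else {
            Err(WrongUuidVersion(version))
        }
    }

    pub fn as_u128(self) -> u128 {
        self.0
    }
}
```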
Story 3: Opportunity Pipeline Refactor
The Debt: Pipeline stages were hardcoded as an enum, so no tenant could customize their sales process, and enterprise customers need custom pipelines.
Cost to Fix: 12 hours (breaking change, database migration)
Benefits:
- Enterprise-ready feature unlocked
- Enables customer customization
- Direct revenue impact
ROI Calculation:
- Customer value: Enables enterprise sales
- Alternative cost: Build workaround layer (20+ hours)
- Verdict: Customer value justifies cost
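The shape of the change can be sketched as moving stages from a compile-time enum to per-tenant data. Everything here is hypothetical (`Pipeline`, `next_stage`, the validation rules, the example stage names), and the database migration the story mentions is not shown.

```rust
use std::collections::HashSet;

/// Before (simplified, hypothetical): one fixed process for every tenant.
#[allow(dead_code)]
enum HardcodedStage {
    Lead,
    Qualified,
    Proposal,
    ClosedWon,
    ClosedLost,
}

/// After: each tenant owns an ordered list of stage keys, stored as data.
#[derive(Debug, Clone)]
pub struct Pipeline {
    pub tenant_id: String,
    stages: Vec<String>,
}

impl Pipeline {
    /// Validates a tenant-supplied stage list: non-empty, no duplicates.
    pub fn new(tenant_id: &str, stages: &[&str]) -> Result<Self, String> {
        if stages.is_empty() {
            return Err("a pipeline needs at least one stage".to_string());
        }
        let mut seen = HashSet::new();
        for stage in stages {
            if !seen.insert(*stage) {
                return Err(format!("duplicate stage: {stage}"));
            }
        }
        Ok(Self {
            tenant_id: tenant_id.to_string(),
            stages: stages.iter().map(|s| s.to_string()).collect(),
        })
    }

    /// The stage after `current`, or None at the end of the pipeline.
    pub fn next_stage(&self, current: &str) -> Option<&str> {
        let index = self.stages.iter().position(|s| s == current)?;
        self.stages.get(index + 1).map(String::as_str)
    }
}
```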
Story 4: TODO Comment Cleanup
The Debt: Seven TODO comments suggested using the wrong event pattern for domain events; copy-paste had spread bad architectural guidance.
Cost to Fix: 1 hour (find and update comments)
Benefits:
- Prevents future bugs from wrong architecture
- Clarifies intended patterns
- Saves future code review time
ROI Calculation:
- Prevention value: 4 hours per misdirected implementation
- Expected frequency: 2-3 times per year
- Verdict: Immediate positive ROI
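The ROI figures in these four stories reduce to a few lines of arithmetic. A minimal sketch, working in hours with no discounting; the helper names are hypothetical.

```rust
/// Months until cumulative time savings repay a one-off fix.
/// Returns None when the fix saves no recurring time.
pub fn payback_months(fix_cost_hours: f64, hours_saved_per_month: f64) -> Option<f64> {
    if hours_saved_per_month > 0.0 {
        Some(fix_cost_hours / hours_saved_per_month)
    } else {
        None
    }
}

/// Expected hours avoided per year: cost of one incident times its yearly frequency.
pub fn prevention_hours_per_year(hours_per_incident: f64, incidents_per_year: f64) -> f64 {
    hours_per_incident * incidents_per_year
}

/// First-year net benefit in hours; positive means the fix pays for itself within a year.
pub fn first_year_net_hours(fix_cost_hours: f64, benefit_hours_per_year: f64) -> f64 {
    benefit_hours_per_year - fix_cost_hours
}
```

With the story numbers, `payback_months(16.0, 2.0)` gives the AWS migration's 8 months, and the low end of the TODO cleanup (4 hours × 2 incidents/year against a 1-hour fix) nets 7 hours in the first year.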
Debt Lived With: Three Success Stories
Story 1: Cross-Capsule Queries - The $20K Decision
The Debt: The list_by_tenant() method remains unimplemented!(). It panics if called. It’s been six months. It’s still there.
Use Case: Operator tooling needs to list all items across capsules (pods) for a tenant. Happens fewer than 10 times per month.
Current Workaround: Manually iterate through capsules (5 minutes/month).
Implementation Cost:
- Global query infrastructure: 60 hours
- Cross-capsule coordination: 30 hours
- Testing and edge cases: 10 hours
- Total: 100 hours at $200/hour = $20,000
ROI Calculation:
- Time saved: 5 minutes/month = 1 hour/year
- Cost to maintain: 2 hours/year
- Net savings: $19,800/year
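The deferred method and its workaround can be sketched as a trait with one capsule-scoped query and one cross-capsule query that panics. `ItemRepository`, `InMemoryRepo`, `list_in_capsule`, and `list_across_capsules` are hypothetical names for illustration; only `list_by_tenant()` and its `unimplemented!()` body come from the story.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: u32,
    pub tenant_id: String,
}

pub trait ItemRepository {
    /// Capsule-scoped query: the path production code relies on.
    fn list_in_capsule(&self, capsule_id: &str, tenant_id: &str) -> Vec<Item>;
    /// Cross-capsule query: deliberately left unimplemented.
    fn list_by_tenant(&self, tenant_id: &str) -> Vec<Item>;
}

/// Hypothetical in-memory stand-in: items grouped by capsule ID.
pub struct InMemoryRepo {
    by_capsule: HashMap<String, Vec<Item>>,
}

impl InMemoryRepo {
    pub fn new(by_capsule: HashMap<String, Vec<Item>>) -> Self {
        Self { by_capsule }
    }
}

impl ItemRepository for InMemoryRepo {
    fn list_in_capsule(&self, capsule_id: &str, tenant_id: &str) -> Vec<Item> {
        self.by_capsule
            .get(capsule_id)
            .map(|items| {
                items
                    .iter()
                    .filter(|i| i.tenant_id == tenant_id)
                    .cloned()
                    .collect::<Vec<Item>>()
            })
            .unwrap_or_default()
    }

    fn list_by_tenant(&self, _tenant_id: &str) -> Vec<Item> {
        // Panics if called: global query infrastructure is deferred.
        unimplemented!("cross-capsule queries are deferred; iterate capsules instead")
    }
}

/// The operator workaround: query each known capsule in turn.
pub fn list_across_capsules(
    repo: &impl ItemRepository,
    capsule_ids: &[&str],
    tenant_id: &str,
) -> Vec<Item> {
    capsule_ids
        .iter()
        .flat_map(|capsule| repo.list_in_capsule(capsule, tenant_id))
        .collect()
}
```

One reading of this design: because nothing in production calls `list_by_tenant()`, the panic costs nothing today, and if a caller ever appears it fails loudly instead of quietly returning one capsule's data.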
Story 2: UX Demo Applications
The Debt: TypeScript type mismatches in the demo apps; several component demos were incomplete or outdated.
Alternative: Storybook already provides interactive component demos with proper types.
Fix Cost: 4 hours debugging + ongoing maintenance burden
Decision: Deferred indefinitely. Storybook is sufficient for component exploration.
ROI Calculation:
- Value added: Minimal (Storybook covers use case)
- Maintenance cost: 2-3 hours/month
- Net savings: $800/year
Story 3: LocalStack Integration Tests
The Debt: Many integration tests are marked #[ignore] because they won’t run locally without infrastructure setup.
Blocker: Infrastructure setup scripts incomplete.
Workaround: CI environment runs tests with proper LocalStack setup.
Decision: Defer until blocker resolved. Tests run where they matter (CI).
Why It Worked: Tests still run in the gating environment. Local convenience isn’t yet worth the effort of unblocking.
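In Rust, this pattern usually means `#[ignore]` on the test plus an explicit opt-in in CI. A minimal sketch: the helper, the `LOCALSTACK_ENDPOINT` variable name, and the test name are assumptions; `http://localhost:4566` is LocalStack's default edge endpoint, and `cargo test -- --ignored` is the standard libtest way to run only the ignored tests.

```rust
/// Endpoint for integration tests: an explicit override if CI provides
/// one, otherwise LocalStack's default edge port.
pub fn localstack_endpoint(configured: Option<String>) -> String {
    configured.unwrap_or_else(|| "http://localhost:4566".to_string())
}

#[cfg(test)]
mod localstack_tests {
    use super::*;

    // Skipped by a plain `cargo test`. CI, which provisions LocalStack,
    // opts in with `cargo test -- --ignored` (or `--include-ignored`).
    #[test]
    #[ignore = "requires LocalStack; runs in CI"]
    fn stores_object_via_localstack() {
        let endpoint = localstack_endpoint(std::env::var("LOCALSTACK_ENDPOINT").ok());
        assert!(endpoint.starts_with("http"));
        // Real calls against `endpoint` would go here.
    }
}
```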
Debt Lived With: One Failure Story
Capsule Isolation Violations
The Debt: Four entities stayed tenant-scoped when they should have been capsule-scoped. In a multi-capsule (pod) architecture, this means dev/test data could mix with production data.
Impact: HIGH RISK: architectural integrity violation, potential data leakage.
Discovery: Architectural audit found 7% violation rate (4 out of ~60 entities).
Original Fix Cost (if done immediately): 4 hours per entity = 16 hours total
Current Fix Cost (after 6 months): 12-18 hours per entity = 48-72 hours total
Why It Got Harder:
- More code depends on wrong scoping
- Migration complexity increased
- Test data assumptions baked in
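The difference between the two scopings shows up directly in storage keys. A hypothetical sketch, assuming a `#`-delimited composite key; the key format and type names are illustrative, not the project's.

```rust
/// Tenant-scoped: the same tenant in dev and prod capsules maps to one key.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct TenantScopedKey {
    pub tenant_id: String,
    pub entity_id: String,
}

/// Capsule-scoped: the capsule is part of every key, so data cannot cross capsules.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct CapsuleScopedKey {
    pub capsule_id: String,
    pub tenant_id: String,
    pub entity_id: String,
}

impl TenantScopedKey {
    pub fn storage_key(&self) -> String {
        format!("TENANT#{}#ENTITY#{}", self.tenant_id, self.entity_id)
    }
}

impl CapsuleScopedKey {
    pub fn storage_key(&self) -> String {
        format!(
            "CAPSULE#{}#TENANT#{}#ENTITY#{}",
            self.capsule_id, self.tenant_id, self.entity_id
        )
    }

    /// Migration path for a mis-scoped entity: the capsule must be
    /// supplied explicitly; there is deliberately no default.
    pub fn from_tenant_scoped(key: TenantScopedKey, capsule_id: &str) -> Self {
        Self {
            capsule_id: capsule_id.to_string(),
            tenant_id: key.tenant_id,
            entity_id: key.entity_id,
        }
    }
}
```

Requiring the capsule explicitly in `from_tenant_scoped` is one way to keep migrated data from defaulting into the wrong capsule.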
Three Decision Frameworks
Framework 1: Issue Backpressure (Priority Labels)
This framework categorizes work by urgency and impact:
- Capsule isolation violations → CRITICAL (architectural risk)
- Pipeline customization → HIGH (blocks enterprise sales)
- AWS runtime consolidation → MEDIUM (important, not urgent)
- Cross-capsule queries → LOW (workaround exists)
- LocalStack tests → BLOCKED (waiting on #441)
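The labels form a total order, which makes backlog sorting mechanical. A sketch with hypothetical `Priority` and `DebtItem` types, using the five items above:

```rust
/// Backpressure labels, declared in triage order so the derived `Ord`
/// sorts CRITICAL first. BLOCKED sorts last: it cannot be worked until
/// its dependency clears.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Critical,
    High,
    Medium,
    Low,
    Blocked,
}

#[derive(Debug, Clone)]
pub struct DebtItem {
    pub name: &'static str,
    pub priority: Priority,
}

/// Orders a backlog by label. `sort_by_key` is stable, so items with
/// the same label keep their original relative order.
pub fn triage_order(mut items: Vec<DebtItem>) -> Vec<&'static str> {
    items.sort_by_key(|item| item.priority);
    items.into_iter().map(|item| item.name).collect()
}
```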
Framework 2: Cost/Benefit Matrix
Map frequency against value to determine action. The four quadrants are the frequency/value rules listed under Applying These Frameworks.
Framework 3: Stage-Based Complexity (Lean Until Proven)
Don’t build Stage 4 infrastructure in Stage 0:
- Cross-capsule queries (Stage 4) → Defer in Stage 0 ✓
- Configuration governance (Stage 2) → Transitional approach OK ✓
- Soft-delete (Stage 3) → Defer to backlog until scaling needs ✓
- Pipeline customization (Stage 2) → Build for design partners ✓
When Managed Services Win
Sometimes the best way to avoid technical debt is to not write the code at all.
Case Study: Timestream vs Custom DynamoDB
Requirement: Store and query time-series metrics data.
Option A: Custom DynamoDB Implementation
- Development: 100 hours ($20,000)
- Operations: 10 hours/month ($2,000/month, or $24,000/year)
- Storage: $280-450/month
- Query performance: 2-8 seconds
- Total Year 1: $48,584
Option B: Amazon Timestream (managed service)
- Development: 0 hours
- Operations: 0 hours/month
- Storage: $89/month
- Query performance: less than 500ms
- Total Year 1: $1,068
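The year-one totals can be reproduced with a short calculation. Assumptions, not figures from the source: a $200/hour rate (implied by 100 hours = $20,000), operations labour billed every month, and $382/month of DynamoDB storage, which is simply the value inside the quoted $280-450 range that yields $48,584.

```rust
/// Hourly rate implied by the article's figures (100 hours = $20,000).
pub const HOURLY_RATE_USD: f64 = 200.0;

/// Year-one cost: one-off development plus twelve months of
/// operations labour and storage spend.
pub fn year_one_cost_usd(dev_hours: f64, ops_hours_per_month: f64, storage_usd_per_month: f64) -> f64 {
    let development = dev_hours * HOURLY_RATE_USD;
    let monthly_run_cost = ops_hours_per_month * HOURLY_RATE_USD + storage_usd_per_month;
    development + 12.0 * monthly_run_cost
}
```

Under these assumptions, `year_one_cost_usd(100.0, 10.0, 382.0)` returns 48584.0 and `year_one_cost_usd(0.0, 0.0, 89.0)` returns 1068.0, matching the two totals above.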
Transitional Debt: How to Take On Debt Safely
Not all debt is created equal. Some debt is explicitly temporary: planned, tracked, and bounded.
Configuration Governance Example
The Requirement: Implement configuration governance with encryption, field-level validation, and migration capabilities.
Option A: Full Implementation
- Cost: 5 days
- Risk: Over-engineering for current needs
Option B: Phased Implementation
- Phase 1: Basic interface (1 day)
- Phase 2: Value types (1 day)
- Phase 3: Migration tools (1 day)
- Phase 4: Encryption (1 day)
- Total: 4 days, delivered incrementally
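Phasing works when the Phase 1 interface is stable enough for later phases to wrap rather than replace it. A minimal sketch, assuming a string key/value store: `ConfigStore`, `BasicStore`, and `ValidatedStore` are hypothetical, and the validation layer stands in for the field-level validation the requirement mentions. Migration tooling and encryption are omitted.

```rust
use std::collections::HashMap;

/// Phase 1: the interface callers depend on. Later phases must not change it.
pub trait ConfigStore {
    fn get(&self, key: &str) -> Option<String>;
    fn set(&mut self, key: &str, value: String) -> Result<(), String>;
}

/// Phase 1 implementation: a plain in-memory map with no validation.
#[derive(Default)]
pub struct BasicStore {
    values: HashMap<String, String>,
}

impl ConfigStore for BasicStore {
    fn get(&self, key: &str) -> Option<String> {
        self.values.get(key).cloned()
    }

    fn set(&mut self, key: &str, value: String) -> Result<(), String> {
        self.values.insert(key.to_string(), value);
        Ok(())
    }
}

/// A later phase: field-level validation wraps any existing store,
/// so Phase 1 callers keep compiling unchanged.
pub struct ValidatedStore<S: ConfigStore> {
    inner: S,
    rules: HashMap<String, fn(&str) -> bool>,
}

impl<S: ConfigStore> ValidatedStore<S> {
    pub fn new(inner: S) -> Self {
        Self { inner, rules: HashMap::new() }
    }

    pub fn with_rule(mut self, key: &str, rule: fn(&str) -> bool) -> Self {
        self.rules.insert(key.to_string(), rule);
        self
    }
}

impl<S: ConfigStore> ConfigStore for ValidatedStore<S> {
    fn get(&self, key: &str) -> Option<String> {
        self.inner.get(key)
    }

    fn set(&mut self, key: &str, value: String) -> Result<(), String> {
        // Keys without a rule pass straight through to the inner store.
        if let Some(rule) = self.rules.get(key) {
            if !rule(value.as_str()) {
                return Err(format!("invalid value for {key}"));
            }
        }
        self.inner.set(key, value)
    }
}
```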
Requirements for Transitional Debt
Every transitional debt decision must have:
The Documentation Checklist
Good debt management requires documentation. Every debt decision needs:
- ADR or Plan Document
  - Explains WHY the decision was made
  - Captures alternatives considered
  - Documents decision criteria
- GitHub Issue
  - Tracks WHAT needs to be done (if anything)
  - Links to ADR for context
- Labels
  - technical-debt: Marks known debt
  - blocked: Waiting on dependency
  - deferred: Intentionally postponed
  - wont-fix: Explicitly accepting debt
- Migration Path or Acceptance Criteria
  - How to fix it (if we decide to)
  - What success looks like
- Decision Point
  - When to revisit (timeline, milestone, or condition)
  - Example: “Revisit when >100 queries/month”
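The checklist can also live next to the code as data, so that the decision point is something you check rather than something you remember. A sketch with hypothetical types; the >100 queries/month threshold and issue #441 come from this article, the rest is illustrative.

```rust
/// The four labels from the checklist.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DebtLabel {
    TechnicalDebt,
    Blocked,
    Deferred,
    WontFix,
}

/// When to look at the decision again.
#[derive(Debug, Clone, PartialEq)]
pub enum DecisionPoint {
    /// e.g. "Revisit when >100 queries/month"
    UsageAbove { queries_per_month: u32 },
    /// Revisit once the blocking issue is closed.
    BlockerClosed { issue: u32 },
}

#[derive(Debug, Clone)]
pub struct DebtRecord {
    pub title: String,
    pub labels: Vec<DebtLabel>,
    pub decision_point: DecisionPoint,
}

impl DebtRecord {
    /// True once the recorded condition for revisiting has been met.
    pub fn should_revisit(&self, observed_queries_per_month: u32, closed_issues: &[u32]) -> bool {
        match &self.decision_point {
            DecisionPoint::UsageAbove { queries_per_month } => {
                observed_queries_per_month > *queries_per_month
            }
            DecisionPoint::BlockerClosed { issue } => closed_issues.contains(issue),
        }
    }
}
```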
The Key Insight
Quote from our project constitution:
“Good debt management is not avoiding all debt - it’s making conscious decisions, documenting them clearly, and having criteria for when to revisit.”
The difference between strategic debt and technical bankruptcy is documentation and decision-making.
Outcomes Summary
Debt Paid Down:
- AWS runtime migration: 16 hours invested, positive ROI in 8 months
- LegalEntityId consolidation: 30 minutes, immediate ROI
- Pipeline refactor: 12 hours, unlocked enterprise sales
- TODO cleanup: 1 hour, prevented architectural confusion
Debt Lived With (Successes):
- Cross-capsule queries: $19,800/year savings vs building it
- Demo apps: $800/year savings by accepting Storybook
- LocalStack tests: Deferred until blocker resolved
Debt Lived With (Failure):
- Capsule isolation violations: 3-4.5x the original effort to fix after 6 months (48-72 hours instead of 16)
- Lesson: Architectural debt compounds
Transitional Debt:
- Configuration governance: Tracked explicitly across 4 phases
- Status: Manageable and progressing
Applying These Frameworks
When you encounter technical debt, run through these questions:
- Priority (Backpressure): Is this CRITICAL, HIGH, MEDIUM, LOW, or BLOCKED?
- Frequency: How often is this code executed or touched?
- Value: Does this impact revenue, users, security, or architecture?
- Cost: How many hours to fix? What’s the risk of waiting?
- Stage: Does this complexity match our current stage?
- Alternative: Is there a managed service or workaround?
Then act on the answers:
- CRITICAL or security: Fix immediately, no questions
- High frequency + high value: Build now
- Low frequency + high value: Workaround acceptable
- High frequency + low value: Build only if cheap
- Low frequency + low value: Defer or won’t-fix
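The decision rules above are small enough to write down as a function. A sketch with hypothetical `Level` and `Action` types; collapsing frequency and value to two levels each is a simplification.

```rust
/// Rough two-level scale for the frequency and value questions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Low,
    High,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    FixImmediately,
    BuildNow,
    WorkaroundAcceptable,
    BuildOnlyIfCheap,
    DeferOrWontFix,
}

/// Applies the rules in order: CRITICAL or security issues short-circuit;
/// everything else falls into one quadrant of the frequency/value matrix.
pub fn decide(critical_or_security: bool, frequency: Level, value: Level) -> Action {
    if critical_or_security {
        return Action::FixImmediately;
    }
    match (frequency, value) {
        (Level::High, Level::High) => Action::BuildNow,
        (Level::Low, Level::High) => Action::WorkaroundAcceptable,
        (Level::High, Level::Low) => Action::BuildOnlyIfCheap,
        (Level::Low, Level::Low) => Action::DeferOrWontFix,
    }
}
```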
The unimplemented function saved us $20,000. The architectural violation cost us at least 3x more to fix later. The difference wasn’t the debt itself; it was knowing which debt to take on, and which to pay down immediately. Choose wisely.