Technical Debt Triage: What to Fix, What to Live With

The code was incomplete. The list_by_tenant() method just said unimplemented!() - it would panic in production. But we shipped it anyway. Six months later, it’s still there. And it was the right decision. Here’s why.

The Problem with Technical Debt Advice

Most technical debt advice falls into two camps: “Never ship with known issues” or “Move fast and break things.” Neither works in practice. The real question isn’t whether to take on debt - it’s which debt to pay down, which to live with, and how to tell the difference. Over six months building a multi-tenant SaaS platform, we accumulated technical debt. Some we fixed immediately. Some we deferred indefinitely. And some we wish we’d fixed sooner. This article breaks down those decisions with real ROI calculations and decision frameworks you can use.

Debt Paid Down: Four Success Stories

Story 1: AWS Runtime Migration

The Debt: Each crate managed AWS clients independently. Six different implementations, all slightly different. Some cached connections, some didn’t. Some handled retries, some failed fast.

Cost to Fix: 16 hours

Benefits:

Eliminated 200+ lines of duplicated code
Prevented 3 potential isolation bugs (wrong client configuration)
Reduced onboarding confusion (one pattern instead of six)

ROI Calculation:

Time saved on future changes: ~2 hours/month (consistent patterns)
Bug prevention value: ~4 hours/incident × 3 incidents = 12 hours
Payback period: 8 months
Verdict: Positive ROI

Why It Worked: High-frequency code (used in every service), clear architectural benefit, reasonable fix cost.
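The payback arithmetic above can be written out as a small Rust sketch. The function names are hypothetical and exist only for illustration; the inputs are the figures quoted in this story.

```rust
/// Months until a one-off fix pays for itself through recurring time savings.
/// Returns None when there are no recurring savings to recoup the cost.
pub fn payback_months(fix_hours: f64, hours_saved_per_month: f64) -> Option<f64> {
    if hours_saved_per_month <= 0.0 {
        return None;
    }
    Some(fix_hours / hours_saved_per_month)
}

/// One-off value of prevented incidents, expressed in hours.
pub fn prevention_hours(hours_per_incident: f64, incidents: f64) -> f64 {
    hours_per_incident * incidents
}
```

With 16 hours of fix cost and ~2 hours/month saved, `payback_months(16.0, 2.0)` gives the 8-month figure. The 12 hours of bug prevention come on top of that and are not needed to reach payback.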

Story 2: LegalEntityId Consolidation

The Debt: Two definitions of LegalEntityId existed - one using UUID v4, one using UUID v7. Different files, different semantics, same name.

Cost to Fix: 30 minutes

Benefits:

Single source of truth
Prevented future confusion about which to use
Made code review conversations simpler

ROI Calculation:

Prevention value: ~2 hours/confusion incident
Expected frequency: 1-2 times per quarter
Verdict: Immediate positive ROI

Why It Worked: Trivial fix cost, clear architectural win, prevents compound confusion.
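A minimal sketch of what a single source of truth could look like, assuming the consolidated type standardizes on one UUID version. The choice of v7 here is an assumption for illustration (the story does not say which version survived), and raw bytes are used only to keep the example dependency-free.

```rust
/// The one canonical definition; every crate would import this type
/// instead of declaring its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct LegalEntityId([u8; 16]);

impl LegalEntityId {
    /// Assumed for illustration: the codebase standardized on UUID v7.
    pub const REQUIRED_VERSION: u8 = 7;

    /// Rejects IDs of any other UUID version at construction time.
    pub fn from_bytes(bytes: [u8; 16]) -> Result<Self, String> {
        // In the UUID layout, the version is the high nibble of octet 6.
        let version = bytes[6] >> 4;
        if version == Self::REQUIRED_VERSION {
            Ok(Self(bytes))
        } else {
            Err(format!(
                "expected UUID v{}, got v{}",
                Self::REQUIRED_VERSION,
                version
            ))
        }
    }

    /// UUID version encoded in this ID.
    pub fn version(&self) -> u8 {
        self.0[6] >> 4
    }
}
```

Checking the version in the constructor means a stray v4 ID fails loudly at the boundary instead of silently coexisting with v7 IDs under the same name.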

Story 3: Opportunity Pipeline Refactor

The Debt: Pipeline stages were hardcoded as an enum. No tenant could customize their sales process. Enterprise customers need custom pipelines.

Cost to Fix: 12 hours (breaking change, database migration)

Benefits:

Enterprise-ready feature unlocked
Enables customer customization
Direct revenue impact

ROI Calculation:

Customer value: Enables enterprise sales
Alternative cost: Build workaround layer (20+ hours)
Verdict: Customer value justifies cost

Why It Worked: Directly tied to revenue, blocking feature for enterprise tier, fix cheaper than workaround.
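The shape of this refactor can be sketched as moving stages from a compile-time enum to per-tenant data. The stage names, `Pipeline`, and `PipelineRegistry` below are hypothetical; the real schema and migration are not shown in this article.

```rust
use std::collections::HashMap;

/// Before: stages fixed at compile time, identical for every tenant.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HardcodedStage {
    Lead,
    Qualified,
    Proposal,
    Won,
}

/// After: stages are data, so each tenant can define its own sequence.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Pipeline {
    pub stages: Vec<String>,
}

impl Pipeline {
    /// Fallback pipeline mirroring the old hardcoded stages.
    pub fn default_pipeline() -> Self {
        let stages = ["Lead", "Qualified", "Proposal", "Won"]
            .iter()
            .map(|s| s.to_string())
            .collect();
        Pipeline { stages }
    }

    /// Stage that follows `current`, or None if it is last or unknown.
    pub fn next_stage(&self, current: &str) -> Option<&str> {
        let idx = self.stages.iter().position(|s| s.as_str() == current)?;
        self.stages.get(idx + 1).map(String::as_str)
    }
}

/// Per-tenant overrides with a shared default.
pub struct PipelineRegistry {
    custom: HashMap<String, Pipeline>,
    default: Pipeline,
}

impl PipelineRegistry {
    pub fn new() -> Self {
        PipelineRegistry {
            custom: HashMap::new(),
            default: Pipeline::default_pipeline(),
        }
    }

    pub fn set_custom(&mut self, tenant_id: &str, pipeline: Pipeline) {
        self.custom.insert(tenant_id.to_string(), pipeline);
    }

    /// Tenants without a custom pipeline keep the default behaviour.
    pub fn for_tenant(&self, tenant_id: &str) -> &Pipeline {
        self.custom.get(tenant_id).unwrap_or(&self.default)
    }
}
```

Falling back to a default keeps existing tenants unchanged, which is one way to limit the blast radius of a breaking change like this.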

Story 4: TODO Comment Cleanup

The Debt: Seven TODO comments suggested using the wrong event pattern for domain events. Copy-paste had spread bad architectural guidance.

Cost to Fix: 1 hour (find and update comments)

Benefits:

Prevents future bugs from wrong architecture
Clarifies intended patterns
Saves future code review time

ROI Calculation:

Prevention value: 4 hours per misdirected implementation
Expected frequency: 2-3 times per year
Verdict: Immediate positive ROI

Why It Worked: Comments are high-leverage documentation. Wrong guidance compounds over time.
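Finding the affected comments is the mechanical part of a cleanup like this. A rough sketch, assuming the misleading TODOs all mention a common identifier (the `needle` parameter is hypothetical, and in practice a plain text search does the same job):

```rust
/// Returns (1-based line number, trimmed line) for every line that contains
/// both a TODO marker and the given identifier.
pub fn find_todos<'a>(source: &'a str, needle: &str) -> Vec<(usize, &'a str)> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains("TODO") && line.contains(needle))
        .map(|(i, line)| (i + 1, line.trim()))
        .collect()
}
```

Running this over each source file gives a checklist of comments to rewrite, which makes it easy to confirm that none of the copies were missed.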

Debt Lived With: Three Success Stories

Story 1: Cross-Capsule Queries - The $20K Decision

The Debt: The list_by_tenant() method remains unimplemented!(). It panics if called. It’s been six months. It’s still there.

Use Case: Operator tooling needs to list all items across capsules (pods) for a tenant. Happens fewer than 10 times per month.

Current Workaround: Manually iterate through capsules (5 minutes/month).

Implementation Cost:

Global query infrastructure: 60 hours
Cross-capsule coordination: 30 hours
Testing and edge cases: 10 hours
Total: 100 hours at $200/hr = $20,000

ROI Calculation:

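Time saved by building it: 5 minutes/month = 1 hour/year (about $200/year at $200/hr)
Cost to maintain if built: 2 hours/year
Net savings from not building: about $19,800 in the first year ($20,000 avoided, minus $200 of workaround time)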

Decision Framework (ADR-0015):

✓ Use cases are rare (<10/month)
✓ Workarounds exist and are acceptable
✓ Architectural integrity matters (avoid premature optimization)
✓ Implementation cost is high (100 hours)
→ Decision: Keep unimplemented

Why It Worked: Low frequency + working workaround + high implementation cost = don’t build it.
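In code, the accepted debt and its workaround might look like the sketch below. `Item`, `CapsuleStore`, and `list_across_capsules` are hypothetical names for illustration. Only `list_by_tenant()` and its `unimplemented!()` body come from the story, and the real workaround is a manual operator procedure rather than a function.

```rust
/// Hypothetical item type, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Item {
    pub tenant_id: String,
    pub name: String,
}

/// Hypothetical per-capsule store.
pub struct CapsuleStore {
    pub capsule_id: String,
    pub items: Vec<Item>,
}

impl CapsuleStore {
    /// Supported path: query within a single capsule.
    pub fn list(&self, tenant_id: &str) -> Vec<Item> {
        self.items
            .iter()
            .filter(|i| i.tenant_id == tenant_id)
            .cloned()
            .collect()
    }
}

/// The deferred global query. Calling it panics; that is the accepted debt.
#[allow(dead_code)]
pub fn list_by_tenant(_tenant_id: &str) -> Vec<Item> {
    unimplemented!("cross-capsule queries are intentionally deferred (ADR-0015)")
}

/// The workaround, scripted: visit each capsule in turn and merge the results.
pub fn list_across_capsules(capsules: &[CapsuleStore], tenant_id: &str) -> Vec<Item> {
    capsules.iter().flat_map(|c| c.list(tenant_id)).collect()
}
```

Giving `unimplemented!()` a message that names the ADR means anyone who does hit the panic is pointed at the decision record rather than at a bare stack trace.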

Story 2: UX Demo Applications

The Debt: TypeScript type mismatches in demo apps. Several component demos incomplete or outdated.

Alternative: Storybook already provides interactive component demos with proper types.

Fix Cost: 4 hours debugging + ongoing maintenance burden

Decision: Deferred indefinitely. Storybook is sufficient for component exploration.

ROI Calculation:

Value added: Minimal (Storybook covers use case)
Maintenance cost: 2-3 hours/month
Net savings: $800 up front (4 hours at $200/hr), plus the ongoing maintenance avoided

Why It Worked: Don’t maintain two systems when one works. Recognize sunk costs.

Story 3: LocalStack Integration Tests

The Debt: Many integration tests marked #[ignore] - they won’t run locally without infrastructure setup.

Blocker: Infrastructure setup scripts incomplete.

Workaround: CI environment runs tests with proper LocalStack setup.

Decision: Defer until blocker resolved. Tests run where they matter (CI).

Why It Worked: Tests still run in gating environment. Local convenience isn’t worth unblocking effort yet.
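For readers unfamiliar with the attribute, here is a sketch of an ignored integration test. The `LOCALSTACK_ENDPOINT` variable and the helper function are hypothetical conventions, not taken from the project. The `#[ignore]` attribute itself is standard Rust, and `cargo test -- --ignored` runs only the ignored tests.

```rust
/// Normalizes the configured endpoint; None means no infrastructure is available.
/// The caller passes the raw environment value so this stays easy to unit test.
pub fn localstack_endpoint(env_value: Option<&str>) -> Option<String> {
    env_value
        .map(str::trim)
        .filter(|v| !v.is_empty())
        .map(str::to_owned)
}

#[test]
#[ignore = "requires LocalStack; run in CI with `cargo test -- --ignored`"]
fn lists_items_against_localstack() {
    // Hypothetical env var name, assumed to be set by the CI job.
    let raw = std::env::var("LOCALSTACK_ENDPOINT").ok();
    let endpoint = localstack_endpoint(raw.as_deref())
        .expect("LOCALSTACK_ENDPOINT must be set for integration tests");
    assert!(endpoint.starts_with("http"));
}
```

A plain `cargo test` skips the test locally and reports it as ignored, so the gap stays visible in every run without failing the build.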

Debt Lived With: One Failure Story

Capsule Isolation Violations

The Debt: Four entities stayed tenant-scoped when they should have been capsule-scoped. In a multi-capsule (pod) architecture, this means dev/test data could mix with production data.

Impact: HIGH RISK - architectural integrity violation, potential data leakage.

Discovery: Architectural audit found 7% violation rate (4 out of ~60 entities).

Original Fix Cost (if done immediately): 4 hours per entity = 16 hours total

Current Fix Cost (after 6 months): 12-18 hours per entity = 48-72 hours total

Why It Got Harder:

More code depends on wrong scoping
Migration complexity increased
Test data assumptions baked in

The Lesson: Security and architectural debt compounds. The 7% violation became harder to fix over time, not easier. Some debt categories should never be deferred.
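To make the scoping difference concrete, here is a sketch of the two keying schemes. The key format and type names are hypothetical and not the project's actual storage layout.

```rust
/// Tenant-scoped key: identical in every capsule a tenant owns, so dev/test
/// and production records can end up in the same keyspace.
pub fn tenant_scoped_key(tenant_id: &str, entity_id: &str) -> String {
    format!("TENANT#{}#ENTITY#{}", tenant_id, entity_id)
}

/// Capsule-scoped access: a key cannot be built without naming the capsule.
pub struct CapsuleScope<'a> {
    tenant_id: &'a str,
    capsule_id: &'a str,
}

impl<'a> CapsuleScope<'a> {
    pub fn new(tenant_id: &'a str, capsule_id: &'a str) -> Self {
        CapsuleScope { tenant_id, capsule_id }
    }

    /// Key that embeds the capsule, keeping each capsule's data separate.
    pub fn key(&self, entity_id: &str) -> String {
        format!(
            "TENANT#{}#CAPSULE#{}#ENTITY#{}",
            self.tenant_id, self.capsule_id, entity_id
        )
    }
}
```

Once other code has built, stored, and tested against the tenant-scoped form, every one of those call sites becomes part of the migration, which is consistent with the cost growth described above.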

Three Decision Frameworks

Framework 1: Issue Backpressure (Priority Labels)

This framework categorizes work by urgency and impact:

CRITICAL  → Production bug, security issue
           Action: Do first, drop other work
           
HIGH      → Blocks team work, user-facing feature
           Action: Do next, schedule within sprint
           
MEDIUM    → Important, not blocking
           Action: Do when bandwidth available
           
LOW       → Nice to have, polish, optimization
           Action: Defer to backlog
           
BLOCKED   → Waiting on dependency
           Action: Skip until unblocked

Example Applications:

Capsule isolation violations → CRITICAL (architectural risk)
Pipeline customization → HIGH (blocks enterprise sales)
AWS runtime consolidation → MEDIUM (important, not urgent)
Cross-capsule queries → LOW (workaround exists)
LocalStack tests → BLOCKED (waiting on #441)
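The labels and actions above map naturally onto an enum. This is a sketch only: the `next_up` helper and its tie-breaking are assumptions, and it reads "skip until unblocked" as "never pick a BLOCKED item".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Priority {
    Critical,
    High,
    Medium,
    Low,
    Blocked,
}

impl Priority {
    /// Action text for each label, as listed in the framework.
    pub fn action(self) -> &'static str {
        match self {
            Priority::Critical => "Do first, drop other work",
            Priority::High => "Do next, schedule within sprint",
            Priority::Medium => "Do when bandwidth available",
            Priority::Low => "Defer to backlog",
            Priority::Blocked => "Skip until unblocked",
        }
    }

    /// Lower rank means more urgent.
    fn rank(self) -> u8 {
        match self {
            Priority::Critical => 0,
            Priority::High => 1,
            Priority::Medium => 2,
            Priority::Low => 3,
            Priority::Blocked => 4,
        }
    }
}

/// Picks the most urgent item that is not blocked, if any.
pub fn next_up<'a>(items: &[(&'a str, Priority)]) -> Option<&'a str> {
    items
        .iter()
        .filter(|(_, p)| *p != Priority::Blocked)
        .min_by_key(|(_, p)| p.rank())
        .map(|(name, _)| *name)
}
```

Applied to the example list, `next_up` returns the capsule isolation violations first and never returns the LocalStack tests while they remain BLOCKED.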

Framework 2: Cost/Benefit Matrix

Map frequency against value to determine action:

                HIGH FREQUENCY        LOW FREQUENCY
                (>100/day)            (<10/day)
              ┌───────────────────┬─────────────────┐
HIGH VALUE    │ Build Now         │ Workaround OK   │
(user/rev)    │                   │                 │
              │ Ex: Pipeline      │ Ex: Cross-      │
              │ customization     │ capsule queries │
              │                   │                 │
              ├───────────────────┼─────────────────┤
LOW VALUE     │ Build If Easy     │ Defer/Won't Fix │
(polish)      │                   │                 │
              │ Ex: Helper macros │ Ex: Demo apps   │
              │ (1 hour, 260 LOC) │ Ex: Soft-delete │
              │                   │                 │
              └───────────────────┴─────────────────┘

ALWAYS FIX IMMEDIATELY (OVERRIDE ALL QUADRANTS):
❗ Security issues
❗ Architectural violations
❗ Compliance gaps
❗ Blocking dependencies

Key Insight: Cross-capsule queries fell into “high value, low frequency” - exactly where workarounds make sense. Demo apps fell into “low value, low frequency” - classic defer/won’t-fix territory.
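The matrix, including its override list, reduces to a small lookup. The type and field names in this sketch are hypothetical, and classifying an item as high or low on each axis is left to the caller.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    High,
    Low,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    FixImmediately,
    BuildNow,
    WorkaroundOk,
    BuildIfEasy,
    DeferOrWontFix,
}

pub struct DebtItem {
    pub frequency: Level,
    pub value: Level,
    /// True for security issues, architectural violations, compliance gaps,
    /// and blocking dependencies.
    pub is_override: bool,
}

/// Overrides win first; otherwise the quadrant decides.
pub fn triage(item: &DebtItem) -> Action {
    if item.is_override {
        return Action::FixImmediately;
    }
    match (item.frequency, item.value) {
        (Level::High, Level::High) => Action::BuildNow,
        (Level::Low, Level::High) => Action::WorkaroundOk,
        (Level::High, Level::Low) => Action::BuildIfEasy,
        (Level::Low, Level::Low) => Action::DeferOrWontFix,
    }
}
```

Checking the override before the quadrant is the important ordering: an architectural violation that is rarely exercised would otherwise land in a defer quadrant.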

Framework 3: Stage-Based Complexity (Lean Until Proven)

Don’t build Stage 4 infrastructure in Stage 0:

Stage 0 (Garage):      Minimal complexity, prove concept
Stage 1 (Internal):    Low complexity, internal users
Stage 2 (Design):      Medium complexity, first customers
Stage 3 (Growth):      Full complexity, scaling challenges
Stage 4 (Enterprise):  Maximum complexity, enterprise needs

Application Examples:

Cross-capsule queries (Stage 4) → Defer in Stage 0 ✓
Configuration governance (Stage 2) → Transitional approach OK ✓
Soft-delete (Stage 3) → Defer to backlog until scaling needs ✓
Pipeline customization (Stage 2) → Build for design partners ✓

Rule: Match infrastructure complexity to current stage, not future dreams.
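The rule can be expressed as a comparison between two stages. In this sketch the enum mirrors the list above, and `should_build_now` is a hypothetical helper name.

```rust
/// Stages in ascending order of complexity; derived ordering follows
/// declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Stage {
    Garage,
    Internal,
    Design,
    Growth,
    Enterprise,
}

/// Build something only once the company has reached the stage that needs it.
pub fn should_build_now(current: Stage, needed_at: Stage) -> bool {
    needed_at <= current
}
```

For example, a Stage 4 feature evaluated at Stage 0 returns `false`, matching the decision to defer cross-capsule queries.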

When Managed Services Win

Sometimes the best way to avoid technical debt is to not write the code at all.

Case Study: Timestream vs Custom DynamoDB

Requirement: Store and query time-series metrics data.

Option A: Custom DynamoDB Implementation

Development: 100 hours ($20,000)
Operations: 10 hours/month ($2,000/month, $24,000/year)
Storage: $280-450/month
Query performance: 2-8 seconds
Total Year 1: $48,584

Option B: AWS Timestream (Managed Service)

Development: 0 hours
Operations: 0 hours/month
Storage: $89/month
Query performance: less than 500ms
Total Year 1: $1,068

Savings: $47,516 in year 1 (about 98% cost reduction)

The Decision Rule: When a managed service costs about 2% of a custom solution and performs better, building custom is technical debt from day one.
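The year-1 totals can be recomputed from the line items. This sketch assumes the $200/hr rate used in the cross-capsule story. It also assumes storage of $382/month for the custom option, the point within the quoted $280-450 range that reproduces the stated $48,584 total.

```rust
/// Hourly engineering rate, taken from the cross-capsule cost estimate.
pub const HOURLY_RATE_USD: u32 = 200;

/// First-year cost: one-off development plus 12 months of operations and storage.
pub fn year_one_cost(
    dev_hours: u32,
    ops_hours_per_month: u32,
    storage_usd_per_month: u32,
) -> u32 {
    dev_hours * HOURLY_RATE_USD
        + ops_hours_per_month * HOURLY_RATE_USD * 12
        + storage_usd_per_month * 12
}

/// Difference between two first-year totals (zero if custom is not cheaper to skip).
pub fn year_one_savings(custom_usd: u32, managed_usd: u32) -> u32 {
    custom_usd.saturating_sub(managed_usd)
}
```

Calling `year_one_cost(100, 10, 382)` for the custom build and `year_one_cost(0, 0, 89)` for Timestream reproduces the two totals. Most of the gap comes from development and operations hours rather than from storage.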

Transitional Debt: How to Take On Debt Safely

Not all debt is created equal. Some debt is explicitly temporary - planned, tracked, and bounded.

Configuration Governance Example

The Requirement: Implement configuration governance with encryption, field-level validation, and migration capabilities.

Option A: Full Implementation

Cost: 5 days
Risk: Over-engineering for current needs

Option B: Iterative (4 phases)

Phase 1: Basic interface (1 day)
Phase 2: Value types (1 day)
Phase 3: Migration tools (1 day)
Phase 4: Encryption (1 day)
Total: 4 days, delivered incrementally

Chosen Approach: Option B with explicit technical debt tracking.

Requirements for Transitional Debt

Every transitional debt decision must have:

MUST HAVE:
✓ "technical-debt" label on issue
✓ Follow-up issues created for future phases
✓ Migration path documented in ADR
✓ Acceptance criteria defined
✓ Revisit timeline set (e.g., "after 3 months" or "when X customers")

MUST NOT:
✗ Open-ended "we'll fix it later"
✗ No tracking or documentation
✗ No clear success criteria

Why This Works: Transitional debt is debt with a plan. It’s bounded, tracked, and has exit criteria.
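One way to keep these requirements from eroding is to check them mechanically. A sketch, assuming a debt record is modeled as a struct; all field names here are hypothetical.

```rust
/// A transitional-debt record, one field per MUST HAVE item.
#[derive(Debug, Default)]
pub struct TransitionalDebt {
    pub has_technical_debt_label: bool,
    pub follow_up_issues: Vec<u32>,
    pub adr_migration_path: Option<String>,
    pub acceptance_criteria: Vec<String>,
    pub revisit_when: Option<String>,
}

impl TransitionalDebt {
    /// Lists every MUST HAVE item that is still missing.
    pub fn missing_requirements(&self) -> Vec<&'static str> {
        let checks = [
            (self.has_technical_debt_label, "technical-debt label on issue"),
            (!self.follow_up_issues.is_empty(), "follow-up issues for future phases"),
            (self.adr_migration_path.is_some(), "migration path documented in ADR"),
            (!self.acceptance_criteria.is_empty(), "acceptance criteria defined"),
            (self.revisit_when.is_some(), "revisit timeline set"),
        ];
        checks
            .iter()
            .filter(|(ok, _)| !*ok)
            .map(|(_, name)| *name)
            .collect()
    }
}
```

An empty result means the debt is bounded and tracked. Anything else is the open-ended "we'll fix it later" that the MUST NOT list rules out.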

The Documentation Checklist

Good debt management requires documentation. Every debt decision needs:

ADR or Plan Document
- Explains WHY the decision was made
- Captures alternatives considered
- Documents decision criteria
GitHub Issue
- Tracks WHAT needs to be done (if anything)
- Links to ADR for context
Labels
- technical-debt - Marks known debt
- blocked - Waiting on dependency
- deferred - Intentionally postponed
- wont-fix - Explicitly accepting debt
Migration Path or Acceptance Criteria
- How to fix it (if we decide to)
- What success looks like
Decision Point
- When to revisit (timeline, milestone, or condition)
- Example: “Revisit when >100 queries/month”
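A decision point is easiest to honor when it is written as a checkable condition instead of prose. A sketch, with hypothetical type names, covering the two forms mentioned in this article (a timeline and a metric threshold):

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RevisitTrigger {
    /// For example "after 3 months".
    AfterMonths(u32),
    /// For example "revisit when >100 queries/month".
    MetricAbove { metric: String, threshold: u64 },
}

/// Current observations to compare against the trigger.
#[derive(Debug, Clone, Copy)]
pub struct Observed {
    pub months_elapsed: u32,
    pub metric_value: u64,
}

/// True once the recorded condition for revisiting the decision is met.
pub fn should_revisit(trigger: &RevisitTrigger, observed: Observed) -> bool {
    match trigger {
        RevisitTrigger::AfterMonths(months) => observed.months_elapsed >= *months,
        RevisitTrigger::MetricAbove { threshold, .. } => observed.metric_value > *threshold,
    }
}
```

The threshold comparison is strict to match the ">100" wording, so exactly 100 queries in a month does not trigger a revisit.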

The Key Insight

Quote from our project constitution:

“Good debt management is not avoiding all debt - it’s making conscious decisions, documenting them clearly, and having criteria for when to revisit.”

The difference between strategic debt and technical bankruptcy is documentation and decision-making.

Outcomes Summary

Debt Paid Down:

AWS runtime migration: 16 hours invested, positive ROI in 8 months
LegalEntityId consolidation: 30 minutes, immediate ROI
Pipeline refactor: 12 hours, unlocked enterprise sales
TODO cleanup: 1 hour, prevented architectural confusion

Debt Lived With Successfully:

Cross-capsule queries: about $19,800 saved in the first year vs building it
Demo apps: $800 of fix cost avoided, plus ongoing maintenance, by accepting Storybook
LocalStack tests: Deferred until blocker resolved

Debt Lived With Too Long:

Capsule isolation violations: 3-4.5x more expensive to fix after 6 months (48-72 hours vs 16)
Lesson: Architectural debt compounds

Transitional Debt:

Configuration governance: Tracked explicitly across 4 phases
Status: Manageable and progressing

Applying These Frameworks

When you encounter technical debt, run through these questions:

Priority (Backpressure): Is this CRITICAL, HIGH, MEDIUM, LOW, or BLOCKED?
Frequency: How often is this code executed or touched?
Value: Does this impact revenue, users, security, or architecture?
Cost: How many hours to fix? What’s the risk of waiting?
Stage: Does this complexity match our current stage?
Alternative: Is there a managed service or workaround?

Then map to the decision matrix:

CRITICAL or security: Fix immediately, no questions
High frequency + high value: Build now
Low frequency + high value: Workaround acceptable
High frequency + low value: Build only if cheap
Low frequency + low value: Defer or won’t-fix

And finally: Document the decision. Whether you fix it, defer it, or accept it forever - write down why. Your future self (and your teammates) will thank you.

The unimplemented function saved us roughly $20,000. The architectural violation cost us at least 3x more to fix later. The difference wasn’t the debt itself - it was knowing which debt to take on, and which to pay down immediately. Choose wisely.
