This is Week 8 of “Building with AI” - a 10-week journey documenting how I use multi-agent AI workflows to build a production-grade SaaS platform.

This week: The meta-optimization—how I reduced AI context usage by 85% and discovered that working with AI requires its own set of collaboration patterns.

Related: Week 6: AWS Runtime Adoption | Week 7: Configuration Governance

Watch the 60-Second Summary

Week 8: Reducing token usage through smart context management

The Wall

Week 7 ended with a problem I didn’t expect. Not a bug. Not a missing feature. Not a deployment failure. My AI coding sessions kept dying before finishing work. I’d start a session to implement a medium-complexity feature. Claude would load 50k tokens of rules and guidelines upfront. Then build the project after every tiny change—another 8k tokens each time. By mid-session, I’d hit 150k tokens, dangerously close to the 200k context limit. The session would slow down. Responses would take longer. Eventually I’d have to split simple tasks across multiple sessions just to limp across the finish line. I wasn’t just wasting tokens. I was wasting time, money, and most importantly, momentum.
The realization: I’d optimized how I use AI to write code. But I hadn’t optimized how I work with AI. Like learning Git without understanding branching strategies. Or adopting Docker without learning orchestration patterns. I had the tools, but not the patterns.
This week wasn’t about shipping features. It was about learning the meta-patterns of AI collaboration—and discovering that 85% of my context budget was being wasted on the wrong things.

What We Actually Built (Meta Edition)

Most weeks, this section lists features shipped. This week is different. I didn’t ship features. I shipped optimization patterns for AI collaboration:
  1. Batch Build Policy - Stop building after every change (80% build token reduction)
  2. Session Boundary Protocol - Fresh sessions eliminate bias AND bloat (85% context reduction)
  3. Smart Context Loading - Just-in-time rule loading instead of upfront dumps (60% loading reduction)
The results:
  • Token usage: 150k → 20k per session (85% reduction)
  • Build tokens: 80k-120k → 16k-24k (80% reduction)
  • Initial context: 50k → 15k (70% reduction)
  • Session reliability: Tasks that failed mid-session now complete successfully
  • Time to completion: 50-70% faster
The real win wasn’t the metrics. It was the insight: Working with AI requires its own collaboration patterns, just like any new platform.

The Diagnosis: Teaching Everything Upfront

The problem started innocently. I loaded all my coding rules at session start: testing guidelines, commit conventions, macro patterns, repository standards—50k tokens before writing a single line of code. The logic seemed sound: “Give Claude full context upfront.” But here’s what I missed: AI doesn’t work like human onboarding. When you onboard a new engineer, they need comprehensive training because they can’t instantly access documentation. But AI can load a 3k token guide in milliseconds—when it actually needs it. I was treating AI like it needs to “remember everything.” But AI doesn’t need memory—it needs to know where to look when needed.
Eager Loading (What I Did)
Session Start:
- Load ALL rules (50k tokens)
- Load project context (15k tokens)
- Start implementation (5k tokens)
- Build #1 (8k tokens)
- Make change, Build #2 (8k tokens)
- Make change, Build #3 (8k tokens)
- ... 10 more builds ...
- Token limit approaching (150k)
- Session fails or requires restart
The shift: From “load the entire textbook” to “load the catalog, fetch chapters on-demand.”

Phase 1: Stop Building Obsessively

I analyzed a typical session and found something shocking: 10-15 cargo builds × 8k tokens each = 80k-120k tokens spent just on builds. I was running cargo build after every tiny change. Fix a typo? Build. Add a function? Build. It felt productive, but I was burning through my context budget. Here’s the thing: I already have compile-time feedback. rust-analyzer shows type errors, missing imports, and syntax mistakes in real-time. I don’t need the actual build to know if code compiles.

The Solution: Batch Build Policy

Trust IDE diagnostics during development. Build only at quality gates:
  • ✅ Pre-commit hook, before PR creation, in CI/CD
  • ❌ After every file edit, “just to check” builds
The shift: Builds aren’t verification—they’re checkpoints.
Modern Rust tooling gives us real-time feedback through rust-analyzer (instant type errors), cargo check (fast validation), and cargo build (full compilation). I was using the slowest layer for feedback the fastest layers already provide. Builds should be checkpoints, not constant verification.
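The policy itself is small enough to write down as a decision rule. Here is a minimal sketch; the event names are my own labels for the checkpoints above, not part of any real tool:

```python
# Sketch of the batch-build policy: full builds run only at quality
# gates; everything else trusts rust-analyzer / cargo check feedback.
# Event names are hypothetical labels, not tied to any real tool.
CHECKPOINT_EVENTS = {"pre_commit", "pr_creation", "ci_pipeline"}

def should_run_full_build(event: str) -> bool:
    """Return True only when the event is a designated checkpoint."""
    return event in CHECKPOINT_EVENTS
```

Under this rule, `should_run_full_build("file_edit")` is `False`: an ordinary edit never triggers the expensive 8k-token build.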
I condensed verbose rules (12-15k tokens) into “quick reference” versions (2-3k tokens), keeping core principles and removing extensive examples. Six rules got this treatment: 75% token reduction per file. I also created a specialized profile that loads only the essentials upfront (~14k tokens), with on-demand loading for specialized rules. Result: 50k → 14k upfront context (72% reduction).
Impact:
  • Builds per session: 10-15 → 2-3 (80% reduction)
  • Build tokens: 80k-120k → 16k-24k (80% reduction)
  • Freed capacity: ~100k tokens for actual work
But I wasn’t done. I’d fixed the build problem. Now I had to fix the context accumulation problem.

Phase 2: Fresh Context is a Feature, Not a Bug

The pattern I didn’t see coming: Planning session explores codebase and creates plan (40k tokens). Implementation builds the feature (95k total). Verification checks the work—but it’s carrying 95k tokens of implementation details, “remembering” shortcuts and compromises. The result: Biased verification that rationalizes instead of truly verifies.

The Counterintuitive Insight

I thought continuous sessions provided “momentum.” Turns out, fresh context is a feature, not a bug. When you close a session and start fresh, you force explicit handoffs. The new session reads requirements from scratch. It verifies what’s actually there, not what it thinks should be there. It catches the shortcuts the builder rationalized. This concept builds on the multi-agent workflows from Week 1, where I established the Planner-Builder-Verifier pattern. Now I’m adding session boundaries to enhance that separation.
Problem: Context Accumulation + Bias
Planner Session:
- Explore codebase (15k tokens)
- Evaluate options (10k tokens)
- Design solution (8k tokens)
- Create plan (7k tokens)
Total: 40k tokens

↓ (Keep session going)

Builder Session (same session):
- Read plan (inherits 40k planning context)
- Implement feature (25k tokens)
- Build + iterate (30k tokens)
Total: 95k tokens

↓ (Keep session going)

Verifier Session (same session):
- Inherits 95k implementation context
- "Verifies" based on remembered decisions
- Finds surface issues, misses architectural problems
Total: 110k+ tokens
Issue: Verifier is biased by implementation memory
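The back-of-the-envelope arithmetic behind that diagram makes the difference concrete. Token figures come from the breakdown above; the handoff-document size is my own assumption:

```python
# Token figures taken from the session breakdown above (not measured here).
planning = 40_000        # explore + evaluate + design + plan
implementation = 55_000  # 25k implement + 30k build/iterate
verification = 15_000    # rough cost of the verification pass itself

# Continuous session: each phase inherits all prior context.
continuous_peak = planning + implementation + verification  # 110k by verify time

# Fresh sessions: each phase pays only its own cost plus a small
# handoff document (size here is a hypothetical assumption).
handoff = 2_000
fresh_peak = max(planning, implementation + handoff, verification + handoff)
```

With fresh sessions, the peak load is the most expensive single phase (~57k here) rather than the sum of all three, and the verifier reads only the handoff, not the builder’s memory.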

The Protocol

I formalized clean handoffs with a phase transition checklist.
Planner → Builder:
  • Plan saved to structured documentation
  • GitHub issue updated with summary
  • Close session (Builder starts fresh)
Builder → Verifier:
  • All changes committed
  • PR created
  • Close session (Verifier starts fresh)
Impact: 85% context reduction, zero verification bias.
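The checklist can be expressed as a tiny gate. The phase and artifact names below are my own shorthand for the protocol above, not an existing API:

```python
# Hypothetical phase-transition gate: a session may close (and the next
# phase start fresh) only once all of its handoff artifacts exist.
HANDOFF_ARTIFACTS = {
    "planner": {"plan_saved", "issue_updated"},
    "builder": {"changes_committed", "pr_created"},
}

def can_close_session(phase: str, completed: set) -> bool:
    """True when every required handoff artifact for the phase is done."""
    return HANDOFF_ARTIFACTS[phase] <= completed  # subset check
```

A builder session with only a commit but no PR cannot close, which is exactly what forces the explicit handoff.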

Phase 3: The Library Pattern

With builds batched and sessions separated, I had one more inefficiency: upfront rule loading. Even with condensed rules and specialized profiles, I was still loading 14-15k tokens at session start. But here’s the thing: most sessions don’t need all the rules. If you’re implementing a simple feature without tests (tests come later), why load testing rules? If you’re not creating a commit yet, why load commit conventions? The insight: Treat knowledge like a library, not a textbook. You don’t walk into a library and check out every book at once. You browse the catalog (lightweight), then fetch specific books when you need them.

Smart Context Loading Strategy

I implemented trigger-based rule loading—load minimal context at start, fetch specific rules when triggered:
Load ~15k tokens:
  • Core patterns (entities, macros)
  • File organization
  • Autonomous mode rules
  • Index of available rules
Total: 15k tokens
Example session flow:
Start:               15k tokens (minimal context)
Implement feature:   +5k tokens (code)
Tests fail:          +0.9k tokens (load testing rules)
Fix tests:           +2k tokens (changes)
Ready to commit:     +1k tokens (load commit rules)
Final build:         +8k tokens (checkpoint)
─────────────────────────────────
Total:               31.9k tokens (vs 150k before)
The beauty of this approach: You only pay for what you use.
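For anyone who wants to check the arithmetic in the example flow above (figures in thousands of tokens, straight from the table):

```python
# Session budget from the example flow above, in thousands of tokens.
costs = {
    "session_start": 15,    # minimal context
    "implement_feature": 5, # code changes
    "testing_rules": 0.9,   # loaded only because tests failed
    "fix_tests": 2,
    "commit_rules": 1,      # loaded at commit time
    "final_build": 8,       # the one checkpoint build
}
total = sum(costs.values())  # ~31.9k vs. the ~150k eager-loading baseline
```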
Testing Rules: Load when tests fail or task completion invoked
Commit Rules: Load when git commit detected
Macro Rules: Load when creating DynamoDB entities
Repository Rules: Load when creating new repository
Each rule is 1-3k tokens, loaded only when needed.
I created a 500-token index that loads at session start, telling Claude where to find rules without loading them:
# Rule Index
**Core Rules (loaded by default):** File organization, naming
**On-Demand Rules (load when triggered):**
- testing-guide.md → Load when: tests fail
- commit-guide.md → Load when: git commit
- entity-patterns.md → Load when: new entity
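A minimal sketch of what trigger-based loading could look like in practice. The event names and file paths mirror the index above but are otherwise hypothetical:

```python
# Sketch: just-in-time rule loading keyed on session events.
# Each rule file is fetched at most once per session.
RULE_TRIGGERS = {
    "tests_fail": "testing-guide.md",
    "git_commit": "commit-guide.md",
    "new_entity": "entity-patterns.md",
}

class RuleLoader:
    def __init__(self):
        self.loaded = set()

    def on_event(self, event):
        """Return the rule file to load now, or None if not needed."""
        path = RULE_TRIGGERS.get(event)
        if path is None or path in self.loaded:
            return None
        self.loaded.add(path)
        return path  # the caller would read this ~1-3k token file here
```

The `loaded` set is what keeps the budget flat: a rule triggered five times still costs its 1-3k tokens only once.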
Impact:
  • Session start context: 50k → 15k (70% reduction)
  • Average session context: 50k → 20k (60% reduction)
  • Session reliability: No more context limit failures
  • Response speed: 50-70% faster (less context to process)

The Results: Complete Transformation

Before: Typical Session (150k tokens)
Session Start: Load all rules         50,000 tokens
Implementation: Initial code           8,000 tokens
Build #1-8 (8x8k each)                64,000 tokens
Tests + fixes                         15,000 tokens
Final build                            8,000 tokens
───────────────────────────────────────────────
Total:                               145,000 tokens

Status: Approaching 200k limit
Risk: Session failure mid-task
Key Metrics:

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Session tokens | 150k | 20k | 85% ↓ |
| Build tokens | 80-120k | 16-24k | 80% ↓ |
| Initial context | 50k | 15k | 70% ↓ |
| Failed sessions | Common | Zero | 100% ↓ |
| Time to completion | Baseline | 50-70% faster | Major ↑ |

What We Learned: The Meta-Patterns

1. Context is finite—manage it like memory

Just like embedded systems with 256KB RAM, you need to be intentional about what you load. Pattern: Lazy loading beats eager loading.

2. Batching isn’t just for databases

We batch database queries and API calls. Now we batch builds. Pattern: Batch expensive operations at checkpoints.

3. Fresh start beats biased momentum

Continuous sessions create bias. Clean boundaries eliminate both bias and bloat. Pattern: Use session boundaries as quality gates.

4. Trust, but verify at checkpoints

Constant verification has diminishing returns. Pattern: Use real-time feedback for iteration, verify at strategic points.

5. AI doesn’t need comprehensive onboarding

Humans need upfront training because they can’t instantly access documentation. AI can load knowledge in milliseconds. Pattern: Give AI a catalog, not a textbook.
What I tried: Randomly reducing context without structure.
Why it failed: Without triggers, I’d forget critical rules. Tests failed with confusing errors. Commits violated conventions.
The fix: Created explicit trigger patterns—load rules based on specific events (test failure, git commit, etc.).
What I tried: Keeping sessions across planning, implementation, and verification for “momentum.”
Why it failed: The momentum created bias. The Verifier rationalized shortcuts instead of catching them. Plus, context bloat made sessions fail mid-task.
The fix: Enforced clean session boundaries with a formal handoff protocol.

The Meta-Story: Optimizing the Optimizer

Week 8 isn’t about what we built—it’s about how we learned to work with AI sustainably. The parallel: Remember learning Git? First you committed to master. Then you discovered branching. Then merge strategies. Then rebase vs merge. Each layer taught you how to work with Git effectively. Same with Docker: First containers. Then docker-compose. Then orchestration. Then Kubernetes. Each step revealed new patterns. AI is no different. First prompts. Then multi-agent workflows (see Week 1: Multi-Agent Setup). Now context management, session boundaries, and just-in-time loading. These patterns are counterintuitive:
  • ✅ Fresh start beats continuous momentum
  • ✅ Batching beats real-time feedback
  • ✅ Minimal context beats comprehensive context
Why this matters: Every team using AI will hit this wall. I’m documenting these patterns now—while they’re still novel—so others can adopt them before the pain hits.

Actionable Takeaways

If you’re building with AI, here’s what to implement:
  1. Batch Build Policy: Trust IDE diagnostics during development. Build only at checkpoints (pre-commit, PR creation, CI). Savings: 80% fewer builds.
  2. Enforce Session Boundaries: Close sessions between phases. Use GitHub comments for handoffs. Fresh sessions eliminate bias and bloat. Benefit: Unbiased verification + 85% context reduction.
  3. Load Just-in-Time: Create a rule index. Load minimal context at start, then load additional rules when triggered. Savings: 60% less context.
  4. Condense Guidelines: Turn verbose documentation into quick-reference files. Keep principles, remove extensive examples. Target: 75% reduction.
  5. Track Token Usage: Monitor your context budget. What gets measured gets optimized.

What’s Next: Week 9 Preview

I’ve optimized how I work with AI. Next week, I’m scaling up. The challenge: I’m hitting DynamoDB’s single-table design limits. 15 entity types in one table. Complex access patterns. Query performance degrading. Do I add more GSIs? Migrate to multi-table? Adopt DynamoDB Streams + materialized views? Next week I’m tackling the data modeling inflection point—when your clever single-table design becomes a performance bottleneck, and you have to choose between complexity and speed. Stay tuned.

Subscribe to Building with AI

Get weekly posts about building production software with AI—honest experiments, real metrics, and the patterns we’re learning along the way.

All content represents personal learning from personal projects. No proprietary information is shared.