This is Week 8 of “Building with AI” - a 10-week journey documenting how I use multi-agent AI workflows to build a production-grade SaaS platform.

This week: The meta-optimization—how I reduced AI context usage by 85% and discovered that working with AI requires its own set of collaboration patterns.

Related: Week 6: AWS Runtime Adoption | Week 7: Configuration Governance

Watch the 60-Second Summary

Week 8: Reducing token usage through smart context management

The Wall

Week 7 ended with a problem I didn’t expect. Not a bug. Not a missing feature. Not a deployment failure. My AI coding sessions kept dying before finishing work. I’d start a session to implement a medium-complexity feature. Claude would load 50k tokens of rules and guidelines upfront. Then build the project after every tiny change—another 8k tokens each time. By mid-session, I’d hit 150k tokens, dangerously close to the 200k context limit. The session would slow down. Responses would take longer. Eventually I’d have to split simple tasks across multiple sessions just to limp across the finish line. I wasn’t just wasting tokens. I was wasting time, money, and most importantly, momentum.
The realization: I’d optimized how I use AI to write code. But I hadn’t optimized how I work with AI. Like learning Git without understanding branching strategies. Or adopting Docker without learning orchestration patterns. I had the tools, but not the patterns.
This week wasn’t about shipping features. It was about learning the meta-patterns of AI collaboration—and discovering that 85% of my context budget was being wasted on the wrong things.

What We Actually Built (Meta Edition)

Most weeks, this section lists features shipped. This week is different. I didn’t ship features. I shipped optimization patterns for AI collaboration:
  1. Batch Build Policy - Stop building after every change (80% build token reduction)
  2. Session Boundary Protocol - Fresh sessions eliminate bias AND bloat (85% context reduction)
  3. Smart Context Loading - Just-in-time rule loading instead of upfront dumps (60% loading reduction)
The results:
  • Token usage: 150k → 20k per session (85% reduction)
  • Build tokens: 80k-120k → 16k-24k (80% reduction)
  • Initial context: 50k → 15k (70% reduction)
  • Session reliability: Tasks that failed mid-session now complete successfully
  • Time to completion: 50-70% faster
The real win wasn’t the metrics. It was the insight: Working with AI requires its own collaboration patterns, just like any new platform.

The Diagnosis: Teaching Everything Upfront

The problem started innocently. I loaded all my coding rules at session start: testing guidelines, commit conventions, macro patterns, repository standards—50k tokens before writing a single line of code. The logic seemed sound: “Give Claude full context upfront.” But here’s what I missed: AI doesn’t work like human onboarding. When you onboard a new engineer, they need comprehensive training because they can’t instantly access documentation. But AI can load a 3k token guide in milliseconds—when it actually needs it. I was treating AI like it needs to “remember everything.” But AI doesn’t need memory—it needs to know where to look when needed.
Eager Loading (What I Did)
Session Start:
- Load ALL rules (50k tokens)
- Load project context (15k tokens)
- Start implementation (5k tokens)
- Build #1 (8k tokens)
- Make change, Build #2 (8k tokens)
- Make change, Build #3 (8k tokens)
- ... 10 more builds ...
- Token limit approaching (150k)
- Session fails or requires restart
The shift: From “load the entire textbook” to “load the catalog, fetch chapters on-demand.”

Phase 1: Stop Building Obsessively

I analyzed a typical session and found something shocking: 10-15 cargo builds × 8k tokens each = 80k-120k tokens spent just on builds. I was running cargo build after every tiny change. Fix a typo? Build. Add a function? Build. It felt productive, but I was burning through my context budget. Here’s the thing: I already have compile-time feedback. rust-analyzer shows type errors, missing imports, and syntax mistakes in real-time. I don’t need the actual build to know if code compiles.

The Solution: Batch Build Policy

Trust IDE diagnostics during development. Build only at quality gates:
  • ✅ Pre-commit hook, before PR creation, in CI/CD
  • ❌ After every file edit, “just to check” builds
The shift: Builds aren’t verification—they’re checkpoints.
Modern Rust tooling gives us real-time feedback through rust-analyzer (instant type errors), cargo check (fast validation), and cargo build (full compilation). I was using the slowest layer for feedback the fastest layers already provide. Builds should be checkpoints, not constant verification.
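The policy itself is small enough to write down as a decision rule. Here is a minimal sketch; the event names are my own labels for the checkpoints above, not part of any real tool:

```python
# Sketch of the batch-build policy: full builds run only at quality
# gates; everything else trusts rust-analyzer / cargo check feedback.
# Event names are hypothetical labels, not tied to any real tool.
CHECKPOINT_EVENTS = {"pre_commit", "pr_creation", "ci_pipeline"}

def should_run_full_build(event: str) -> bool:
    """Return True only when the event is a designated checkpoint."""
    return event in CHECKPOINT_EVENTS
```

Under this rule, `should_run_full_build("file_edit")` is `False`: an ordinary edit never triggers the expensive 8k-token build.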
I condensed verbose rules (12-15k tokens) into “quick reference” versions (2-3k tokens), keeping core principles and removing extensive examples. Six rules got this treatment: 75% token reduction per file. I also created a specialized profile that loads only the essentials upfront (~14k tokens), with on-demand loading for specialized rules. Result: 50k → 14k upfront context (72% reduction).
Impact:
  • Builds per session: 10-15 → 2-3 (80% reduction)
  • Build tokens: 80k-120k → 16k-24k (80% reduction)
  • Freed capacity: ~100k tokens for actual work
But I wasn’t done. I’d fixed the build problem. Now I had to fix the context accumulation problem.

Phase 2: Fresh Context is a Feature, Not a Bug

The pattern I didn’t see coming: Planning session explores codebase and creates plan (40k tokens). Implementation builds the feature (95k total). Verification checks the work—but it’s carrying 95k tokens of implementation details, “remembering” shortcuts and compromises. The result: Biased verification that rationalizes instead of truly verifies.

The Counterintuitive Insight

I thought continuous sessions provided “momentum.” Turns out, fresh context is a feature, not a bug. When you close a session and start fresh, you force explicit handoffs. The new session reads requirements from scratch. It verifies what’s actually there, not what it thinks should be there. It catches the shortcuts the builder rationalized. This concept builds on the multi-agent workflows from Week 1, where I established the Planner-Builder-Verifier pattern. Now I’m adding session boundaries to enhance that separation.
Problem: Context Accumulation + Bias
Planner Session:
- Explore codebase (15k tokens)
- Evaluate options (10k tokens)
- Design solution (8k tokens)
- Create plan (7k tokens)
Total: 40k tokens

↓ (Keep session going)

Builder Session (same session):
- Read plan (inherits 40k planning context)
- Implement feature (25k tokens)
- Build + iterate (30k tokens)
Total: 95k tokens

↓ (Keep session going)

Verifier Session (same session):
- Inherits 95k implementation context
- "Verifies" based on remembered decisions
- Finds surface issues, misses architectural problems
Total: 110k+ tokens
Issue: Verifier is biased by implementation memory
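The back-of-the-envelope arithmetic behind that diagram makes the difference concrete. Token figures come from the breakdown above; the handoff-document size is my own assumption:

```python
# Token figures taken from the session breakdown above (not measured here).
planning = 40_000        # explore + evaluate + design + plan
implementation = 55_000  # 25k implement + 30k build/iterate
verification = 15_000    # rough cost of the verification pass itself

# Continuous session: each phase inherits all prior context.
continuous_peak = planning + implementation + verification  # 110k by verify time

# Fresh sessions: each phase pays only its own cost plus a small
# handoff document (size here is a hypothetical assumption).
handoff = 2_000
fresh_peak = max(planning, implementation + handoff, verification + handoff)
```

With fresh sessions, the peak load is the most expensive single phase (~57k here) rather than the sum of all three, and the verifier reads only the handoff, not the builder’s memory.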

The Protocol

I formalized clean handoffs with a phase transition checklist.
Planner → Builder:
  • Plan saved to structured documentation
  • GitHub issue updated with summary
  • Close session (Builder starts fresh)
Builder → Verifier:
  • All changes committed
  • PR created
  • Close session (Verifier starts fresh)
Impact: 85% context reduction, zero verification bias.
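The checklist can be expressed as a tiny gate. The phase and artifact names below are my own shorthand for the protocol above, not an existing API:

```python
# Hypothetical phase-transition gate: a session may close (and the next
# phase start fresh) only once all of its handoff artifacts exist.
HANDOFF_ARTIFACTS = {
    "planner": {"plan_saved", "issue_updated"},
    "builder": {"changes_committed", "pr_created"},
}

def can_close_session(phase: str, completed: set) -> bool:
    """True when every required handoff artifact for the phase is done."""
    return HANDOFF_ARTIFACTS[phase] <= completed  # subset check
```

A builder session with only a commit but no PR cannot close, which is exactly what forces the explicit handoff.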

Phase 3: The Library Pattern

With builds batched and sessions separated, I had one more inefficiency: upfront rule loading. Even with condensed rules and specialized profiles, I was still loading 14-15k tokens at session start. But here’s the thing: most sessions don’t need all the rules. If you’re implementing a simple feature without tests (tests come later), why load testing rules? If you’re not creating a commit yet, why load commit conventions? The insight: Treat knowledge like a library, not a textbook. You don’t walk into a library and check out every book at once. You browse the catalog (lightweight), then fetch specific books when you need them.

Smart Context Loading Strategy

I implemented trigger-based rule loading—load minimal context at start, fetch specific rules when triggered:
Load ~15k tokens:
  • Core patterns (entities, macros)
  • File organization
  • Autonomous mode rules
  • Index of available rules
Total: 15k tokens
Example session flow:
Start:               15k tokens (minimal context)
Implement feature:   +5k tokens (code)
Tests fail:          +0.9k tokens (load testing rules)
Fix tests:           +2k tokens (changes)
Ready to commit:     +1k tokens (load commit rules)
Final build:         +8k tokens (checkpoint)
─────────────────────────────────
Total:               31.9k tokens (vs 150k before)
The beauty of this approach: You only pay for what you use.
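For anyone who wants to check the arithmetic in the example flow above (figures in thousands of tokens, straight from the table):

```python
# Session budget from the example flow above, in thousands of tokens.
costs = {
    "session_start": 15,    # minimal context
    "implement_feature": 5, # code changes
    "testing_rules": 0.9,   # loaded only because tests failed
    "fix_tests": 2,
    "commit_rules": 1,      # loaded at commit time
    "final_build": 8,       # the one checkpoint build
}
total = sum(costs.values())  # ~31.9k vs. the ~150k eager-loading baseline
```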
Testing Rules: Load when tests fail or task completion invoked
Commit Rules: Load when git commit detected
Macro Rules: Load when creating DynamoDB entities
Repository Rules: Load when creating new repository
Each rule is 1-3k tokens, loaded only when needed.
I created a 500-token index that loads at session start, telling Claude where to find rules without loading them:
# Rule Index
**Core Rules (loaded by default):** File organization, naming
**On-Demand Rules (load when triggered):**
- testing-guide.md → Load when: tests fail
- commit-guide.md → Load when: git commit
- entity-patterns.md → Load when: new entity
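A minimal sketch of what trigger-based loading could look like in practice. The event names and file paths mirror the index above but are otherwise hypothetical:

```python
# Sketch: just-in-time rule loading keyed on session events.
# Each rule file is fetched at most once per session.
RULE_TRIGGERS = {
    "tests_fail": "testing-guide.md",
    "git_commit": "commit-guide.md",
    "new_entity": "entity-patterns.md",
}

class RuleLoader:
    def __init__(self):
        self.loaded = set()

    def on_event(self, event):
        """Return the rule file to load now, or None if not needed."""
        path = RULE_TRIGGERS.get(event)
        if path is None or path in self.loaded:
            return None
        self.loaded.add(path)
        return path  # the caller would read this ~1-3k token file here
```

The `loaded` set is what keeps the budget flat: a rule triggered five times still costs its 1-3k tokens only once.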
Impact:
  • Session start context: 50k → 15k (70% reduction)
  • Average session context: 50k → 20k (60% reduction)
  • Session reliability: No more context limit failures
  • Response speed: 50-70% faster (less context to process)

The Results: Complete Transformation

Before: Typical Session (150k tokens)
Session Start: Load all rules         50,000 tokens
Implementation: Initial code           8,000 tokens
Build #1-8 (8x8k each)                64,000 tokens
Tests + fixes                         15,000 tokens
Final build                            8,000 tokens
───────────────────────────────────────────────
Total:                               145,000 tokens

Status: Approaching 200k limit
Risk: Session failure mid-task
Key Metrics:

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Session tokens | 150k | 20k | 85% ↓ |
| Build tokens | 80-120k | 16-24k | 80% ↓ |
| Initial context | 50k | 15k | 70% ↓ |
| Failed sessions | Common | Zero | 100% ↓ |
| Time to completion | Baseline | 50-70% faster | Major ↑ |

What We Learned: The Meta-Patterns

1. Context is finite—manage it like memory

Just like embedded systems with 256KB RAM, you need to be intentional about what you load. Pattern: Lazy loading beats eager loading.

2. Batching isn’t just for databases

We batch database queries and API calls. Now we batch builds. Pattern: Batch expensive operations at checkpoints.

3. Fresh start beats biased momentum

Continuous sessions create bias. Clean boundaries eliminate both bias and bloat. Pattern: Use session boundaries as quality gates.

4. Trust, but verify at checkpoints

Constant verification has diminishing returns. Pattern: Use real-time feedback for iteration, verify at strategic points.

5. AI doesn’t need comprehensive onboarding

Humans need upfront training because they can’t instantly access documentation. AI can load knowledge in milliseconds. Pattern: Give AI a catalog, not a textbook.
What I tried: Randomly reducing context without structure.
Why it failed: Without triggers, I’d forget critical rules. Tests failed with confusing errors. Commits violated conventions.
The fix: Created explicit trigger patterns—load rules based on specific events (test failure, git commit, etc.).
What I tried: Keeping sessions across planning, implementation, and verification for “momentum.”
Why it failed: The momentum created bias. The Verifier rationalized shortcuts instead of catching them. Plus, context bloat made sessions fail mid-task.
The fix: Enforced clean session boundaries with a formal handoff protocol.

The Meta-Story: Optimizing the Optimizer

Week 8 isn’t about what we built—it’s about how we learned to work with AI sustainably. The parallel: Remember learning Git? First you committed to master. Then you discovered branching. Then merge strategies. Then rebase vs merge. Each layer taught you how to work with Git effectively. Same with Docker: First containers. Then docker-compose. Then orchestration. Then Kubernetes. Each step revealed new patterns. AI is no different. First prompts. Then multi-agent workflows (see Week 1: Multi-Agent Setup). Now context management, session boundaries, and just-in-time loading. These patterns are counterintuitive:
  • ✅ Fresh start beats continuous momentum
  • ✅ Batching beats real-time feedback
  • ✅ Minimal context beats comprehensive context
Why this matters: Every team using AI will hit this wall. I’m documenting these patterns now—while they’re still novel—so others can adopt them before the pain hits.

Actionable Takeaways

If you’re building with AI, here’s what to implement:
  1. Batch Build Policy: Trust IDE diagnostics during development. Build only at checkpoints (pre-commit, PR creation, CI). Savings: 80% fewer builds.
  2. Enforce Session Boundaries: Close sessions between phases. Use GitHub comments for handoffs. Fresh sessions eliminate bias and bloat. Benefit: Unbiased verification + 85% context reduction.
  3. Load Just-in-Time: Create a rule index. Load minimal context at start, then load additional rules when triggered. Savings: 60% less context.
  4. Condense Guidelines: Turn verbose documentation into quick-reference files. Keep principles, remove extensive examples. Target: 75% reduction.
  5. Track Token Usage: Monitor your context budget. What gets measured gets optimized.

What’s Next: Week 9 Preview

I’ve optimized how I work with AI. Next week, I’m scaling up. The challenge: I’m hitting DynamoDB’s single-table design limits. 15 entity types in one table. Complex access patterns. Query performance degrading. Do I add more GSIs? Migrate to multi-table? Adopt DynamoDB Streams + materialized views? Next week I’m tackling the data modeling inflection point—when your clever single-table design becomes a performance bottleneck, and you have to choose between complexity and speed. Stay tuned.

Subscribe to Building with AI

Get weekly posts about building production software with AI—honest experiments, real metrics, and the patterns we’re learning along the way.

All content represents personal learning from personal projects. No proprietary information is shared.