This is Week 8 of “Building with AI” - a 10-week journey documenting how I use multi-agent AI workflows to build a production-grade SaaS platform.

This week: The meta-optimization—how I reduced AI context usage by 85% and discovered that working with AI requires its own set of collaboration patterns.

Related: Week 6: AWS Runtime Adoption | Week 7: Configuration Governance
Watch the 60-Second Summary
Week 8: Reducing token usage through smart context management
The Wall
Week 7 ended with a problem I didn’t expect. Not a bug. Not a missing feature. Not a deployment failure. My AI coding sessions kept dying before finishing work.

I’d start a session to implement a medium-complexity feature. Claude would load 50k tokens of rules and guidelines upfront, then build the project after every tiny change—another 8k tokens each time. By mid-session, I’d hit 150k tokens, dangerously close to the 200k context limit. The session would slow down. Responses would take longer. Eventually I’d have to split simple tasks across multiple sessions just to limp across the finish line.

I wasn’t just wasting tokens. I was wasting time, money, and most importantly, momentum. This week wasn’t about shipping features. It was about learning the meta-patterns of AI collaboration—and discovering that 85% of my context budget was being wasted on the wrong things.

What We Actually Built (Meta Edition)
Most weeks, this section lists features shipped. This week is different. I didn’t ship features. I shipped optimization patterns for AI collaboration:
- Batch Build Policy - Stop building after every change (80% build token reduction)
- Session Boundary Protocol - Fresh sessions eliminate bias AND bloat (85% context reduction)
- Smart Context Loading - Just-in-time rule loading instead of upfront dumps (60% loading reduction)
The results:
- Token usage: 150k → 20k per session (85% reduction)
- Build tokens: 80k-120k → 16k-24k (80% reduction)
- Initial context: 50k → 15k (70% reduction)
- Session reliability: Tasks that failed mid-session now complete successfully
- Time to completion: 50-70% faster
The Diagnosis: Teaching Everything Upfront
The problem started innocently. I loaded all my coding rules at session start: testing guidelines, commit conventions, macro patterns, repository standards—50k tokens before writing a single line of code. The logic seemed sound: “Give Claude full context upfront.”

But here’s what I missed: AI doesn’t work like human onboarding. When you onboard a new engineer, they need comprehensive training because they can’t instantly access documentation. But AI can load a 3k token guide in milliseconds—when it actually needs it. I was treating AI like it needs to “remember everything.” But AI doesn’t need memory—it needs to know where to look when needed.
- The problem pattern: eager loading (what I did). Everything upfront, whether the session needs it or not.
- The solution pattern: lazy loading. Fetch each rule just in time, when it becomes relevant.
Phase 1: Stop Building Obsessively
I analyzed a typical session and found something shocking: 10-15 cargo builds × 8k tokens each = 80k-120k tokens spent just on builds. I was running cargo build after every tiny change. Fix a typo? Build. Add a function? Build. It felt productive, but I was burning through my context budget.
Here’s the thing: I already have compile-time feedback. rust-analyzer shows type errors, missing imports, and syntax mistakes in real-time. I don’t need the actual build to know if code compiles.
The Solution: Batch Build Policy
Trust IDE diagnostics during development. Build only at quality gates:
- ✅ Pre-commit hook, before PR creation, in CI/CD
- ❌ After every file edit, “just to check” builds
Why This Works
Modern Rust tooling gives us real-time feedback through rust-analyzer (instant type errors), cargo check (fast validation), and cargo build (full compilation). I was using the slowest layer for feedback the fastest layers already provide. Builds should be checkpoints, not constant verification.
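For a concrete picture of what a quality gate can look like, here is a minimal sketch, assuming an xtask-style helper binary (not the project's actual tooling): cargo check serves as the cheap validation layer, and the full cargo build runs only at the gate.

```rust
// Hypothetical xtask-style quality gate: the full build runs only at
// checkpoints (pre-commit, PR creation, CI), never after every edit.
use std::process::Command;

fn run(program: &str, args: &[&str]) -> bool {
    Command::new(program)
        .args(args)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

fn main() {
    // Cheap layer first: cargo check validates types without full codegen.
    if !run("cargo", &["check", "--workspace"]) {
        eprintln!("cargo check failed; fix diagnostics before running the gate");
        std::process::exit(1);
    }
    // Expensive layer only at the gate: the one build that actually matters.
    if !run("cargo", &["build", "--workspace"]) {
        std::process::exit(1);
    }
    println!("quality gate passed");
}
```

The same two commands could just as easily live in a pre-commit hook or a CI job; the point is that they run at boundaries, not in the inner edit loop.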
Implementation Details
I condensed verbose rules (12-15k tokens) into “quick reference” versions (2-3k tokens), keeping core principles and removing extensive examples. Six rules got this treatment: 75% token reduction per file. I also created a specialized profile that loads only the essentials upfront (~14k tokens), with on-demand loading for specialized rules. Result: 50k → 14k upfront context (72% reduction).
- Builds per session: 10-15 → 2-3 (80% reduction)
- Build tokens: 80k-120k → 16k-24k (80% reduction)
- Freed capacity: ~100k tokens for actual work
Phase 2: Fresh Context is a Feature, Not a Bug
The pattern I didn’t see coming: the planning session explores the codebase and creates a plan (40k tokens). Implementation builds the feature (95k total). Verification checks the work—but it’s carrying 95k tokens of implementation details, “remembering” shortcuts and compromises. The result: biased verification that rationalizes instead of truly verifying.

The Counterintuitive Insight
I thought continuous sessions provided “momentum.” Turns out, fresh context is a feature, not a bug. When you close a session and start fresh, you force explicit handoffs. The new session reads requirements from scratch. It verifies what’s actually there, not what it thinks should be there. It catches the shortcuts the builder rationalized. This concept builds on the multi-agent workflows from Week 1, where I established the Planner-Builder-Verifier pattern. Now I’m adding session boundaries to enhance that separation.
- Before: continuous sessions. Problem: context accumulation plus bias, with the Verifier biased by its memory of the implementation.
- After: clean boundaries. Each phase starts with a fresh session and an explicit handoff.
The Protocol
I formalized clean handoffs with a phase transition checklist.

Planner → Builder:
- Plan saved to structured documentation
- GitHub issue updated with summary
- Close session (Builder starts fresh)

Builder → Verifier:
- All changes committed
- PR created
- Close session (Verifier starts fresh)
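To make the handoff tangible, here is a small illustrative sketch. The struct fields and comment layout are assumptions for the example, not the actual artifact format the project uses; the idea is only that the next session starts from a written record rather than accumulated context.

```rust
// Hypothetical handoff record: everything the next session needs lives in a
// written artifact (plan doc plus issue comment), not in the old session's memory.
struct Handoff {
    phase: &'static str,    // e.g. "Planner -> Builder"
    plan_doc: &'static str, // path to the saved plan
    summary: &'static str,  // one-paragraph state of the work
}

impl Handoff {
    // Render the handoff as a GitHub-issue comment; the old session closes
    // and the next one starts fresh from this text alone.
    fn to_comment(&self) -> String {
        format!(
            "### Handoff: {}\n- Plan: {}\n- Summary: {}\n- Previous session closed; start fresh.",
            self.phase, self.plan_doc, self.summary
        )
    }
}

fn main() {
    let handoff = Handoff {
        phase: "Planner -> Builder",
        plan_doc: "docs/plans/feature-x.md",
        summary: "Scope and acceptance criteria agreed; see plan for task breakdown.",
    };
    println!("{}", handoff.to_comment());
}
```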
Phase 3: The Library Pattern
With builds batched and sessions separated, I had one more inefficiency: upfront rule loading. Even with condensed rules and specialized profiles, I was still loading 14-15k tokens at session start. But here’s the thing: most sessions don’t need all the rules. If you’re implementing a simple feature without tests (tests come later), why load testing rules? If you’re not creating a commit yet, why load commit conventions?

The insight: Treat knowledge like a library, not a textbook. You don’t walk into a library and check out every book at once. You browse the catalog (lightweight), then fetch specific books when you need them.

Smart Context Loading Strategy
I implemented trigger-based rule loading—load minimal context at start, fetch specific rules when triggered.

Session start (minimal), ~15k tokens:
- Core patterns (entities, macros)
- File organization
- Autonomous mode rules
- Index of available rules

Everything else (testing rules, commit conventions, and specialized patterns) waits for its trigger.
How Triggers Work
- Testing Rules: Load when tests fail or task completion is invoked
- Commit Rules: Load when git commit detected
- Macro Rules: Load when creating DynamoDB entities
- Repository Rules: Load when creating a new repository

Each rule is 1-3k tokens, loaded only when needed.
The Rule Catalog
I created a 500-token index that loads at session start, telling Claude where to find each rule without loading it: essentially a compact list of rule names, what each covers, and the event that should trigger loading it.
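Here is a rough sketch of how such an index and its trigger-based loading could be modeled, written in Rust for illustration. The trigger names, file paths, and structure are assumptions made for the example, not the actual rule files or tooling.

```rust
// Rough sketch of the library pattern: a tiny always-resident index maps
// triggers to rule files; the full rule text is fetched only when a
// trigger fires, then cached for the rest of the session.
use std::collections::HashMap;
use std::fs;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Trigger {
    TestsFailed,     // load testing rules
    GitCommit,       // load commit conventions
    NewDynamoEntity, // load macro rules
    NewRepository,   // load repository standards
}

struct RuleCatalog {
    index: HashMap<Trigger, &'static str>, // ~500 tokens: names and paths only
    loaded: HashMap<Trigger, String>,      // full rule text, loaded on demand
}

impl RuleCatalog {
    fn new() -> Self {
        let index = HashMap::from([
            (Trigger::TestsFailed, "rules/testing-quick-ref.md"),
            (Trigger::GitCommit, "rules/commit-conventions.md"),
            (Trigger::NewDynamoEntity, "rules/dynamodb-macros.md"),
            (Trigger::NewRepository, "rules/repository-standards.md"),
        ]);
        Self { index, loaded: HashMap::new() }
    }

    // Just-in-time loading: pay the 1-3k token cost only when the rule is needed.
    fn load(&mut self, trigger: Trigger) -> Option<&str> {
        if !self.loaded.contains_key(&trigger) {
            let path = *self.index.get(&trigger)?;
            let text = fs::read_to_string(path).ok()?;
            self.loaded.insert(trigger, text);
        }
        self.loaded.get(&trigger).map(String::as_str)
    }
}

fn main() {
    let mut catalog = RuleCatalog::new();
    // The session starts with only the index; a failing test triggers the fetch.
    if let Some(rules) = catalog.load(Trigger::TestsFailed) {
        println!("loaded testing rules ({} bytes)", rules.len());
    }
}
```

The key property is that only the index stays resident; each rule's full text sits behind a lookup that fires exactly when its trigger does.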
- Session start context: 50k → 15k (70% reduction)
- Average session context: 50k → 20k (60% reduction)
- Session reliability: No more context limit failures
- Response speed: 50-70% faster (less context to process)
The Results: Complete Transformation
| Metric | Before | After | Change |
|---|---|---|---|
| Session tokens | 150k | 20k | 85% ↓ |
| Build tokens | 80-120k | 16-24k | 80% ↓ |
| Initial context | 50k | 15k | 70% ↓ |
| Failed sessions | Common | Zero | 100% ↓ |
| Time to completion | Baseline | 50-70% faster | Major ↑ |
What We Learned: The Meta-Patterns
1. Context is finite—manage it like memory
Just like embedded systems with 256KB RAM, you need to be intentional about what you load. Pattern: Lazy loading beats eager loading.

2. Batching isn’t just for databases
We batch database queries and API calls. Now we batch builds. Pattern: Batch expensive operations at checkpoints.

3. Fresh start beats biased momentum
Continuous sessions create bias. Clean boundaries eliminate both bias and bloat. Pattern: Use session boundaries as quality gates.

4. Trust, but verify at checkpoints
Constant verification has diminishing returns. Pattern: Use real-time feedback for iteration, verify at strategic points.

5. AI doesn’t need comprehensive onboarding
Humans need upfront training because they can’t instantly access documentation. AI can load knowledge in milliseconds. Pattern: Give AI a catalog, not a textbook.
Failed Experiment: 'Just Load Less'
What I tried: Randomly reducing context without structure.
Why it failed: Without triggers, I’d forget critical rules. Tests failed with confusing errors. Commits violated conventions.
The fix: Created explicit trigger patterns—load rules based on specific events (test failure, git commit, etc.).
Failed Experiment: Continuous Sessions
What I tried: Keeping sessions open across planning, implementation, and verification for “momentum.”
Why it failed: The momentum created bias. The Verifier rationalized shortcuts instead of catching them. Plus, context bloat made sessions fail mid-task.
The fix: Enforced clean session boundaries with a formal handoff protocol.
The Meta-Story: Optimizing the Optimizer
Week 8 isn’t about what we built—it’s about how we learned to work with AI sustainably. The parallel: Remember learning Git? First you committed to master. Then you discovered branching. Then merge strategies. Then rebase vs merge. Each layer taught you how to work with Git effectively. Same with Docker: First containers. Then docker-compose. Then orchestration. Then Kubernetes. Each step revealed new patterns.

AI is no different. First prompts. Then multi-agent workflows (see Week 1: Multi-Agent Setup). Now context management, session boundaries, and just-in-time loading. These patterns are counterintuitive:
- ✅ Fresh start beats continuous momentum
- ✅ Batching beats real-time feedback
- ✅ Minimal context beats comprehensive context
Actionable Takeaways
If you’re building with AI, here’s what to implement:
1. Batch Build Policy: Trust IDE diagnostics during development. Build only at checkpoints (pre-commit, PR creation, CI). Savings: 80% fewer builds.
2. Enforce Session Boundaries: Close sessions between phases. Use GitHub comments for handoffs. Fresh sessions eliminate bias and bloat. Benefit: Unbiased verification + 85% context reduction.
3. Load Just-in-Time: Create a rule index. Load minimal context at start, then load additional rules when triggered. Savings: 60% less context.
4. Condense Guidelines: Turn verbose documentation into quick-reference files. Keep principles, remove extensive examples. Target: 75% reduction.
5. Track Token Usage: Monitor your context budget. What gets measured gets optimized.

What’s Next: Week 9 Preview
I’ve optimized how I work with AI. Next week, I’m scaling up. The challenge: I’m hitting DynamoDB’s single-table design limits. 15 entity types in one table. Complex access patterns. Query performance degrading. Do I add more GSIs? Migrate to multi-table? Adopt DynamoDB Streams + materialized views?

Next week I’m tackling the data modeling inflection point—when your clever single-table design becomes a performance bottleneck, and you have to choose between complexity and speed. Stay tuned.

Subscribe to Building with AI
Get weekly posts about building production software with AI—honest experiments, real metrics, and the patterns we’re learning along the way.
All content represents personal learning from personal projects. No proprietary information is shared.