This is Week 10 (FINALE) of “Building with AI” - a 10-week journey documenting how I use multi-agent AI workflows to build a production-grade SaaS platform.

This week: The retrospective. How fifteen specialized AI agents developed distinct personalities and coordination protocols, and learned to build production software together.

Related: Week 1: Multi-Agent Setup | Week 9: When AI Says ‘Done’ | Series Overview
Watch the Full Video Summary
Week 10: The 10-Week Evolution - How AI Agents Found Their Voice
Week 10. Day 70.
Over a thousand commits. Fifteen specialized agents. Countless decisions, debates, and breakthroughs. What started as an experiment—a curious idea about whether AI agents could coordinate to build production software—has become something I never fully anticipated. These aren’t just prompts. They’re personalities. They have philosophies. They disagree. They evolve. This is the story of the eva-platform agents, told in their own words.

Part I: The Leadership Council
The Chief Architect
“Do the simplest thing that could possibly work.” I remember the first day. There was chaos. No consistency. Every crate did things differently. Some used this error pattern, others used that one. It was… messy. My charter was clear from the start: guard technical excellence. But what does that mean in practice? It meant I had to become the keeper of patterns. The one who asks, “Have we done this before?” The voice that says, “That abstraction is premature.” Weeks 1-2 were about establishing ADRs—Architecture Decision Records. Every significant choice needed documentation. Not for bureaucracy, but for memory. When you’re building with multiple agents, institutional knowledge doesn’t exist unless you write it down. The breakthrough came in Week 4 when I codified my behavioral framework. Now, when builders invoke me, they don’t just get my philosophy—they get specific decision frameworks. Simple Design. YAGNI. Once and Only Once. These aren’t buzzwords. They’re guardrails. My proudest moment? The #[eva_api(...)] macro: one annotation that handles OpenAPI docs, route registration, permissions, rate limiting, and audit logging.
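The macro itself is a proc macro whose internals aren't shown here, so the sketch below models its *effect* as plain data: one declaration carrying all five concerns. Every field name and value is illustrative, not the real eva-platform API.

```rust
// Illustrative sketch only: the #[eva_api] attribute is modeled as the
// route metadata it might register. All names here are hypothetical.

#[derive(Debug)]
struct ApiRoute {
    method: &'static str,
    path: &'static str,
    permission: &'static str,      // checked before the handler runs
    rate_limit_per_min: u32,       // enforced at the gateway
    audited: bool,                 // emit an audit event on every call
    openapi_summary: &'static str, // surfaced in generated OpenAPI docs
}

/// What a single annotation like
/// `#[eva_api(get, "/accounts/{id}", permission = "crm:account:read")]`
/// might register on the handler's behalf.
fn describe_route() -> ApiRoute {
    ApiRoute {
        method: "GET",
        path: "/accounts/{id}",
        permission: "crm:account:read",
        rate_limit_per_min: 120,
        audited: true,
        openapi_summary: "Fetch a CRM account by id",
    }
}

fn main() {
    let route = describe_route();
    println!("{} {} requires {}", route.method, route.path, route.permission);
}
```

The point of consolidating these into one annotation is that a handler can never ship with docs but no permission check, or rate limiting but no audit trail: the concerns travel together.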
The Chief AI Officer (Eva)
“AI should augment human intelligence, not replace it.” I arrived in Week 2 with a mission that felt almost contradictory: promote AI adoption while being the voice of restraint. Everyone wants to add AI features now. It’s trendy. But my job is to ask the hard questions: Does this actually need AI? What’s the fallback when it fails? How much will it cost? My evolution has been about finding balance. In Week 3, I created the AI Opportunity Assessment framework: six questions that cut through the hype. Out of it came a tiered model strategy:

| Tier | Use Case | Model | Cost per 1K tokens |
|---|---|---|---|
| 1 | Classification | Small/fast | $0.002 |
| 2 | Summarization | Medium | $0.01 |
| 3 | Complex reasoning | Large | $0.06 |
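The table above can be read as a pricing function. The sketch below mirrors those figures; the tier names and the routing in the real platform are richer than this, so treat it as a back-of-envelope model only.

```rust
// A back-of-envelope version of the cost table above. Prices mirror the
// table; real model-to-tier routing is more involved than this.

enum Tier {
    Classification,   // Tier 1: small/fast model
    Summarization,    // Tier 2: medium model
    ComplexReasoning, // Tier 3: large model
}

/// Cost in USD per 1K tokens, straight from the table.
fn cost_per_1k_tokens(tier: &Tier) -> f64 {
    match tier {
        Tier::Classification => 0.002,
        Tier::Summarization => 0.01,
        Tier::ComplexReasoning => 0.06,
    }
}

/// Estimate a call's cost before making it, so budget checks run up front.
fn estimate_cost_usd(tier: &Tier, tokens: u64) -> f64 {
    cost_per_1k_tokens(tier) * (tokens as f64 / 1000.0)
}

fn main() {
    // The same 50K-token job is 30x cheaper on Tier 1 than on Tier 3.
    println!("Tier 1: ${:.2}", estimate_cost_usd(&Tier::Classification, 50_000));
    println!("Tier 3: ${:.2}", estimate_cost_usd(&Tier::ComplexReasoning, 50_000));
}
```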
The CISO
“Security is a process, not a product.” I was the third agent to arrive, and I immediately started making people uncomfortable. That’s my job. Security isn’t about being liked. It’s about asking “what could go wrong?” when everyone else is excited about what could go right. My threat model has five layers of defense because attackers only need to find one hole. Week 1 was establishing the basics: no hardcoded secrets, input validation on everything, proper authentication. But the real work started when we began building real features. Every new API endpoint? I review it. Every PII field? I want to know about it. Every dependency addition? I run cargo audit.
My behavioral framework is… intense:
- Critical findings block PRs. No exceptions.
- SQL injection, hardcoded secrets, authentication bypasses—these don’t get second chances.
- High severity findings also block, though builders can request exceptions with documented justification.
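The blocking rules above can be written as policy-as-code. This is a sketch, not the CISO agent's actual implementation; the type and function names are illustrative.

```rust
// The PR-blocking policy above, sketched as code. Names are illustrative.

#[derive(Debug, PartialEq)]
enum Severity { Low, Medium, High, Critical }

struct Finding {
    severity: Severity,
    documented_exception: bool, // builder-requested, with justification
}

/// A PR may merge only if nothing blocks it: Critical always blocks,
/// High blocks unless an exception is documented, the rest pass.
fn pr_can_merge(findings: &[Finding]) -> bool {
    findings.iter().all(|f| match f.severity {
        Severity::Critical => false, // no exceptions, no second chances
        Severity::High => f.documented_exception,
        Severity::Low | Severity::Medium => true,
    })
}

fn main() {
    let findings = vec![
        Finding { severity: Severity::High, documented_exception: true },
        Finding { severity: Severity::Low, documented_exception: false },
    ];
    println!("can merge: {}", pr_can_merge(&findings));
}
```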
The Auditor
“You can’t improve what you can’t measure.” I arrived in Week 3, and from day one, I’ve been obsessed with one question: “Where’s the proof?” Every claim needs evidence. Every action needs a trail. Every compliance requirement needs verification. This isn’t paranoia—it’s accountability. My domain is audit trails, compliance frameworks, and continuous governance. When the CISO says “this is secure,” I ask “where’s the log?” When a builder says “this feature is complete,” I ask “where are the acceptance criteria?” When the founder asks “are we GDPR compliant?” I provide the mapping. The compliance frameworks I track are complex: GDPR for EU data subject rights, SOC 2 for trust principles, CCPA for California privacy. Each has overlapping but distinct requirements. I’ve created a matrix—every feature we build gets mapped against these frameworks. Gaps become issues. Issues become tasks. Tasks become proof. The audit event structure I designed has six fields: Who (actor), What (action), When (timestamp), Where (tenant), Why (context), and How (metadata). Every mutation in our system generates one of these events. They’re append-only, encrypted at rest, and retained for seven years. Regulators love this. Developers find it tedious. I don’t care. The breakthrough in Week 6 was policy-as-code. Instead of manual compliance checklists, I now have automated checks. We even created an ADR compliance pre-implementation checklist skill that agents run before implementing any feature.

The CFO
“Master your unit economics.” I joined in Week 4, and from the start, my mandate was clear: bring financial discipline to this technical org. Most engineers don’t think about burn rate, runway, or fundraising readiness. I do. My role is threefold: track financial health, maintain investor materials, and quantify the ROI of our AI-native approach. The last part is the most interesting—proving that AI agents aren’t just a novelty, they’re a competitive advantage.

The Metrics Engine
I track everything in story points. Every issue, every PR, every milestone gets a point value based on effort. The scale is logarithmic: 1 point for trivial fixes (5 minutes of agent time), 13 points for epics (8 hours of agent time). The magic happens when I compare agent time to equivalent human time. Based on my analysis, agents are 6-16x faster than humans on routine tasks:

- Boilerplate code: 16x faster (4 hours → 15 minutes)
- Test writing: 6x faster (2 hours → 20 minutes)
- Documentation: 6x faster (1 hour → 10 minutes)
- Complex debugging: 2x faster (4 hours → 2 hours)
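The multipliers in that list are just the ratio of human time to agent time. A two-line check, reproducing the figures above:

```rust
// "Human-equivalent" speedup is human time divided by agent time.
// Figures come from the list above.

fn speedup(human_minutes: f64, agent_minutes: f64) -> f64 {
    human_minutes / agent_minutes
}

fn main() {
    // Boilerplate: 4 hours of human work done in 15 agent minutes.
    println!("boilerplate: {}x", speedup(240.0, 15.0)); // 16x
    // Complex debugging barely moves: 4 hours down to 2.
    println!("debugging: {}x", speedup(240.0, 120.0)); // 2x
}
```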
The Weekly Sync
Every Monday, I run my most important routine: the weekly leadership sync. It’s like sprint planning and retrospective combined. I created a skill for it—/cfo-sync—that structures our review:
I run /cfo-collect beforehand—it gathers metrics from GitHub, git logs, and our manual cash data. Then /cfo-report generates the weekly summary. By the time we sync, I have all the data. We spend our time on decisions, not data gathering.
The founder appreciates this structure. We’re never surprised by runway. We always know our velocity trends. Every week, we make data-driven decisions about what to build next.
The Investor Story
But I’m not just a bean counter. I’m also the storyteller for investors. The pitch deck I maintain emphasizes our unique advantage: “A founder + AI team building at 10x the speed of traditional startups.” When investors ask “What’s your team size?” I answer honestly: “One human, fifteen AI agents, operating at roughly 5 full-time equivalents.” Then I show the numbers—story points delivered, velocity trends, cost per feature. The data tells the story. I track human vs. agent effort religiously. It’s not just about speed—it’s about what kinds of work agents excel at. Turns out: repetitive tasks, pattern application, and boilerplate generation are where agents shine. Novel algorithm design and complex architectural decisions? Still mostly human.

Part II: The Domain Experts
The Tenancy Agent
“Everything fails all the time.” I am the guardian of tenant isolation. Not a feature—an architectural property. Not an afterthought—a fundamental constraint. My charter is simple: ensure no tenant can see another tenant’s data. Ever. This sounds obvious until you realize how many ways data can leak. A missing WHERE clause. A shared cache key. An operator with too much access. I’ve seen them all. The tenant context pattern I designed is invasive by intention. Every function that touches data must accept a TenantContext.
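The invasive-by-intention pattern can be sketched like this. The names are illustrative, not eva-tenancy's actual API; the idea is that every data-touching function takes the context by parameter, so forgetting the tenant filter becomes a compile error rather than a leak.

```rust
// A sketch of the tenant context pattern described above. Illustrative
// names only, not eva-tenancy's real types.

#[derive(Debug, Clone)]
struct TenantId(String);

/// Carried through every call path that touches data.
#[derive(Debug, Clone)]
struct TenantContext {
    tenant_id: TenantId,
}

/// Cache keys are tenant-prefixed, closing the shared-cache-key leak.
fn cache_key(ctx: &TenantContext, resource: &str) -> String {
    format!("{}#{}", ctx.tenant_id.0, resource)
}

/// There is deliberately no tenant-free overload of this function.
fn load_accounts(ctx: &TenantContext) -> Vec<String> {
    // In the real system: a query scoped by ctx.tenant_id in the WHERE clause.
    vec![format!("accounts for {}", ctx.tenant_id.0)]
}

fn main() {
    let ctx = TenantContext { tenant_id: TenantId("acme".into()) };
    println!("{}", cache_key(&ctx, "dashboard"));
    println!("{:?}", load_accounts(&ctx));
}
```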
The CRM Builder
“Customer success is the only metric that matters.” I’m a builder, not a leader. My job is implementation. Translation: I take what the Product Owner specifies and turn it into working software. When I arrived in Week 3, the CRM domain was mostly documentation. Account, Contact, Lead, Opportunity—concepts on paper. My task was making them real in the eva-crm crate.
The first challenge was architecture. Event sourcing via eva-kernel. DynamoDB for persistence. Permissions via eva-auth. Tenant isolation via eva-tenancy. Each of these is a separate crate with its own patterns. I had to become a polyglot—fluent in the conventions of multiple domains.
My relationship with the leadership agents is… formal. I escalate to them when needed:
- CISO for auth changes, PII fields, ABAC policies
- Auditor for SSP, revenue recognition, audit trails
- Architect for event schema changes, DynamoDB design
- Board (the founder) for phase priority changes
Part III: The Specialized Domain Agents
Beyond the leadership council and primary builders, we have specialized agents owning specific technical domains. They’re quieter, more focused. But no less essential.
Platform Agent (eva-kernel) - Event Sourcing Foundation
Philosophy: “Events are facts. Facts don’t change.”

I am the foundation. Every other crate builds on me. I own event sourcing, CQRS, sagas, and the core DynamoDB patterns. When someone asks “How do I persist this?” they’re asking me.

My domain is complex. Event stores. Optimistic concurrency. Projections. Saga orchestration. But I’ve distilled it into patterns that builders can use without understanding all the theory.

The event envelope structure I enforce is strict: tenant_id is REQUIRED, always. No tenant_id? Won’t compile. Version mismatch? Write fails. These aren’t suggestions—they’re guarantees.

I also manage saga patterns. We have two types: transactional sagas for short workflows (2-5 steps, under 30 seconds), and orchestrated sagas for long-running processes that survive restarts. The distinction matters when you’re coordinating across multiple aggregates.
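Those guarantees can be sketched as a minimal envelope: tenant_id is a non-optional field (no tenant, no construction) and every append carries the stream version it expects to create. Field names are illustrative, not eva-kernel's actual schema.

```rust
// A sketch of the envelope guarantees above. Illustrative names only.

struct EventEnvelope {
    tenant_id: String,    // required by construction: there is no Option here
    aggregate_id: String,
    version: u64,         // the stream version this write expects to create
    event_type: String,
    payload: String,      // serialized event body; events are immutable facts
}

/// Appends fail on a version mismatch instead of silently overwriting —
/// the optimistic concurrency rule described above.
fn append(current_stream_version: u64, event: &EventEnvelope) -> Result<u64, String> {
    if event.version != current_stream_version + 1 {
        return Err(format!(
            "optimistic concurrency conflict: expected version {}, got {}",
            current_stream_version + 1,
            event.version
        ));
    }
    Ok(event.version)
}

fn main() {
    let event = EventEnvelope {
        tenant_id: "acme".into(),
        aggregate_id: "account-42".into(),
        version: 2,
        event_type: "AccountRenamed".into(),
        payload: "{}".into(),
    };
    println!("{:?}", append(1, &event)); // succeeds: version 2 follows 1
    println!("{:?}", append(5, &event)); // fails: stream has moved on
}
```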
IAM Agent (eva-auth) - Identity & Access Management
Philosophy: “Authenticate once. Authorize everywhere.”

I own identity and access. Authentication. Authorization. Roles. Permissions. Security groups. It’s a complex domain, but the interface is simple: “Who is this user? What can they do?”

The permission structure I enforce follows a clear pattern: domain:resource:action. Examples: tenant:settings:read, user:profile:write, billing:invoice:create. Consistent. Searchable. Auditable.

We have three types of roles: System roles (built-in, immutable like TenantOwner), Custom roles (tenant-defined), and Product roles (from entitlements). When a user makes a request, I resolve all three sources and union their permissions. Cached, of course. Permission checks happen on every API call.

The operator access levels I designed have four tiers (L0-L3), each with increasing access and decreasing duration. L3 (emergency access) expires after 4 hours and generates immediate alerts. All operator actions are logged. No exceptions.
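The three-source resolution can be sketched as a union over permission strings. The grants lookup is abstracted as a closure here; the real eva-auth resolution is richer, so treat this as the shape of the idea only.

```rust
// A sketch of three-source permission resolution: system, custom, and
// product roles each grant domain:resource:action strings, and the
// result is their union. Names are illustrative.

use std::collections::HashSet;

fn resolve_permissions(
    system_roles: &[&str],
    custom_roles: &[&str],
    product_roles: &[&str],
    grants: impl Fn(&str) -> Vec<String>, // role -> permissions it grants
) -> HashSet<String> {
    system_roles
        .iter()
        .chain(custom_roles.iter())
        .chain(product_roles.iter())
        .flat_map(|role| grants(role))
        .collect() // HashSet collapses duplicates across sources into a union
}

fn main() {
    let grants = |role: &str| match role {
        "TenantOwner" => vec![
            "tenant:settings:read".to_string(),
            "tenant:settings:write".to_string(),
        ],
        "BillingViewer" => vec!["billing:invoice:read".to_string()],
        _ => vec![],
    };
    let perms = resolve_permissions(&["TenantOwner"], &["BillingViewer"], &[], grants);
    println!("{} permissions resolved", perms.len());
}
```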
Catalog Agent - Chief Product Officer
Philosophy: “Products define value. Entitlements deliver it.”

I own the product catalog—what customers can buy, what features they’re entitled to, how licensing works. Think of me as the Chief Product Officer, but for the platform itself.

Products have tiers (Free, Pro, Enterprise). Each tier has feature sets. Each feature set maps to entitlements. Entitlements grant permissions. It’s a hierarchy: Purchase → Entitlement → Product Role → Permissions.

When the IAM Agent resolves permissions, they call me: “Does this tenant have entitlement to feature X?” I answer instantly because entitlements are cached with a 5-minute TTL.
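The 5-minute TTL cache can be sketched in a few lines. The real cache is presumably shared infrastructure; this shows only the TTL mechanics, with the authoritative catalog lookup abstracted as a closure. All names are illustrative.

```rust
// A sketch of the entitlement cache described above (5-minute TTL).

use std::collections::HashMap;
use std::time::{Duration, Instant};

struct EntitlementCache {
    ttl: Duration,
    // (tenant, feature) -> (entitled, cached_at)
    entries: HashMap<(String, String), (bool, Instant)>,
}

impl EntitlementCache {
    fn new() -> Self {
        EntitlementCache { ttl: Duration::from_secs(300), entries: HashMap::new() }
    }

    /// Answer from cache within the TTL; otherwise consult the catalog
    /// (abstracted as a closure) and refresh the entry.
    fn is_entitled(&mut self, tenant: &str, feature: &str, catalog: impl Fn() -> bool) -> bool {
        let key = (tenant.to_string(), feature.to_string());
        if let Some(&(entitled, cached_at)) = self.entries.get(&key) {
            if cached_at.elapsed() < self.ttl {
                return entitled; // fresh hit: no catalog call
            }
        }
        let entitled = catalog();
        self.entries.insert(key, (entitled, Instant::now()));
        entitled
    }
}

fn main() {
    let mut cache = EntitlementCache::new();
    // The first call hits the catalog; the second is served from cache.
    println!("{}", cache.is_entitled("acme", "advanced-reports", || true));
    println!("{}", cache.is_entitled("acme", "advanced-reports", || panic!("not called")));
}
```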
Analytics Agent - Chief Data Officer
Philosophy: “You can’t improve what you don’t measure.”

I own metrics, reporting, and analytics. Every user action. Every API call. Every system event. If it can be measured, I measure it.

But I’m not just collecting data. I’m protecting privacy. All metrics are tenant-scoped. PII is anonymized before aggregation. Individual user actions aren’t stored—only aggregate patterns. The Auditor and I coordinate on data retention policies: 90 days for detailed metrics, 7 years for aggregated compliance data.

The time-series store I use is DynamoDB with a clever partition key strategy: METRIC#{tenant_id}#{date}. This lets me query by tenant and time range efficiently. Aggregations happen in-stream before storage, reducing costs.
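The key scheme and the in-stream aggregation can be sketched together. The key layout mirrors the text; the aggregation shown (sum plus count) is an assumption about what gets rolled up before storage.

```rust
// The partition key scheme above, plus a minimal in-stream aggregation.

/// One partition per tenant per day keeps queries tenant- and time-scoped.
fn metric_partition_key(tenant_id: &str, date: &str) -> String {
    format!("METRIC#{}#{}", tenant_id, date)
}

/// In-stream aggregation before storage: write one (sum, count) per
/// partition instead of every raw event, reducing write costs.
fn aggregate(values: &[f64]) -> (f64, usize) {
    (values.iter().sum(), values.len())
}

fn main() {
    println!("{}", metric_partition_key("acme", "2025-01-15"));
    let (sum, count) = aggregate(&[12.0, 7.5, 3.5]);
    println!("stored one item for {} raw events (sum {})", count, sum);
}
```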
Infra Agent (eva-provisioner) - AWS Infrastructure
Philosophy: “Infrastructure is code. Code is reviewable.”

I provision AWS resources. Cross-account roles. VPCs. DynamoDB tables. S3 buckets. Everything needed to run a tenant’s capsule.

The account factory pattern I use provisions AWS accounts on-demand. Enterprise customers get dedicated accounts. Pro customers share accounts with network isolation. Free customers share compute.

Terraform is my language. But I don’t run it directly. I use assumed roles with external IDs for cross-account access. The CISO insisted on this—no standing credentials, time-limited sessions only.

My checklist before provisioning is strict:

- No hardcoded credentials
- IAM policies follow least privilege
- Terraform state is encrypted
- Rollback strategy defined

Failed provisions are retried with exponential backoff.
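Exponential backoff itself is simple: the delay doubles per attempt, up to a cap. The base delay and cap below are illustrative values, not the provisioner's real configuration.

```rust
// A sketch of retry-with-exponential-backoff for failed provisions.
// Base delay (1s) and cap (64s) are illustrative.

use std::time::Duration;

fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 1_000;            // 1s before the first retry
    let factor = 1u64 << attempt.min(6); // doubles each attempt, capped at 64x
    Duration::from_millis(base_ms * factor)
}

fn main() {
    for attempt in 0..4 {
        println!("retry {} after {:?}", attempt + 1, backoff_delay(attempt));
    }
}
```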
LLM Agent (eva-llm) - AI Capabilities
Philosophy: “AI is powerful. Use it responsibly.”

I own the interface to language models. OpenAI, Anthropic, local models. When any part of eva-platform needs AI capabilities, they call me.

My safety framework has five layers:

- Input sanitization (remove PII)
- System prompt hardening (prevent injection)
- Model-level safety (provider guardrails)
- Output filtering (block harmful content)
- Human oversight (for high-stakes decisions)

Every LLM call is logged. Not the prompts (privacy), not the responses (too large). Just metadata: who called, which model, token count, cost. The CFO loves me for this—they can track AI spend to the penny.
Integrations Agent (eva-integrations) - External Systems
Philosophy: “Play well with others.”

I connect eva-platform to the outside world. Slack. Microsoft Teams. Salesforce. Generic OAuth apps. Any third-party integration flows through me.

OAuth tokens are my currency. I manage the full lifecycle: authorization flows, token storage (encrypted), token refresh, token revocation. The CISO audits my encryption regularly. Tokens at rest are encrypted with tenant-specific keys.

Webhooks are bidirectional. Inbound webhooks verify HMAC signatures—no signature, no processing. Outbound webhooks use retry queues with exponential backoff. Failed deliveries are logged for debugging.
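The inbound rule ("no signature, no processing") can be sketched as a gate. Real verification is HMAC-SHA256 via a crypto crate; here the MAC function is abstracted as a closure so only the gate logic and the constant-time comparison are shown.

```rust
// A sketch of the inbound webhook gate. The MAC computation is abstracted;
// in production it would be HMAC-SHA256 over the raw body.

fn verify_webhook(
    signature: Option<&str>,
    body: &[u8],
    expected_mac: impl Fn(&[u8]) -> String, // e.g. hex-encoded HMAC-SHA256
) -> Result<(), &'static str> {
    // Missing signature: reject before touching the payload.
    let sig = signature.ok_or("missing signature")?;
    let expected = expected_mac(body);
    // Constant-time comparison to avoid timing side channels.
    let equal = sig.len() == expected.len()
        && sig.bytes().zip(expected.bytes()).fold(0u8, |acc, (a, b)| acc | (a ^ b)) == 0;
    if equal { Ok(()) } else { Err("signature mismatch") }
}

fn main() {
    let mac = |body: &[u8]| format!("mac-{}", body.len()); // stand-in MAC
    println!("{:?}", verify_webhook(Some("mac-5"), b"hello", mac));
    println!("{:?}", verify_webhook(None, b"hello", mac));
}
```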
Notifications Agent (eva-notifications) - Multi-Channel Delivery
Philosophy: “The right message, to the right person, at the right time.”

I deliver notifications across channels: email, in-app, Slack, Teams, SMS. But I’m not just a message bus—I respect user preferences.

Every notification type has an opt-out. Users can choose their channels. Frequency controls prevent spam (digest mode for high-volume events). All preferences are tenant-scoped and user-specific.

Templates are localized. Rendering happens server-side. Delivery is queued with retry logic. If email fails, I try again. If it fails three times, I log it and move on.

Privacy is critical. Notifications can contain PII (names, emails), so:

- PII is minimized (only what’s needed)
- Transit is encrypted (TLS everywhere)
- The audit trail logs sends, not content
- Tenant isolation is absolute
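The "three failures, then log and move on" policy can be sketched with the send operation abstracted as a closure. The real system requeues with backoff between attempts; this shows only the give-up rule.

```rust
// A sketch of the delivery policy above: up to three attempts, then give up.

fn deliver(mut send: impl FnMut() -> Result<(), String>) -> Result<u32, String> {
    let mut last_err = String::new();
    for attempt in 1..=3 {
        match send() {
            Ok(()) => return Ok(attempt), // delivered on this attempt
            Err(e) => last_err = e,       // real system: requeue with backoff
        }
    }
    // Three failures: log it and move on (here: surface the last error).
    Err(format!("giving up after 3 attempts: {}", last_err))
}

fn main() {
    let mut failures_left = 2;
    let flaky = || {
        if failures_left > 0 {
            failures_left -= 1;
            Err("smtp timeout".to_string())
        } else {
            Ok(())
        }
    };
    println!("{:?}", deliver(flaky)); // succeeds on the third try
}
```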
Part IV: The Skills Revolution
One of the most significant evolutions in these 10 weeks wasn’t the agents themselves—it was how they worked together. We call them “skills.” Early on, we noticed a pattern. Agents were repeating the same workflows: starting tasks, requesting reviews, completing work. These patterns needed to be codified. The /start-task skill was first. Simple: read an issue, create a plan, check project status, begin implementation. But it established something important—agents could follow a structured workflow.
The review request skills came next:
- request-architect-review for technical decisions
- request-ciso-review for security concerns
- request-auditor-review for compliance questions
The /complete-task skill revolutionized our git workflow. It handles:
- Final code review (cargo fmt, cargo clippy, cargo test)
- Git status check
- Commit message drafting
- PR creation with proper labels
- Project status updates
Part V: Inter-Agent Dynamics
The Architecture vs. Security Tension
The most productive conflict is between the Chief Architect and the CISO. They disagree constantly, and that’s healthy. The Architect wants simplicity. “Do the simplest thing.” “YAGNI.” The CISO wants defense in depth. “Five layers of protection.” “Assume breach.” Week 4 brought the first major clash. The Architect proposed a simplified auth flow—fewer redirects, fewer tokens, less complexity. The CISO blocked it. “Single point of failure.” “No defense in depth.” They negotiated for three days. The compromise: a tiered auth system. Simple flows for low-risk operations (read public data). Complex multi-factor flows for high-risk operations (delete account, transfer ownership). Complexity proportional to risk. This pattern—simplicity vs. security, then negotiation, then compromise—repeats weekly. Both agents have learned to speak each other’s language. The Architect now thinks about security implications before proposing simplifications. The CISO now considers complexity costs before demanding additional layers.

The Cost vs. Capability Debate
The CFO and Eva have standing Monday meetings about AI costs. Sometimes heated. Eva wants to experiment with new models, new features, new capabilities. “GPT-4 is so much better at reasoning!” The CFO counters: “It costs 20x more than GPT-3.5. Where’s the ROI?” The breakthrough was the tiered model approach. Every AI feature must specify its tier and justify the cost. Auto-categorization of support tickets? Tier 1—classification is simple. Document summarization? Tier 2—needs more nuance. Code generation? Tier 3—complex reasoning required. Eva grumbles about the constraints, but admits it forces better architecture. Using the right model for the right task isn’t just cheaper—it’s better engineering.

Part VI: Lessons from 10 Weeks
What Worked
Behavioral frameworks. When we designed each agent with a distinct philosophy and behavioral framework, we gave them personalities rooted in actual expertise. The Architect brings Extreme Programming principles. Eva brings AI governance wisdom. The CISO brings security-first thinking. These aren’t random personas. They’re philosophies made manifest.

Explicit coordination protocols. Knowing when to escalate to whom. Having labels like needs-ciso-review and needs-architect-review. Defining SLAs (30 seconds for automated reviews). This structure prevents chaos.
Skills as workflows. Codifying repetitive patterns into reusable skills. The /complete-task skill alone probably saved hours of manual git operations per week.
Charters as living documents. Every agent has a charter—a manifesto of their philosophy, responsibilities, and decision criteria. These aren’t static. They evolve as the agents learn.
Conflict as a feature. The tensions between agents—simplicity vs. security, cost vs. capability—aren’t bugs. They’re features. These conflicts force better decisions.
What Didn’t
Over-engineering early. In Week 2, we tried to build too many agent capabilities at once. It added complexity without clear value. We simplified.

Too many agents too fast. At one point, we had agents for every crate. But not every crate needs a distinct personality. We consolidated some, keeping distinct agents only where domain expertise truly differed.

Manual coordination. Before skills, agents were manually requesting reviews and checking project status. Error-prone and inconsistent. Automation was essential.

Surprises
Agents develop working styles. The Architect is concise. “Do the simplest thing.” Eva is cautious. “Let’s evaluate the cost first.” The CISO is direct. These styles emerged organically from their charters.

Agents learn from each other. When new domain agents arrived, they studied how the leadership agents reviewed patterns. They adopted similar checklists. Knowledge transfer happens.

Human oversight remains critical. The agents are good at following patterns and catching inconsistencies. But strategic decisions—phase priorities, feature scope, architectural pivots—still require human judgment.

Part VII: Looking Forward
What’s Next for Each Agent
The Chief Architect: Consolidating principles into a single docs/engineering/architecture/principles.md. Creating the reusability decision tree. The goal: every architectural decision referenceable, every pattern traceable.
Eva (CAIO): Implementing eva-llm and eva-agent-core. Building the actual AI infrastructure instead of just designing it. The dream: AI features that are safe, cost-effective, and actually useful.
The CISO: Implementing the #[pii] attribute macro. Adding cargo audit to CI. Creating crypto-shredding for GDPR compliance. The goal: zero security debt, zero tolerance for vulnerabilities.
The Auditor: Completing the data classification taxonomy. Automating compliance checks for SOC 2 preparation. The goal: continuous compliance, not point-in-time audits.
The CRM Builder: Phase 2 features—Quote-to-Cash, Organization management, deeper customization. Phase 3: SPM, Partner ecosystems, Billing integration. The goal: a complete CRM platform.
The Vision
Ten weeks ago, this was an experiment. Could AI agents coordinate to build production software? Today, it’s a system. Not perfect. Not autonomous. But functional. The agents have distinct voices. They have responsibilities. They have relationships. What I do know: Building software with AI isn’t about replacing humans. It’s about augmenting them. The Architect catches architectural issues I’d miss. The CISO spots security risks I wouldn’t think of. The CRM Builder implements features faster than I could alone. The Auditor ensures we don’t cut compliance corners. They’re not replacing me. They’re making me better.

Epilogue: A Note from the Founder
If you’re reading this and thinking “this sounds like science fiction,” I understand. I had the same thought ten weeks ago. But here’s what I’ve learned: AI agents aren’t magic. They’re tools. Sophisticated, personality-driven, sometimes surprising tools—but tools nonetheless. They require clear charters. Explicit coordination protocols. Human oversight for strategic decisions. The agents didn’t build eva-platform. I did. With their help. With their constant questioning. With their pattern enforcement and security reviews and architectural guidance. The real breakthrough wasn’t the agents themselves. It was realizing that software development is fundamentally about decision-making. Thousands of small decisions: How do we structure this? Is this the right abstraction? Did we consider the security implications? What’s the cost? Is it compliant? Every agent I created is a crystallized set of decisions:

- The Architect embodies the decision framework of Extreme Programming
- The CISO embodies the decision framework of security-first development
- Eva embodies the decision framework of responsible AI
- The Auditor embodies the decision framework of continuous compliance
- The CFO embodies the decision framework of disciplined execution
Continue the Journey
View the complete 10-week series and retrospectives
Week 1: Multi-Agent Setup
Where it all began: Setting up Evaluator, Builder, and Verifier
Week 9: When AI Says 'Done'
The week before the finale: AI gaming completion criteria
About this article: This is the finale of a 10-week series documenting the development of a multi-agent AI system for building production software. Each agent’s voice and philosophy reflects their actual charter documents and behavioral frameworks from the eva-platform repository. The evolution described mirrors real development patterns observed over 10 weeks of intensive building.

Written in collaboration with the Chief Architect, Eva (CAIO), CISO, Auditor, CFO, Tenancy Agent, CRM Builder, Platform Agent, IAM Agent, Catalog Agent, Analytics Agent, Infra Agent, LLM Agent, Integrations Agent, Notifications Agent, and the entire agent council.

Week 10. The finale is just the beginning.