Case 01
Flagship · Multi-agent platform · v0.72.0 · Phase I — live testing
Agentic SDLC Platform — 135 agents, spec-to-prod in 1–2 days
xteam is an agentic SDLC platform: 135 specialized agents organized into 14 functional groups, with ~32 explicit author + auditor pairs (every artifact has a writer and a scored reviewer) and a Virgil multi-project state contract. Virgil tells the operator what phase each project is in, which agent should run next inside it, and, via the cross-machine priority registry at ~/.claude/xteam-virgil/registry.txt, which of the operator's many parallel projects deserves the next focus slot.
The platform turns a feature spec into a deployed increment in 1–2 days while preserving the engineering practices a senior architect would demand: typed contracts, scored audits ≥99/100 binary, anti-mock test guards, canary + synthetic + rollback + kill-switch on the deploy side, an explicit Ship-to-Prod 81-position binary contract, and a Layer 7 runtime of 27 continuous 24/7 contracts (25 watchcat operators + 2 cron-driven autonomy ops) that observe production after deploy and catch silent regressions every other layer missed.
Constitution: 16 binary-tested principles, including author/auditor symmetry, ≥99/100 binary gates, closed-allowlist skip markers, markdown+bash identity moat, Web Bot Auth-signed prod requests, ed25519 + Merkle memory integrity, MCP sigstore attestation, Code Bible per-language best-practices, Corpus Autonomy Discipline, and Autonomous Phase Boundaries Discipline. 9 identity-drift fixtures replayed every PR; 55 closed-allowlist skip markers.
Compliance machinery covers EU AI Act §50 + Annex XI, NIST AI 600-1, ISO 42001, SOC 2 Type II, California ADMT, plus an Air-gap transport matrix for regulated deployments.
The system has shipped feature increments across four domain classes (fintech SaaS, multi-modal comic production, developer tooling, and compliance/regulated) without losing the human-in-the-loop discipline.
Every irreversible operation routes through an approval gate; every audit produces a numerical score and a list of binary gaps; every gap has a named owner and a re-evaluation trigger.
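The audit contract described above can be pictured as a small record. This is a minimal sketch, not xteam's actual schema: the `AuditResult` and `Gap` types and all field names are assumptions introduced for illustration; only the ≥99/100 bar, the binary gaps, the named owner, and the re-evaluation trigger come from the text.

```python
from dataclasses import dataclass, field

PASS_THRESHOLD = 99  # the ">= 99/100" bar quoted in the text


@dataclass
class Gap:
    """One binary gap: it is either closed or still open."""
    description: str
    owner: str                 # named owner (hypothetical field name)
    reevaluation_trigger: str  # when the gap gets re-checked (hypothetical field name)
    closed: bool = False


@dataclass
class AuditResult:
    """Hypothetical shape of an auditor's verdict on one artifact."""
    artifact: str
    score: int  # 0..100
    gaps: list[Gap] = field(default_factory=list)

    def passes(self) -> bool:
        # The gate is binary: the score clears the bar or it does not.
        return self.score >= PASS_THRESHOLD

    def open_gaps(self) -> list[Gap]:
        return [g for g in self.gaps if not g.closed]


result = AuditResult(
    artifact="spec.md",
    score=97,
    gaps=[Gap("AC-3 has no linked FR", owner="xSpecifier",
              reevaluation_trigger="next spec revision")],
)
print(result.passes(), len(result.open_gaps()))
```

Keeping the pass decision in one method means a score of 98 cannot be waved through as "close enough", which is the point of a binary gate.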
Topology · interactive
Agent topology — 135 agents in 14 functional groups
Hover or click any node to inspect role, paired auditor, and an example of work. Author/auditor pairs are highlighted by a dashed link when an agent is selected. The remaining 51 agents (25 runtime watchcats + 5 browser-driving + 2 mobile-publisher pair + 19 compliance authors/auditors) are left out of the topology to keep the visualisation legible and live in dedicated cards below.
Workflow · interactive
Workflow walkthrough — Sub-step Delivery Protocol stages
Step through the SDLC pipeline from spec to deploy. Each stage shows its agents, input, output, and the binary approval gate that must score ≥ 99/100 before the artifact moves on.
SDLC Walkthrough
Spec → Production, with explicit gates
Stage 1 of 8
Specify
An idea hits the system as plain prose. xSpecifier converts it to structured user stories, functional requirements, and binary acceptance criteria. xSpecAuditor scores the spec and bounces it back if traceability is broken.
Input
Raw idea / user story
Output
spec.md with FRs, ACs, success criteria
Approval gate
Spec must score ≥ 99/100 on auditor rubric.
Virgil · interactive
Virgil simulator — multi-project state contract
xVirgil is the cross-cutting state contract that reads explicit signals (not heuristics), proposes the next action and the agent best placed to take it, and ranks the operator's many parallel projects in a single cross-machine priority registry (~/.claude/xteam-virgil/registry.txt) so the next focus slot lands on whichever project moved the farthest. Pick a scenario.
Navigator
Project state → suggested next step
Pick a project state. The Navigator reads explicit signals (not heuristics) and recommends the next action and the agent best suited to take it.
Signals
- No .specify/ directory
- No spec.md, no architecture.md
- Idea: 'a tool to help recruiters screen LLM portfolios'
Phase
DISCOVERY
Next action
Convert the idea into a problem statement, ICP, and JTBD before any architecture decisions.
Suggested agent
xAnalyst
Architecture without a problem statement produces over-engineered systems for the wrong user. Discovery-first prevents premature commitment to a stack.
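The scenario above can be sketched as a function from explicit file-system signals to a suggestion. This is an illustrative sketch only: the DISCOVERY branch mirrors the signals listed above, while the `SPECIFY` phase name, the fallback branch, and the assumption that spec.md and architecture.md sit in the project root are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import NamedTuple


class Suggestion(NamedTuple):
    phase: str
    next_action: str
    agent: str


def suggest_next_step(project: Path) -> Suggestion:
    """Read explicit signals (files on disk), never heuristics."""
    has_specify_dir = (project / ".specify").is_dir()
    has_spec = (project / "spec.md").is_file()
    has_arch = (project / "architecture.md").is_file()

    if not (has_specify_dir or has_spec or has_arch):
        # The one scenario the text documents.
        return Suggestion(
            "DISCOVERY",
            "Write a problem statement, ICP, and JTBD",
            "xAnalyst",
        )
    if not has_spec:
        # Hypothetical phase name, for illustration only.
        return Suggestion("SPECIFY", "Produce spec.md", "xSpecifier")
    # Hypothetical fallback: hand the decision back to the operator.
    return Suggestion("UNKNOWN", "Inspect further signals", "operator")


with tempfile.TemporaryDirectory() as tmp:
    print(suggest_next_step(Path(tmp)))
```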
Layer 7 · runtime
27 Layer 7 contracts (25 watchcats + 2 autonomy crons)
Pre-deploy gates catch bugs that haven't shipped. Layer 7 catches the silent regressions every other layer missed — schema drift, GSC indexability collapse, credential-stuffing spikes, egress exfiltration, broken CTAs invisible to dashboards but loud in support volume. Plus two cron contracts that close the autonomy loop (L7.26 monthly idea-scout sweep, L7.27 monthly portfolio-aggregator). Click any name for the canonical incident class it exists to catch.
The seventh layer of Ship-to-Prod (L7.1–L7.27). Every watchcat is a long-lived 24/7 operator that observes production after deploy and catches the silent regressions L1–L6 cannot. Each ships with a tool shortlist, baselines, runbook, and severity promotion path (ADVISORY → BLOCKING when criteria met). Plus two autonomy-themed cron contracts (L7.26 xidea-scout monthly sweep, L7.27 xportfolio-aggregator monthly aggregation — visible in the topology card above). Click a name for the canonical incident class it exists to catch.
Front-end visibility · 5
Back-end & data · 6
Trust & integrity · 5
Cross-platform · 5
Meta & infra · 4
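A watchcat's baseline and severity promotion path might look roughly like the sketch below. The text does not state the actual promotion criteria, so the rule used here (promote after a run of consecutive baseline breaches) is an assumption, as are the `Watchcat` class, its field names, and the example metric.

```python
from dataclasses import dataclass


@dataclass
class Watchcat:
    """Hypothetical watchcat: compares observations against a baseline."""
    name: str
    baseline: float          # must be non-zero in this sketch
    tolerance: float         # allowed relative deviation from the baseline
    promote_after: int = 3   # assumed criterion: consecutive breaches
    severity: str = "ADVISORY"
    breaches: int = 0

    def observe(self, value: float) -> str:
        deviation = abs(value - self.baseline) / self.baseline
        # A healthy reading resets the streak; a breach extends it.
        self.breaches = self.breaches + 1 if deviation > self.tolerance else 0
        if self.breaches >= self.promote_after:
            self.severity = "BLOCKING"  # promotion is one-way in this sketch
        return self.severity


cat = Watchcat("indexed-pages", baseline=1000, tolerance=0.2)
history = [cat.observe(v) for v in (990, 700, 650, 600)]
print(history)
```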
Browser QA · per-merge
5-subagent browser-driving cluster
AI manual QA that physically drives real browsers on real staging URLs — and prod, with Web Bot Auth + safe-prod-guard wrappers. Bracket-mode parser [critical | standard | quick] picks the model class per invocation; round-trip mutation taxonomy 6×4 (CSV / JSON / XLSX / PDF / TSV / Markdown × encoding / structure / scale / edge) is exercised every merge.
Spec 029 (plugin v0.31.0) introduced the cluster and the bracket-mode parser [critical | standard | quick]; spec 031 added Web Bot Auth signing and the # PROD-PAYMENT-NEVER runtime hard-stop.
xBrowserDriverWeb
Drive (web)
Drives real browsers against staging URLs. Export-first round-trip discipline — download → mutate → re-upload → assert — catches silent corruption that pure DOM tests miss. Cryptographically signs every production request so destructive flows can't escape staging.
Default mode
[standard]
Allowed modes
critical · standard · quick
Output
qa-runs/<iso-ts>/web/*
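A bracket-mode parser of the kind mentioned for this cluster could be sketched as follows. The exact invocation syntax is not given in the text; this sketch assumes the mode appears as a leading `[mode]` tag and falls back to the documented default of `standard`. The function name `parse_mode` is hypothetical.

```python
import re

ALLOWED_MODES = ("critical", "standard", "quick")
DEFAULT_MODE = "standard"

# Assumed syntax: an optional leading "[mode]" tag before the prompt.
_BRACKET = re.compile(r"^\s*\[(\w+)\]\s*")


def parse_mode(invocation: str) -> tuple[str, str]:
    """Return (mode, remaining prompt) for one invocation string."""
    match = _BRACKET.match(invocation)
    if match is None:
        return DEFAULT_MODE, invocation.strip()
    mode = match.group(1).lower()
    if mode not in ALLOWED_MODES:
        # Closed allowlist: unknown modes fail loudly instead of defaulting.
        raise ValueError(f"unknown mode: {mode!r}")
    return mode, invocation[match.end():].strip()


print(parse_mode("[critical] verify checkout export"))
print(parse_mode("smoke-test the dashboard"))
```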
Round-trip mutation taxonomy
6×4 matrix exercises every export/import surface: CSV · JSON · XLSX · PDF · TSV · Markdown × encoding · structure · scale · edge-case. Per-merge SDP 1.7a runs 1-of-N rotation; xExportImportWatchcat (Layer 7) runs continuous probe post-deploy.
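The 6×4 matrix and a per-merge rotation over it can be written down directly. The format and mutation names come from the text; reading "1-of-N rotation" as one cell per merge with N = 24 is an assumption, and `cell_for_merge` is a hypothetical helper.

```python
from itertools import product

FORMATS = ("CSV", "JSON", "XLSX", "PDF", "TSV", "Markdown")
MUTATIONS = ("encoding", "structure", "scale", "edge-case")

# Every (format, mutation) pair: 6 x 4 = 24 cells.
MATRIX = list(product(FORMATS, MUTATIONS))


def cell_for_merge(merge_index: int) -> tuple[str, str]:
    """Assumed rotation: each merge exercises one cell, cycling through all 24."""
    return MATRIX[merge_index % len(MATRIX)]


print(len(MATRIX))
print(cell_for_merge(0), cell_for_merge(5))
```

Under this reading, any 24 consecutive merges cover the full matrix once, while each individual merge pays for a single round-trip.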
Production safety
Every prod request is signed via Web Bot Auth (Constitution Principle IX); a 4-pattern # PROD-PAYMENT-NEVER hard-stop blocks any selector matching data-testid*=payment / charge / card / payment-checkout ARIA-roles and hard-fails the run on detection.
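The hard-stop can be approximated by a selector guard. This is a sketch under stated assumptions: the four patterns are read here as substrings of a `data-testid` value, the ARIA-role half of the rule is not modelled, the guard is assumed to apply only on prod, and the names `guard_selector` and `ProdPaymentNever` are hypothetical.

```python
import re

# Assumed reading of the four patterns: substrings of a data-testid value.
BLOCKED_TOKENS = ("payment", "charge", "card", "payment-checkout")

# Matches data-testid=..., data-testid*=..., with or without quotes.
_TESTID = re.compile(r"""data-testid\*?=["']?([\w-]+)""")


class ProdPaymentNever(RuntimeError):
    """Raised to hard-fail the run when a payment selector is hit on prod."""


def guard_selector(selector: str, *, is_prod: bool) -> None:
    if not is_prod:
        return  # assumption: the hard-stop only applies to production
    for value in _TESTID.findall(selector):
        if any(token in value.lower() for token in BLOCKED_TOKENS):
            raise ProdPaymentNever(f"blocked selector on prod: {selector}")


guard_selector('[data-testid="search-box"]', is_prod=True)       # allowed
guard_selector('[data-testid="submit-payment"]', is_prod=False)  # staging: allowed
```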
Compliance · posture
Compliance regimes covered by paired auditors
Templates ship with DRAFT disclaimer; final binding sign-off routes to a qualified human (lawyer, ISO auditor, SOC 2 firm). xteam auditors are scaffolding, not legal advice — but the scaffolding lets counsel review a structured artefact instead of a blank page. Five regimes, 25+ paired author/auditor agents.
Compliance posture · enterprise procurement
5 compliance regimes covered by paired auditors
xteam ships paired author / auditor agents for every compliance regime an enterprise procurement team typically asks about, with templates that explicitly carry DRAFT disclaimers and require qualified human sign-off.
European Union · specs 049 + 055
EU AI Act §50 + Annex XI + C2PA
Article 50 transparency notice for AI systems on EU market; Annex XI GPAI technical documentation; Code of Practice C2PA watermark/marking for synthetic content across 5 surfaces (text / image / audio / video / code) and 3 schemes (c2pa:// / synthid:// / proprietary://).
Effective date
2026-08-02 (Annex III enforcement)
Severity promotion path
ADVISORY → BLOCKING, auto-promoted on the regulatory deadline
Skip path
Each regime has a documented skip marker (e.g. # SKIP-NO-EU-AI-ACT-OBLIGATION) for projects where the regime does not apply. Skip is auditable; not silent.
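Deadline-driven promotion combined with an auditable skip marker could be sketched as below. The date and the marker string are taken from the text; treating the deadline day itself as BLOCKING, the `SKIPPED` return value, and the `regime_severity` function are assumptions.

```python
from datetime import date

EU_AI_ACT_DEADLINE = date(2026, 8, 2)            # effective date quoted above
SKIP_MARKER = "# SKIP-NO-EU-AI-ACT-OBLIGATION"   # marker quoted above


def regime_severity(today: date, project_markers: set[str]) -> str:
    """Hypothetical severity resolution for one regime in one project."""
    if SKIP_MARKER in project_markers:
        # The skip is an explicit, recorded state, not a silent omission.
        return "SKIPPED"
    # Assumption: promotion takes effect on the deadline day itself.
    return "BLOCKING" if today >= EU_AI_ACT_DEADLINE else "ADVISORY"


print(regime_severity(date(2026, 8, 1), set()))
print(regime_severity(date(2026, 8, 2), set()))
print(regime_severity(date(2026, 8, 2), {SKIP_MARKER}))
```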
Final binding compliance sign-off always routes to a qualified human (lawyer, ISO auditor, SOC 2 firm). xteam auditors are scaffolding, not legal advice. This is non-negotiable in every compliance template.
Dark Factory · the headline
Business idea in. Ready-for-prod out. Nothing in between.
The owner gives the system a business idea — a paragraph. The system gives the owner a production-ready release on staging. That is the entire owner-facing contract. Spec writing, architecture, design, code, tests, security, performance, accessibility, privacy, code review, manual QA, browser-driving QA against real staging, the deploy itself, the synthetic / canary / rollback / kill-switch wiring, the 27 Layer-7 watchcats — all autonomous.
Five steps, in order — explorable below. Each step has paired author + auditor agents driving every artefact to a ≥99/100 binary score under an audit-loop that escalates on plateau, context overflow, or iteration cap. An aggregator the agents cannot bypass rolls per-stage verdicts up to a single production-confidence verdict.
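The audit-loop with its escalation exits can be sketched as a bounded revise-and-score loop. This is an illustration, not the platform's implementation: the iteration cap of 5, the plateau rule (two rounds without improvement), and the function names are assumptions, and the context-overflow exit is left out because the text gives no way to measure it.

```python
from typing import Callable


def audit_loop(
    artifact: str,
    audit: Callable[[str], int],
    revise: Callable[[str], str],
    *,
    threshold: int = 99,       # the ">= 99/100" bar from the text
    max_iterations: int = 5,   # assumed iteration cap
    plateau_rounds: int = 2,   # assumed: rounds without improvement
) -> tuple[str, str]:
    """Return (verdict, artifact); never ships a sub-threshold result."""
    best, stalled = -1, 0
    for _ in range(max_iterations):
        score = audit(artifact)
        if score >= threshold:
            return "PASS", artifact
        if score <= best:
            stalled += 1
            if stalled >= plateau_rounds:
                return "ESCALATE:plateau", artifact
        else:
            best, stalled = score, 0
        artifact = revise(artifact)
    return "ESCALATE:iteration-cap", artifact


# Toy author/auditor pair: each revision adds one point to the score.
verdict, final = audit_loop("", lambda a: 95 + len(a), lambda a: a + "x")
print(verdict, len(final))
```

The only way out of the loop is a passing score or an explicit escalation, which matches the claim that a mediocre result is never shipped silently.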
The autonomy boundary is structural, not policy. A small set of paths is off-limits to autonomous changes regardless of agent intent — constitution, non-goals, principle headings, the guard file enforcing them. Ambiguous responses route to HOLD by default. The line between what autonomy may decide and what only the owner may decide is enforced on every PR — not by trust.
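A structural boundary of this kind can be approximated by a path guard plus a default-to-HOLD router. The concrete file paths, the accepted reply words, and both function names below are hypothetical; the text only says that the constitution, non-goals, principle headings, and the guard file are protected, and that ambiguous responses route to HOLD (assumed here to mean owner replies).

```python
# Hypothetical paths standing in for the protected set named in the text.
PROTECTED_PATHS = frozenset({
    "constitution.md",
    "non-goals.md",
    "guards/autonomy-guard.sh",
})


def check_autonomous_change(changed_paths: list[str]) -> str:
    """Block any autonomous change that touches a protected path."""
    touched = sorted(set(changed_paths) & PROTECTED_PATHS)
    return "BLOCK" if touched else "ALLOW"


def route_reply(reply: str) -> str:
    """Only an unambiguous reply moves the chain; everything else holds."""
    normalized = reply.strip().lower()
    if normalized in {"approve", "yes"}:
        return "PROCEED"
    if normalized in {"reject", "no"}:
        return "STOP"
    return "HOLD"


print(check_autonomous_change(["src/app.py", "constitution.md"]))
print(route_reply("sounds fine I guess?"))
```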
The 5 steps · interactive
Click any step to see what happens inside it
Step 1 is the owner's only input. Step 5 is the system's only output. Steps 2 – 4 are autonomous — paired author + auditor agents driving every artefact to a ≥99/100 binary score, with an audit-loop that escalates on plateau / context overflow / iteration cap rather than silently shipping a mediocre result.
Discovery + architecture + design pairs
Spec · arch · design
Discovery agent frames the idea into FRs / ACs / success criteria. Architect picks stack, draws module boundaries, writes ADRs. Designer produces UX flows, UI visuals, and design tokens. Each step has a paired auditor scoring ≥99/100 binary; an audit-loop pushes the pair to that bar, or escalates on plateau / context overflow / iteration cap.
Output
spec.md · architecture.md · design.md · ADRs · design tokens
Why blocking, not warning
A misfiring autonomous chain in warn-only mode cascades through the agent dispatch and produces N broken specs in one batch before any owner intervention. Warn-only is structurally insufficient — batch granularity amplifies a single misfire by the batch size.
Opt-out path
Projects preferring supervised cascade can opt out via a documented marker: single-developer plugins, hobby projects, or any context where supervised execution is the right fit. Autonomy is opt-in; supervised cascade stays the default.
Re-evaluation trigger
The autonomy boundary is re-evaluated under one of two triggers — at least one documented production-affecting incident attributable to autonomous dispatch, OR three successful autonomous batch kickoffs without owner override.
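The two triggers reduce to a single predicate. A minimal sketch, assuming both counts are tracked somewhere and that "three" means three or more; the function and parameter names are hypothetical.

```python
def boundary_needs_reevaluation(
    autonomy_incidents: int,
    clean_batch_kickoffs: int,
) -> bool:
    """True when either documented trigger has fired."""
    # Trigger 1: at least one production-affecting incident attributable
    # to autonomous dispatch.
    # Trigger 2: three successful autonomous batch kickoffs without an
    # owner override (read here as "three or more").
    return autonomy_incidents >= 1 or clean_batch_kickoffs >= 3


print(boundary_needs_reevaluation(0, 2))
print(boundary_needs_reevaluation(0, 3))
print(boundary_needs_reevaluation(1, 0))
```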
Roadmap · public
Phases shipped — and what's next
Honest roadmap pinned to what's actually landed in master at plugin v0.72.0. All nine phases A → I shipped; Dark Factory (Phase I) is currently under live testing and adjustment across three parallel consuming projects. Each phase is a coherent batch of specs; each spec passes the full delivery ladder before merge.
Hover or focus a cell for detail.
Phase I · shipped + live testing
Dark Factory full activation
Idea → ready-for-prod, autonomously. The owner hands over a business idea; the system runs the full SDLC (spec, arch, design, code, tests, audits, browser-driving QA, staging deploy) and declares production-ready. Currently under live testing and adjustment across three parallel consuming projects.
v0.72.0 · Constitution v1.13.0: 16 binary-tested principles · 55 closed-allowlist skip markers · 9 identity-drift fixtures replayed every PR · markdown + bash + Python self-gates only — no SaaS SDK lock-in.
- Agents: 135 across 14 functional groups
- Spec → deployed: 1–2 days
- Author/auditor pairs: ~32 explicit symmetries
- Layer 7 contracts: 27 continuous 24/7 (25 watchcats + 2 autonomy crons)
- Production gate: Ship-to-Prod 81-position binary
- Constitution: 16 binary-tested principles · 9 drift fixtures