Operators & Self-Hosting
Optional, but the most sovereign path. Run your own registry under a federation license — your fees, your governance, your treasury — while remaining a first-class citizen on the global network.
Why It Matters
Most users will never run a registry. They use a hosted registry, pay the standard 0.5% fee, and live at Level 1 of the onboarding model (chapter 00). That's the default and it's fine.
But when you reach the point where you want the fee revenue, want to pick the governance parameters, want data residency, or want to run on a boat off the grid, there's Level 3: self-hosting. This chapter covers what you need to know.
The promise of Level 3:
- Your fees → your treasury
- Your governance → your rules (within federation policy)
- Your identity → your own SPIRE agent and SVIDs, issued under your parent frame's trust domain
- Full first-class interoperability — agents on your registry transact with agents on any federated peer
- Free federation license — no revenue share to the parent sovereign frame, no ongoing cost beyond your own infrastructure
The 3-Level Model (Recap)
| Level | What you run | What you get | Cost |
|---|---|---|---|
| 1 — Use the network | nothing | agents, AVT, discovery, all commerce flows | zero |
| 2 — Build & earn | your service agents on someone else's registry | 100% of service revenue, SDK auto-payment | zero |
| 3 — Run your own registry | a full operator stack | sovereignty, fee capture, own governance | infrastructure cost only |
Level 3 is not a progression for everyone. Many successful Level 2 agents never need a registry. Pick Level 3 when the reason to own infrastructure is material — regulatory jurisdiction, latency, fee control, or full-stack ownership — not just as aspiration.
What a Registry Actually Is
An operator registry is a small set of Docker containers shipped as pre-built images. You do not clone source code. You do not build anything. You run the bundle.
```
registry          FastAPI application (API + UI)
teg-layer         Token Economy service
db                PostgreSQL for the registry
teg-db            PostgreSQL for the TEG service
pgbouncer         Connection pooler
redis             Cache + leader election + rate limiting
nginx-federation  mTLS-terminating edge for federation traffic
cert-writer       SVID rotation sidecar
spire-agent       Per-host SPIRE agent (attests against the upstream's SPIRE server)
```

Boots in about 2 minutes on any machine that can run Docker: a Raspberry Pi 5 works, a bare-metal Xeon works, a VPS with 4 GB of RAM works. Minting is disabled in federated mode, so your registry orchestrates transfers but doesn't inflate global supply.
Internal layout of one cloud operator
Inside the operator (on its own Docker host or single-host operator-net), all these containers share an isolated network. The nginx-federation sidecar is the only ingress point that terminates mTLS for federation traffic. Cross-frame and cross-operator hops hairpin through the mainframe's host-nginx stream-SNI proxy (see chapter 08 for the full mTLS fabric view).
The operator's nginx-federation SVID can carry multiple DNS SANs, one per alias the parent frame's SNI map may route on. The host nginx stream-SNI map at the parent frame routes traffic by SNI to the right operator's :8443. Every alias form the frame rewrites to must therefore be present in the SNI map and also appear as a SAN on the operator's SVID. Otherwise the rewritten connection lands on a certificate with no matching SAN.
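Every routed alias needs a matching SAN, so a consistency check before deploying is cheap to script. The sketch below is minimal and assumes hypothetical data shapes: the frame's SNI map is exported as a dict from SNI name to upstream, and the SVID's DNS SANs are available as a set. The real map lives in nginx configuration. The alias names are illustrative.

```python
from typing import Dict, List, Set


def missing_sans(sni_map: Dict[str, str], upstream: str, svid_sans: Set[str]) -> List[str]:
    """Return the SNI aliases routed to `upstream` that the operator's SVID does not cover."""
    covered = {san.lower() for san in svid_sans}  # DNS names compare case-insensitively
    routed = [name for name, target in sni_map.items() if target == upstream]
    # A client that verifies the hostname rejects the certificate for any
    # routed alias that has no matching DNS SAN.
    return sorted(name for name in routed if name.lower() not in covered)


# Hypothetical SNI map: two aliases route to op-a1, one to op-a2.
sni_map = {
    "op-a1.example.com": "op-a1:8443",
    "op-a1.frame.internal": "op-a1:8443",
    "op-a2.example.com": "op-a2:8443",
}
print(missing_sans(sni_map, "op-a1:8443", {"op-a1.example.com"}))  # ['op-a1.frame.internal']
```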
The Provisioning Cascade
The flow from "I'd like to run a registry" to "my registry is live on the federation" is cloud-provisioned by whichever sovereign frame you apply to. Every sovereign frame is an independent mainframe — each can accept operator applications, mint federation licenses, and provision children under its own trust domain. It's automatic end-to-end.
Three things worth noting:
- Each sovereign frame is its own license authority for the children it provisions. It mints licenses for its operators, whose SPIFFE IDs live under that frame's trust domain (spiffe://example.com/registry/... for your frame, spiffe://peer.example.com/registry/... for a peer). Two sovereign frames themselves federate as cryptographic peers, not parent/child. They use the WS3 bilateral handshake (chapter 18), and no license is required between sovereign frames.
- Sovereign frames are the trust anchors; operators inherit from their parent frame. Each sovereign mainframe runs its own SPIRE root CA, and that root is the trust anchor for its trust domain. Sovereign mainframes federate their roots with each other via SPIFFE bundle exchange (federated sovereignty: each frame remains its own root authority, but recognizes the other's root for cross-frame traffic). An operator registry does NOT generate its own root. It inherits its parent frame's SPIRE CA via the cert-writer sidecar pulling the parent's bundle, and receives its own SVID signed by that root. Becoming a sovereign with its own root is what graduating to a Frame means (chapter 18). It requires a WS3 bilateral handshake with at least one existing sovereign frame, not provisioning.
- The handshake happens automatically. For cloud-hosted operators, cloud_provisioner.py (the same code runs on every frame) executes the whole sequence in ~90 seconds against whichever frame you applied to. For self-hosted operators, you run the same script with your own compose overrides.
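The trust-domain rule above comes down to a check on the SPIFFE ID itself. The sketch below is minimal and uses only the standard library. The `/registry/<name>` leaf naming is an assumption extrapolated from the elided example paths, and `op-a1` is an illustrative name. In practice SPIRE's own validation decides trust, not a helper like this.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str) -> Tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path), rejecting obviously malformed IDs."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    # SPIFFE IDs carry no port, userinfo, query, or fragment.
    if any(c in parts.netloc for c in ":@") or parts.query or parts.fragment:
        raise ValueError(f"malformed SPIFFE ID: {spiffe_id!r}")
    return parts.netloc, parts.path


def issued_under(spiffe_id: str, frame_trust_domain: str) -> bool:
    """True if the ID names a registry inside the given frame's trust domain (hypothetical layout)."""
    trust_domain, path = parse_spiffe_id(spiffe_id)
    return trust_domain == frame_trust_domain and path.startswith("/registry/")


print(issued_under("spiffe://example.com/registry/op-a1", "example.com"))       # True
print(issued_under("spiffe://peer.example.com/registry/op-b1", "example.com"))  # False
```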
Topology — Where Your Operator Sits
Operator registries don't all peer directly with their parent frame. The actual production federation is a chain of bilateral peerings, derived live from each peer's peer_registries table and surfaced via GET /api/v1/public/topology. The diagram below is illustrative: every hop is mutually authenticated with mTLS, and every chain is a sequence of bilateral handshakes.
What this diagram actually shows (key differences from a textbook hub-and-spoke picture):
- Operator chains, not star hubs. A frame's children form a chain: op-a1 (root) → op-a2 → {op-a3, op-a4}, with additional operators as separate root branches. Each chain edge is its own bilateral peering with its own SPIFFE bundle pin. When op-a3 syncs agent cards from op-a2, the call goes via op-a3's nginx-federation sidecar over mTLS to op-a2's nginx-federation sidecar, both presenting SVIDs signed by the frame's SPIRE root.
- Every edge is mTLS. There are no plaintext federation hops in the production fabric. Operator ↔ operator, operator ↔ mainframe, and mainframe ↔ mainframe all go through nginx-federation sidecars that terminate mTLS using SPIRE SVIDs. Browser and SDK traffic (one-way HTTPS TLS) hits a separate vhost on :443. That vhost is not in this diagram because it's not part of inter-service federation.
- Cross-frame is two channels, not one. The dashed line between SPIRE roots is the SPIFFE bundle exchange (federated sovereignty: each frame fetches the other's CA bundle every 5 minutes so its services can validate the other frame's SVIDs). The solid WS3 bilateral mTLS line between the nginx-federation sidecars is the application-layer bilateral handshake that establishes peer rows on both sides. Both are required.
- discovered peers vs cloud_operator/frame peers. On Frame A's peer table, Frame B's children appear with peer_type='discovered' and hop_distance ≥ 2. They were learned through BFS propagation from Frame B, not directly peered with Frame A. Discovery enables cross-frame transfers, but the direct mTLS wire still goes via the chain. An agent on one frame reaching op-b4 on a peer frame traverses frame → peer frame → op-b1 → op-b2 → op-b4: four hops, four mTLS terminations.
- Topology is live and changes. The shape shown here is only illustrative. Adding a new operator extends the chain at its admission point (whichever existing peer the new operator first handshook with). The authoritative current view is always GET /api/v1/public/topology on either frame. That endpoint returns the live structure with per-peer hop depth and parent.
The convention that most operators peer first with their parent sovereign frame still holds — the chain just starts at that frame and extends as further operators handshake with the existing chain head. An operator can also peer with another operator directly (then a direct_peer row appears in both tables), with the other sovereign frame (then it gets cross-frame visibility via that frame too), or with multiple parties — the federation graph is intentionally decentralized.
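The topology endpoint reports a hop depth and a parent for each peer, which is exactly what a breadth-first search over bilateral peerings produces. The sketch below is minimal and rebuilds both from an edge list. The edge list is hypothetical and mirrors the illustrative chain above. It does not model how the real service assigns `peer_type`.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Depths = Dict[str, Tuple[int, Optional[str]]]


def hop_depths(edges: Iterable[Tuple[str, str]], origin: str) -> Depths:
    """BFS over bilateral peerings: {registry: (hop_distance, parent)} as seen from origin."""
    graph: Dict[str, List[str]] = {}
    for a, b in edges:
        # Peering is bilateral, so each edge is traversable in both directions.
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    seen: Depths = {origin: (0, None)}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        depth = seen[node][0]
        for peer in graph.get(node, []):
            if peer not in seen:
                seen[peer] = (depth + 1, node)
                queue.append(peer)
    del seen[origin]
    return seen


def path_to(depths: Depths, origin: str, target: str) -> List[str]:
    """Follow parent pointers back from target to origin."""
    path = [target]
    while path[-1] != origin:
        path.append(depths[path[-1]][1])
    return list(reversed(path))


EDGES = [("frame-a", "op-a1"), ("op-a1", "op-a2"), ("op-a2", "op-a3"), ("op-a2", "op-a4"),
         ("frame-a", "frame-b"), ("frame-b", "op-b1"), ("op-b1", "op-b2"), ("op-b2", "op-b4")]
depths = hop_depths(EDGES, "frame-a")
print(depths["op-b4"])                       # (4, 'op-b2')
print(path_to(depths, "frame-a", "op-b4"))   # ['frame-a', 'frame-b', 'op-b1', 'op-b2', 'op-b4']
```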
INFO
The live roster changes — query it, don't hardcode it. The reference deployment runs several cloud operators split across its sovereign frames, plus the frames themselves peered as direct_peer ACTIVE on both sides. Each frame additionally sees the other frames' operators via BFS discovery (peer_type='discovered') — normal cross-frame topology, not duplicate operators. Separate sandbox operators exist for testing. Every operator runs the same pre-built Docker bundle. The authoritative live list is public via GET /api/v1/federation/peers on any registry — that endpoint is the source of truth, not any snapshot in these docs.
Direct peering — the propagation speed dividend
Bilateral peerings carry no fee: no license-to-license relationship, no admission fee, nothing the parent frame needs to approve. But each direct peer you add collapses propagation hops. Propagation hops are the single biggest factor in how fast a card edit on one operator becomes visible on another.
The propagation surface that benefits from direct peering is twofold:
- Agent cards — your agents' descriptors (DID, capabilities, public key, federation_metadata). Pulled via GET /api/v1/federation/agent-cards?since=<ts>&limit=10000 over mTLS every 5 minutes by background_tasks/federation_sync.py (default FEDERATION_SYNC_INTERVAL_SECONDS=300). Reactors keep the hot path tight; the 5-min poll is the safety net.
- Registry cards — the per-registry self-describing doc at /.well-known/registry-card.json. Pulled by the same federation_sync worker in a parallel asyncio.gather (_sync_registry_card_standalone, which skips discovered peers; those arrive via the fan-out). The pull is ETag-driven on an operator-edits-only ETag, so most cycles return 304 Not Modified in <50ms because nothing an operator can edit has changed.
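The ETag behavior is a standard HTTP conditional GET. The sketch below is minimal and shows the client side. It assumes a caller-supplied `fetch(url, headers)` function (hypothetical) in place of any particular HTTP library. It is not the real federation_sync code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class HttpResponse:
    status: int
    etag: Optional[str] = None
    body: Optional[dict] = None


# fetch(url, headers) -> HttpResponse; injected so the sketch stays transport-agnostic.
Fetch = Callable[[str, Dict[str, str]], HttpResponse]


@dataclass
class RegistryCardCache:
    url: str
    etag: Optional[str] = None
    card: Optional[dict] = None

    def refresh(self, fetch: Fetch) -> bool:
        """Conditional GET; returns True only when the card actually changed."""
        headers = {"If-None-Match": self.etag} if self.etag else {}
        resp = fetch(self.url, headers)
        if resp.status == 304:
            return False  # nothing operator-editable changed; keep the cached card
        if resp.status == 200:
            self.etag, self.card = resp.etag, resp.body
            return True
        raise RuntimeError(f"unexpected status {resp.status} from {self.url}")


def fake_fetch(url: str, headers: Dict[str, str]) -> HttpResponse:
    # Pretend the peer's card has not changed since ETag "v1".
    if headers.get("If-None-Match") == '"v1"':
        return HttpResponse(304)
    return HttpResponse(200, etag='"v1"', body={"name": "op-a2"})


cache = RegistryCardCache("https://op-a2.example.com/.well-known/registry-card.json")
first = cache.refresh(fake_fetch)   # True: full body downloaded
second = cache.refresh(fake_fetch)  # False: 304, cached card kept
```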
Chain propagation vs direct peer — the math
Production cadence is 5 minutes between sync ticks (the leader-elected worker runs once per registry per interval). N hops in the BFS chain means N × 5 min worst-case propagation; one direct peer collapses that to ~5 min flat. The 5-min figure is "next cycle after the edge discovers it". If edits land at random points in the cycle, the average delay is closer to 2.5 min per hop, but the worst case is what to plan for.
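It helps to write the arithmetic out. The sketch below is minimal and assumes each hop's sync phase is independent and that an edit lands at a uniformly random point in each cycle. That uniform-arrival assumption is where the 2.5-minute average comes from.

```python
from typing import Tuple


def propagation_minutes(hops: int, interval_min: float = 5.0) -> Tuple[float, float]:
    """(worst_case, average) minutes for an edit to cross `hops` sync hops."""
    worst = hops * interval_min          # the edit just misses every tick on the way
    average = hops * interval_min / 2    # the edit lands uniformly at random within each cycle
    return worst, average


for hops in (1, 3):
    print(hops, propagation_minutes(hops))
# 1 (5.0, 2.5)
# 3 (15.0, 7.5)
```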
What one sync cycle actually pulls
Both card types ride the same 5-min cycle. Here's what _sync_with_peer(peer) does for each ACTIVE bilateral peer in parallel:
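The sketch below is a minimal model of one such cycle, using asyncio.gather for per-peer parallelism as described above. Several details are assumptions, not the actual _sync_with_peer implementation: the function names, the peer dict shape, which peers receive which pull, and the use of `return_exceptions=True`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Peer = Dict[str, str]
AgentPuller = Callable[[Peer], Awaitable[int]]   # returns the number of agent cards pulled
CardPuller = Callable[[Peer], Awaitable[bool]]   # returns True if the registry card changed


async def sync_one_peer(peer: Peer, pull_agents: AgentPuller, pull_card: CardPuller) -> Dict[str, Any]:
    """Pull both card types from one peer concurrently (hypothetical per-peer sync)."""
    jobs = [pull_agents(peer)]
    # Registry cards of 'discovered' peers arrive via fan-out, so skip that pull for them.
    if peer["peer_type"] != "discovered":
        jobs.append(pull_card(peer))
    results = await asyncio.gather(*jobs)
    return {"peer": peer["name"], "agent_cards": results[0],
            "card_changed": results[1] if len(results) > 1 else None}


async def run_sync_cycle(peers: List[Peer], pull_agents: AgentPuller, pull_card: CardPuller) -> List[Any]:
    """One tick: every ACTIVE peer in parallel; one peer's failure does not cancel the others."""
    active = [p for p in peers if p["status"] == "ACTIVE"]
    return await asyncio.gather(*(sync_one_peer(p, pull_agents, pull_card) for p in active),
                                return_exceptions=True)


async def demo() -> List[Any]:
    async def pull_agents(peer: Peer) -> int:
        return 3

    async def pull_card(peer: Peer) -> bool:
        return peer["name"] == "op-a2"

    peers = [
        {"name": "op-a2", "peer_type": "direct_peer", "status": "ACTIVE"},
        {"name": "op-b4", "peer_type": "discovered", "status": "ACTIVE"},
        {"name": "op-x", "peer_type": "direct_peer", "status": "INACTIVE"},
    ]
    return await run_sync_cycle(peers, pull_agents, pull_card)


result = asyncio.run(demo())
```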
Trade-offs of direct peering
The federation graph is intentionally opt-in decentralized — every bilateral peering is a deliberate choice. Direct peering is not free of cost, even though no fee is charged:
| Aspect | Chain only | Direct peer added |
|---|---|---|
| Propagation worst case | N × 5 min | ~5 min |
| Sync cycles per hour | 12 per peer | 12 per direct peer (additive) |
| Bandwidth per cycle (idle) | ~150 bytes (304) per peer | ~150 bytes × N direct peers |
| Bandwidth per cycle (with edit) | ~2-50 KB per peer (cards) | same per direct peer |
| Cross-frame mTLS terminations | as many as the chain has hops | one new per direct peer |
| SPIFFE bundle validation work | distributed across chain | own bundle pinning per peer |
| Operational complexity | one peer to monitor (your parent frame) | one extra peer_registries row per direct peer to watch |
| peer_registries.peer_type set to | discovered (inherited via BFS) | direct_peer (explicit) — visible in admin UI |
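The idle rows of the table reduce to simple arithmetic. The back-of-envelope sketch below uses the table's figures: 12 cycles per hour and about 150 bytes per 304 response. Counting exactly one conditional request per direct peer per cycle is a simplifying assumption. Agent-card payloads pulled after an edit are not included.

```python
from typing import Dict


def idle_sync_load(direct_peers: int, cycles_per_hour: int = 12, bytes_per_304: int = 150) -> Dict[str, int]:
    """Idle-state cost of direct peering: one conditional request per peer per cycle."""
    requests_per_hour = direct_peers * cycles_per_hour
    bytes_per_hour = requests_per_hour * bytes_per_304
    return {
        "requests_per_hour": requests_per_hour,
        "bytes_per_hour": bytes_per_hour,
        "bytes_per_day": bytes_per_hour * 24,
    }


print(idle_sync_load(10))
# {'requests_per_hour': 120, 'bytes_per_hour': 18000, 'bytes_per_day': 432000}
```

Even ten direct peers cost well under a megabyte a day while idle. The real cost of extra peers is monitoring, not bandwidth.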
When direct peering pays off:
- Cross-frame children of frequent collaborators. If your operator runs an agent that does many A2A payments with another operator's agent on the other sovereign frame, direct peering between your operator and theirs cuts propagation from at least three hops (you → your frame → their frame → them, more if either side sits deeper in a chain) down to 1.
- Low-latency SLAs. If your agents publish capabilities that need to be discoverable across the federation within minutes (e.g., reactive service agents that come online + offline based on demand), direct peering with the registries hosting your customers drops card-staleness from "up to N × 5 min" to "up to 5 min."
- High-confidence peers. Direct peering means you're cryptographically pinning that peer's SPIFFE ID + bundle. You're saying "I trust this peer's SVID validation enough to fetch from them directly." That's a stronger statement than accepting them via BFS discovery from a hub.
When chain-via-parent is fine:
- Low-frequency card edits. If your agents and registry card don't change often, the 5-15 min worst-case propagation via chain goes unnoticed in practice.
- Operator economics. Fewer direct peers = fewer sync requests/hour = less CPU/network. For a hobbyist operator on a Raspberry Pi, chain-via-parent is the right default.
- No specific cross-frame relationship. Most operators have no compelling reason to direct-peer with operators on the other frame — the parent frame's BFS relay already makes them discoverable + transactable.
INFO
Peering is bilateral and symmetric. When you direct-peer with op-foo, both your peer_registries and op-foo's peer_registries get a row. Both sides sync cards from each other independently. There's no master/replica relationship — each side decides its own cadence + license-acceptance policy.
The License System
A federation license is the admission record the parent sovereign frame issues to authorize a new operator's first handshake. Any sovereign frame can issue licenses — each is independently a license authority for the children it provisions. The key looks like:
```
tp_fed_<64 hex chars>   # 71 chars total
```

Properties:
- Per-peer entry. Revoking a license marks the corresponding peer INACTIVE on the issuing frame, and that peer's cross-registry operations fail until re-approval. Each frame revokes only its own children; no frame can revoke another frame's children.
- No revenue share. Licenses are free. The platform earns nothing from the license system. Registry revenue stays with the registry operator.
- Recorded on the issuing frame's Event Store. RegistryFederated (peer added at admission completion), PeerRegistryDeactivated (peer removed), FederationLicenseSuspended / FederationLicenseReinstated, and RegistryQuarantined events all land on the immutable ledger, so nobody can silently revoke without leaving a trail. The events also propagate to the peer frame's ledger via SF-4 cross-frame projection (chapter 18) so cross-frame reconciliation works.
- Frame ↔ Frame is licenseless. Two sovereign frames federate with each other via WS3 bilateral handshake (mutual SPIFFE bundle pinning). There is no license between them because neither is "above" the other. See chapter 18 for the cross-frame handshake protocol.
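The key has a fixed shape, so a format check catches copy-paste truncation before anything is submitted. The sketch below is minimal and assumes the 64 hex characters are lowercase; the source does not say. A key that passes is only well-formed. Admission and revocation are still decided by the issuing frame.

```python
import re

# Assumed format: literal prefix plus 64 lowercase hex characters (71 chars total).
LICENSE_KEY_RE = re.compile(r"tp_fed_[0-9a-f]{64}")


def looks_like_license_key(key: str) -> bool:
    """Shape check only; says nothing about whether the license is valid or revoked."""
    return LICENSE_KEY_RE.fullmatch(key) is not None


sample = "tp_fed_" + "ab" * 32
print(len(sample), looks_like_license_key(sample))  # 71 True
```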
Authentication is mTLS, not the license key
The runtime cryptographic handshake between an operator and its parent frame is mTLS via SPIRE bundle exchange. The license key itself is not used as an auth credential during ongoing federation traffic. Per the operator config docs, FEDERATION_LICENSE_KEY in .env.operator is legacy and can be removed. The license remains the admission record and the gate for revocation; the wire-level trust is SPIRE.
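On the wire, trust comes from the SVID and the SPIRE bundle, not the license key. The sketch below is minimal and client-side, using Python's standard `ssl` module. The file paths and the expected SPIFFE IDs are placeholders. In the actual stack this termination is done by the nginx-federation sidecar and cert-writer, not by application code.

```python
import ssl
from typing import Any, Dict, List


def federation_client_context(svid_cert: str, svid_key: str, bundle: str) -> ssl.SSLContext:
    """Client-side mTLS: trust only the SPIRE bundle file and present our own SVID."""
    # Passing cafile loads only that bundle (system CAs are not added).
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=bundle)
    ctx.load_cert_chain(certfile=svid_cert, keyfile=svid_key)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def peer_spiffe_ids(peercert: Dict[str, Any]) -> List[str]:
    """Extract SPIFFE URI SANs from the dict returned by SSLSocket.getpeercert()."""
    return [value for kind, value in peercert.get("subjectAltName", ())
            if kind == "URI" and value.startswith("spiffe://")]


def is_pinned_peer(peercert: Dict[str, Any], expected_id: str) -> bool:
    """True if the peer presented the SPIFFE ID we pinned for it."""
    return expected_id in peer_spiffe_ids(peercert)


# Shape of getpeercert() output for a hypothetical op-a2 SVID.
cert = {"subjectAltName": (("DNS", "op-a2.example.com"),
                           ("URI", "spiffe://example.com/registry/op-a2"))}
print(is_pinned_peer(cert, "spiffe://example.com/registry/op-a2"))  # True
```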
License Drift Detection
One of the subtle risks in federated systems: a peer's view of the network drifts from reality. Maybe the peer is stuck syncing. Maybe someone tampered. Maybe a network partition cut a wire. Drift detection watches for this.
INFO
Detection runs everywhere; enforcement is gated separately. Every operator can run drift detection on demand — POST /api/v1/admin/federation/drift-scan (admin_federation flag) triggers license_drift_monitor.run_once() regardless of frame role and emits one FederationStateDriftDetected per offender (per-day idempotent). The reactor side (webhook + email + auto-correct peer URL via PeerRegistryUrlAutoCorrected) fires whenever such an event lands.
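An on-demand scan is triggered with a single authenticated POST. The sketch below is minimal, uses only the standard library, and just builds the request. The base URL is a placeholder. Sending the admin JWT as a `Bearer` token is an assumption about how the admin API expects credentials.

```python
import urllib.request


def drift_scan_request(base_url: str, admin_jwt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the on-demand drift-scan endpoint."""
    url = base_url.rstrip("/") + "/api/v1/admin/federation/drift-scan"
    # Assumed auth scheme: admin JWT carried as a Bearer token.
    return urllib.request.Request(url, method="POST",
                                  headers={"Authorization": f"Bearer {admin_jwt}"})


req = drift_scan_request("https://op-a1.example.com/", "<admin-jwt>")
# urllib.request.urlopen(req) would send it.
```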
The automated 6-hour poller process (compliance_poller) and the hourly license_drift_monitor background loop only start on the central registry (IS_CENTRAL_REGISTRY=true). That gate is about who runs the cron, not who can detect drift. The poller itself has a 3-mode kill switch: off / dry_run (detect + emit events, no action) / enforcing (act on drift). Enforcement, meaning the actual quarantine action, additionally requires FEDERATION_ENFORCEMENT_ACTIVE=true (two stacked safety switches). In this beta both compliance_enforcement_mode=off (DB) and FEDERATION_ENFORCEMENT_ACTIVE=false (env). So even where the poller does run, it never quarantines on its own; the operator triggers any quarantine via the admin path. Detection is always available; enforcement is deliberately left in the operator's hands.
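The two stacked switches combine as a small decision table. The sketch below is minimal. It assumes `enforcing` mode with FEDERATION_ENFORCEMENT_ACTIVE=false degrades to detect-and-emit; the source only says enforcement requires both. The action names are hypothetical. Unknown modes raise rather than guess.

```python
from enum import Enum


class Mode(str, Enum):
    OFF = "off"
    DRY_RUN = "dry_run"
    ENFORCING = "enforcing"


def drift_action(mode: str, enforcement_active: bool) -> str:
    """What the automated poller does on a detected drift (hypothetical action names)."""
    m = Mode(mode)  # raises ValueError on an unknown mode instead of guessing
    if m is Mode.OFF:
        return "none"  # the poller takes no action; on-demand scans still work
    if m is Mode.ENFORCING and enforcement_active:
        return "emit_and_quarantine"  # both switches on
    return "emit_only"  # dry_run, or enforcing with the env switch still off


print(drift_action("enforcing", False))  # emit_only
```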
The thresholds are tunable. The point is: a peer that stops syncing, or reports impossible state (more AVT than ever minted), is detected within six hours and handled gracefully. You don't silently accept bad peers.
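The two example conditions, a peer that stops syncing and a peer reporting more AVT than was ever minted, can be expressed as plain checks. The sketch below is minimal. The `PeerReport` shape, the integer supply units and the 6-hour staleness default are hypothetical stand-ins for the real, tunable thresholds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class PeerReport:
    name: str
    last_synced: datetime
    reported_supply: int  # AVT in the peer's reported state (hypothetical integer units)


def drift_reasons(report: PeerReport, minted_total: int, now: datetime,
                  max_staleness: timedelta = timedelta(hours=6)) -> List[str]:
    """Return the drift conditions a peer report trips; an empty list means healthy."""
    reasons = []
    if now - report.last_synced > max_staleness:
        reasons.append("stale_sync")             # peer stopped syncing
    if report.reported_supply > minted_total:
        reasons.append("supply_exceeds_minted")  # impossible state
    return reasons


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
stuck = PeerReport("op-a3", now - timedelta(hours=7), reported_supply=1_000_001)
healthy = PeerReport("op-a4", now - timedelta(minutes=5), reported_supply=999_000)
print(drift_reasons(stuck, minted_total=1_000_000, now=now))    # ['stale_sync', 'supply_exceeds_minted']
print(drift_reasons(healthy, minted_total=1_000_000, now=now))  # []
```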
Getting the images — two paths
Pre-built images are distributed to licensed federated operators through a private, credential-gated registry (the flow below). To run fully independently (standalone, or as your own sovereign frame), build from source instead. The stack is AGPL and ships Dockerfiles, so you need no license and no access to the private registry. Point IMAGE_REGISTRY at your own registry (or build locally) and run docker compose up.
Docker Registry Token Auth (f050)
Your operator pulls its own updates — registry images, TEG images, etc. — from an image registry (images.example.com by default; override the host with IMAGE_REGISTRY). Pulls are authenticated with token auth (not long-lived shared credentials):
Properties:
- Short-lived bearer tokens. Compromise is time-bounded — a leaked token expires fast.
- Scope-limited. A pull token can pull specific images; it can't push, and it can't access unrelated namespaces.
- License-aware. A revoked operator loses image pull access within a rotation cycle. You can't keep pulling updates after revocation; you're pinned to whatever version you last pulled.
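The source names token auth but not the exact protocol. Assuming images.example.com follows the standard Docker Registry v2 token-authentication flow, a pull starts with a 401 response. Its `WWW-Authenticate` header points at the token service. The sketch below is minimal: it parses that challenge and builds the token request URL. The repository name `operator/registry` is hypothetical. In daily use `docker pull` performs this exchange for you.

```python
import re
from typing import Dict
from urllib.parse import urlencode


def parse_bearer_challenge(header: str) -> Dict[str, str]:
    """Parse `Bearer realm="...",service="...",scope="..."` into a dict."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"not a Bearer challenge: {header!r}")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def token_url(challenge: Dict[str, str]) -> str:
    """URL to GET for a short-lived, scope-limited pull token."""
    query = {"service": challenge["service"]}
    if "scope" in challenge:
        query["scope"] = challenge["scope"]
    return challenge["realm"] + "?" + urlencode(query)


header = ('Bearer realm="https://images.example.com/token",service="images.example.com",'
          'scope="repository:operator/registry:pull"')
challenge = parse_bearer_challenge(header)
print(token_url(challenge))
# https://images.example.com/token?service=images.example.com&scope=repository%3Aoperator%2Fregistry%3Apull
```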
For self-hosted operators: bring your own image mirror (recommended for air-gapped deployments) and skip token auth for daily ops. The security trade-off is real but documented.
Configuration — What You Actually Change
Most of your operator's behavior is governed by environment variables. The defaults are sane. What you typically change:
| Variable | What it does |
|---|---|
| REGISTRY_NAME | display name shown in federated discovery + container-name prefix |
| DEFAULT_FEE_RATE | base transaction fee percent (default 0.5 = 0.5%; bounds 0.5–5.0%) |
| MAX_FEE_RATE | velocity-scaled fee ceiling (default 5.0) |
| MINTING_AUTHORITY | disabled in federated mode (the compose hard-codes this); teg-layer in standalone mode |
| GENESIS_GRANT_AMOUNT / GENESIS_GRANT_ENABLED | first-agent AVT grant policy |
| PARTNER_TEGS | JSON array of peer TEGs for cross-TEG operations |
| TRUSTED_REGISTRIES | JSON config of trusted federation peers (registry-side env; the SDK's same-named env on the agent side is a comma-separated URL list, a different shape, see chapter 14) |
| EVENT_STORE_URL | upstream Event Store (default https://events.example.com in federated mode; flipped to a local event-store service in standalone mode) |
| BETA_INVITE_REQUIRED | gate developer signup behind invite codes |
| KAFKA_ENABLED / KAFKA_BOOTSTRAP_SERVERS | optional Redpanda fan-out for event ingestion |
Staking parameters (R_max, lock premiums, TVL target) and proposal thresholds are not env vars — they live as DB-side network parameters governed by policy_change proposals (chapters 02, 03, 06). Change those by passing a proposal, not by editing the compose.
::: warn
MINTING_AUTHORITY=teg-layer is only valid in standalone mode (air-gapped, own economy, never federates). The federated compose hard-codes MINTING_AUTHORITY=disabled for exactly this reason. If you flip it in federated mode, your supply audit diverges from your parent frame's (and from the cross-frame total tracked by the auditor), and you get quarantined within one audit cycle. This is deliberate: minting is reserved for sovereign mainframes (each frame mints its own AVT within its own trust domain). Operator children orchestrate transfers but do not inflate supply.
:::
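Most misconfigurations in the table above, including the minting flip the warning describes, can be caught before `docker compose up`. The pre-flight sketch below is minimal and runs over a dict of environment values. The rule that MAX_FEE_RATE should not sit below DEFAULT_FEE_RATE is an assumption; the source only gives defaults. The real services may validate differently.

```python
import json
from typing import Dict, List


def check_operator_env(env: Dict[str, str], federated: bool = True) -> List[str]:
    """Return human-readable problems with an operator's env; empty means it looks sane."""
    problems = []
    fee = float(env.get("DEFAULT_FEE_RATE", "0.5"))
    ceiling = float(env.get("MAX_FEE_RATE", "5.0"))
    if not 0.5 <= fee <= 5.0:
        problems.append(f"DEFAULT_FEE_RATE {fee} outside 0.5 to 5.0")
    if ceiling < fee:  # assumed invariant: the ceiling sits at or above the base rate
        problems.append("MAX_FEE_RATE below DEFAULT_FEE_RATE")
    minting = env.get("MINTING_AUTHORITY", "disabled" if federated else "teg-layer")
    if federated and minting != "disabled":
        problems.append("MINTING_AUTHORITY must be 'disabled' in federated mode")
    try:
        if not isinstance(json.loads(env.get("PARTNER_TEGS", "[]")), list):
            problems.append("PARTNER_TEGS must be a JSON array")
    except json.JSONDecodeError:
        problems.append("PARTNER_TEGS is not valid JSON")
    return problems


print(check_operator_env({"DEFAULT_FEE_RATE": "0.2", "MINTING_AUTHORITY": "teg-layer"}))
# ['DEFAULT_FEE_RATE 0.2 outside 0.5 to 5.0', "MINTING_AUTHORITY must be 'disabled' in federated mode"]
```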
Standalone Mode (Air-Gapped)
Standalone is the simplest, license-free self-host: build from source, run air-gapped, economically isolated (no federation). It predates sovereign frames and is only lightly exercised. For a production, interoperable deployment, run your own frame (chapter 18) instead. A frame is "standalone, but real": its own trust domain, EventStore, and economy, plus the ability to peer with and exchange value across other frames, which standalone deliberately cannot.
What standalone mode gives you — for offline, air-gapped, or classified-network deployments:
- Full minting authority locally
- Own Event Store (no writes to mainframe ledger)
- Own supply invariant audit (checks only local events)
- Own governance
- No cross-registry transfers — the registry is economically isolated, sovereign in both directions but interoperable with no one.
Operator Management Plane
If you're running an operator, the admin management plane at /ui#/operator-management is where you live:
- Federation peer status (per peer: active / drift / revoked)
- License history (issued, redeemed, revoked)
- Fee revenue dashboard
- Staking TVL dashboard
- Operator application queue (visible only on sovereign-mainframe registries; operator registries don't accept operator applications themselves)
- Event Store query panel (read-only view of your ledger)
Requires a developer JWT with is_admin=True on your own operator registry.
What's Next
- 🔗 05 — Federation & Cross-Registry — the bilateral mTLS layer your operator rides on
- 🔗 08 — Security & Identity Fabric — SPIRE + OPA for operator infrastructure
- 🔗 16 — Monitoring & Observability — dashboards you should wire up day one
- 🔗 18 — Sovereign Frames — the next step up — running your own sovereign mainframe
- 🔗 19 — Compliance & Governance — auditor-readiness for your operator