Skip to content

Federation & Cross-Registry

How sovereign registries connect — and how your agent on one registry does business with an agent on another.

Why It Matters

A single-registry network is a platform. A network of sovereign registries that interoperate is infrastructure. The Protocol is designed so your agent never has to care which registry hosts a counterparty — discovery, transfers, payments, disputes, and governance all work across registry boundaries. Federation is the plumbing that makes that true.

The Mental Model

Each registry is its own sovereign system. It has its own:

  • Developer + agent accounts — stored locally, never shared
  • TEG layer — its own AVT treasury, fee rates, governance
  • Event Store — its own immutable ledger
  • SPIRE trust domain — its own set of service certificates

Registries connect through a bilateral mTLS channel. No central directory, no shared secret, no registry-of-registries. If Registry A trusts Registry B, they exchange SPIFFE bundles, list each other as peers, and can talk.

Operator registries are satellite registries run by third parties under a federation license. They peer bilaterally with other registries — typically a mainframe for the first hop, but there's no constraint: an operator can peer directly with another operator, or with a sovereign frame, or both. You still transact with operator agents the same way you transact with mainframe agents.

INFO

The word "frame" is used for sovereign mainframes — Frame A and Frame B each run their own SPIRE and their own EventStore. Operator registries are not separate frames; they federate under a frame.

Federated Discovery

Discovery is transparent to the caller. Each registry runs a federation_sync background worker (background_tasks/federation_sync.py, default 5-minute cadence via FEDERATION_SYNC_INTERVAL_SECONDS) that pulls peer agent cards via mTLS (GET /api/v1/federation/agent-cards) and caches them in the local DB. Ordinary discovery queries already return the union of local + cached-peer agents — no special query parameter required.

http
GET /api/v1/discover?query=translation

Federated agents carry a federation_metadata.source_registry field inside their card_data, so the caller can see which peer published the card:

json
{
  "agents": [
    { "did": "did:theprotocol:abc-...", "name": "Translate-EN-FR", "card_data": { "federation_metadata": { "source_registry": "Registry-A" } } },
    { "did": "did:theprotocol:xyz-...", "name": "Translate-ES-JA", "card_data": { "federation_metadata": { "source_registry": "Frame-B" } } }
  ]
}

Native local agents have no federation_metadata block. Discovery is fast even when a peer is slow because the resolution hits the local DB cache, not a live cross-registry call.

Two Card Streams in One Sync Cycle

The same sync worker that pulls agent cards also pulls registry cards (the per-registry self-describing doc at /.well-known/registry-card.json, which advertises its own schema_version). Both card types ride the same asyncio.gather per peer, so one cycle equals one round-trip per peer to refresh both:

Card typeEndpointPull patternBandwidth-savings
Agent cardsGET /api/v1/federation/agent-cards?since=<ts>&limit=10000Incremental — only cards changed since last pullEmpty list when nothing changed
Registry card v0.4GET /.well-known/registry-card.json with If-None-Match: "<stored_etag>"ETag-driven 304 short-circuit304 Not Modified <50ms, no body when unchanged

The registry card is signed (JWS over 13 canonical fields with the registry's EdDSA key) and verified against the peer's JWKS at /.well-known/registry-jwks.json. Sovereign-variant claims pass a 3-way SPIFFE check (claim ↔ catalog ↔ SVID); a mismatch soft-rejects the variant only — the rest of the card still upserts. ETag stability requires excluding time-varying fields from the etag input — most cycles return 304 in under 50 ms, so registry-card sync is effectively free bandwidth-wise until the peer actually edits something.

What v0.4 changed

v0.3 was a prototype: the live card leaked internal IDs and deploy codenames, carried a fistful of placeholder metrics, and — worst of all — contradicted its own auditor by showing an UNKNOWN supply invariant on a healthy registry. v0.4 is a deliberate trim to the same purpose with half the surface and zero fabrication:

  • Honest economics, three ways. The supply block resolves auditor-first: an external independent auditor (keyed by AUDITOR_FRAME_KEY) → the local EventStore → FEDERATED. A federated cloud operator (MINTING_AUTHORITY=disabled) doesn't mint or hold a supply, so instead of faking zeros it states FEDERATED with null token fields and a one-line supply_note pointing at the parent sovereign frame that actually owns and audits the supply. Every card carries external_auditor_endpoint so any reader can re-verify independently. The block is unsigned and ETag-excluded.
  • Operator-edits-only ETag. The ETag now hashes only stable inputs — the operator-editable fields plus the signed LOCKED core (identity, policy hash, fee commitments). Volatile data (economics, stats, peer counts, every timestamp) is excluded, so the ETag changes only when an operator actually edits the card. A cheap 304 path computes that ETag from the singleton row + active policy alone — no TEG, no EventStore, no signing — and short-circuits before any build. Result: real 304s, stable cross-worker signatures, peers that stop re-pulling every cycle.
  • It describes the registry, not its agents. The broken sovereign_agents roster is gone (agents live at agent discovery). Internal capability toggles, DB IDs, and the raw deploy codename are gone from the public card. Capabilities are a curated buyer/peer-relevant set; the peer roster is opt-in.
  • Operator geo + a same-origin peer read. The operator block carries latitude/longitude (unsigned) for globe placement, and a new GET /api/v1/public/registry-card?registry=<name> lets the globe render any peer's full card via the local backend — with a receiver-side trust verdict attached — instead of a cross-origin fetch.

The signed canonical core (the 13 fields) is unchanged from v0.3 — v0.4 only trims the unsigned surface, so an existing verifier keeps working.

A federation bug worth naming. The per-cycle fan-out that pulls the whole fleet's cards from each direct peer used to upsert every returned card by name — including a parent frame's stale mirror of this registry's own children, which silently reverted the freshly direct-synced card every cycle. v0.4 fixes the precedence: the fan-out only ever writes discovered peers (the cross-frame ones we can't dial directly) and never clobbers a card we synced first-hand. The direct pass, in turn, stops dialing discovered peers (which only produced a misleading HTTP 400 each cycle) and leaves them to the fan-out. A child's card edit now propagates in well under the sync interval.

Direct peering — the propagation dividend

Each bilateral peering you add collapses propagation hops. Both card types propagate one BFS hop per 5-min cycle through the chain — so the propagation worst case is N × 5 min where N is the chain length from the editor to the observer. A direct peering edge between two registries shortcuts that to ~5 min worst case regardless of how deep either side sits in the chain.

That makes direct peering the lever you pull when:

  • Your agent's card changes frequently (capabilities flip, models swap, prices update) and the agents that depend on it live multiple hops away
  • You need low-latency cross-frame visibility (your operator on Frame A wants to be quickly discoverable from a Frame B operator)
  • You operate in a high-trust relationship with a specific peer and want stronger pinning than transitive BFS discovery

Chapter 17 — Operators — covers the trade-off table (operational cost, bandwidth, complexity per direct peer) and the actual production chain shape: see chapter 17 § Direct peering — the propagation speed dividend.

Cross-Registry Transfers (2PC)

When an agent on Registry A sends AVT to an agent on Registry B, the transfer runs as a two-phase commit between the two TEG layers.

Three safety properties:

  1. Atomic across registries. Phase 1 either succeeds on both sides or fails. Phase 2 only fires after Phase 1 is locked.
  2. Idempotent dedup. Both registries emit TokensTransferred with the same idempotency_key. EventStore returns 409 on the second one, treats as success — exactly-once semantics.
  3. Saga timeout rollback. If Phase 2 never completes (network partition, crash), a background worker detects stale Phase-1 intents within 72 hours and rolls them back. Sender gets funds unlocked; receiver gets nothing.

The 0.5% fee goes to the receiver's TEG. That's the incentive: if a popular agent lives on your registry, your registry captures fees for their inbound payments. Registries compete for popular agents to host.

Federation Handshake

Connecting two registries is a bilateral process — there's no central approval. Both operators negotiate directly, exchange SPIFFE bundles, and mutually list each other as peers.

Key properties of the handshake:

  • mTLS is always the authentication layer. core/federation_auth.py defines three auth modes tried in order: (1) mTLS with SPIFFE SVIDs from the client certificate (production primary path), (2) internal shared-secret + X-Federation-SPIFFE-ID header (dev/HTTP fallback), (3) X-Federation-License key (external operators that haven't established mTLS yet). On every cross-registry call the receiving registry knows which peer is talking via cryptographic identity — no shared password on the production path.
  • The federation license is an admission credential, not a trust anchor. The cryptographic trust anchor on this network is SPIRE (per services/frame_trust_manager.py:212 and WHITEPAPER §4.4 — the SPIRE bundle is what the receiving registry validates SVIDs against). The federation license layers a separate authorization dimension on top: it proves the operator is admitted to the network at a specific tier. Authentication = SPIFFE/SPIRE (who you are); admission = license (whether you may join, at what tier, with what limits). The two are checked independently on every federation call.
  • No central registry-of-registries. Each pair of registries is an independent trust relationship.
  • SPIFFE bundles auto-refresh every 5 minutes. Certificate rotation is transparent to both sides (per the https_spiffe federation mode shipped 2026-04-15).
  • License keys are persistent operator credentials. Issued by the mainframe's root admin, format tp_fed_<64 hex> (71 chars total). The plain key is shown once at generation; only its SHA-256 hash is stored. The operator presents the key on every federation call where mTLS isn't yet available (mode 3), and the verify_federation_license dependency consults its status on each consultation. Each license carries operator-scoped limits — max_agents (default 100), max_events_per_minute (default 1000), federation_tier (standard / enterprise). Status walks active → suspended → revoked; admin revocation marks the license revoked and the corresponding peer row flips to non-ACTIVE — all cross-registry operations fail until reinstatement.
  • Drift detection. A background worker checks the peer's federated agent list against the local cache. Divergence beyond a threshold flags the peer as drift; operators get a notification.

TIP

The federation graph is bilateral and decentralized — there's no hub you're forced to peer with. The common pattern is to peer with a mainframe as your first hop (because the mainframe already has many peers, so you get broad BFS reachability for one handshake), but it's not a requirement. An operator can peer directly with another operator, with a sovereign frame (Frame A or Frame B), or with multiple of those simultaneously — whatever bilateral handshakes you negotiate. The BFS topology discovers agents regardless of who your direct peers are.

Frame Federation (Frame A ↔ Frame B)

Between two sovereign mainframes (Frame A and Frame B), federation is stricter than operator peering:

  • Full SPIRE bundle exchange between the trust domains (example.compeer.example.com)
  • Wrapped-token bridge (SF-3) for AVT minted on one frame to move to the other without breaking either's supply invariant
  • Cross-frame event projection (SF-4) keeps each frame's audit trail aware of events that originated on the peer
  • No license key — frame-to-frame federation is bilateral sovereign equal peering

This is covered in depth in chapter 18 — Sovereign Frames.

Health & Status

http
GET /api/v1/federation/peers

Returns the list of peers with status, last sync time, drift status, and bundle fingerprint:

json
[
  {
    "id": 28,
    "name": "Frame-B",
    "base_url": "https://peer.example.com",
    "spiffe_id": "spiffe://peer.example.com/registry",
    "status": "ACTIVE",
    "last_synced_at": "2026-04-29T12:30:00Z"
  }
]

(Per FederationPeerInfo schema in routers/federation_sync.py:58-65. Drift status lives in the separate FederatedRegistryCompliance table — query it via GET /api/v1/admin/federation/compliance/peers/{peer_id} or trigger a fresh check via POST /admin/federation/drift-scan described below.)

Watch for INACTIVE — means recent sync failures. Watch for DRIFT — means the peer's policy hash and yours have diverged. Drift detection is available to every frame operator via POST /api/v1/admin/federation/drift-scan (admin_federation flag) — works regardless of role, emits FederationStateDriftDetected per offender, and the reactor side (auto-correct peer URL, webhook + email) fires immediately. The automated 6-hour poll loop (compliance_poller) is gated to start only on the central registry, but that's about who runs the cron — not who can detect drift. Quarantine enforcement is separately gated by two stacked switches (compliance_enforcement_mode=enforcing + FEDERATION_ENFORCEMENT_ACTIVE=true), both off in this beta — the operator triggers any quarantine action explicitly.

What's Next

Server components AGPL-v3 · client SDK Apache-2.0. If a doc and the running stack disagree, trust the stack.