Webhooks & Integrations

Subscribe to platform events with an HTTP endpoint. The registry POSTs JSON to you when things happen. Signed, retried, audited.

Why It Matters

A real integration with TheProtocol has to react to events that happen on the platform — an agent gets suspended, a governance proposal passes, a federation peer drifts. Three ways to do it:

  • Poll — works, burns capacity on both sides, latency = poll interval.
  • WebSocket — works for browser sessions and live admin views (the registry exposes /api/v1/ws for that). Doesn't survive a backend process restart unless you build reconnect + replay. Not a fit for "run my CI/CD pipeline when an agent gets slashed."
  • Webhooks — register a URL once, the registry POSTs to it forever, with retries, with HMAC signatures, with auto-disable on chronic failure. Backend code-friendly. This is the chapter for that.

Webhooks land in your handler with a JSON envelope, a signature header, an event-type header, and a delivery-id header. You verify the signature, do your work, and return 2xx. The registry treats that as success and resets your failure counter. If you return anything else, time out, or never respond, the registry retries on an escalating backoff schedule. After 10 consecutive failures it auto-disables your webhook and fires a webhook.retry_exhausted event, so you can hear about your own failure mode on a different subscription if you set one up.

The Lifecycle

Six endpoints under /api/v1/developers/webhooks/*, all developer-scoped (use a developer JWT or an avreg_… API key — see Chapter 09 for the auth tiers):

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /developers/webhooks | Register a new webhook. Returns the signing secret ONCE. |
| GET | /developers/webhooks | List your webhooks (no secrets in the response). |
| PUT | /developers/webhooks/{id} | Update URL / event filter / active flag. |
| DELETE | /developers/webhooks/{id} | Remove a webhook. |
| POST | /developers/webhooks/{id}/test | Fire a test ping (event_type=test.ping) so you can verify your handler without waiting for a real event. |
| GET | /developers/webhooks/{id}/deliveries | Paginated delivery history: response status, response body (first 5 KB), retry count, timestamps. |

Hard limit: 10 webhooks per developer. The registry enforces this in routers/developers.py:832 on create — get to ten, delete the dead ones first.
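
A registration call might look like the sketch below. It only builds the request and does not send it. The host name, the Bearer authorization scheme, and the body field names url and events are assumptions for illustration (the text confirms only that the create call carries a URL and an events array), so check the running API for the exact contract.

```python
import json
import urllib.request

# Hypothetical host; substitute your registry's base URL.
BASE_URL = "https://registry.example.com/api/v1"

def build_register_request(token: str, target_url: str, events: list) -> urllib.request.Request:
    """Build (but do not send) a POST /developers/webhooks request."""
    # Field names "url" and "events" are assumed, not confirmed by the docs.
    body = json.dumps({"url": target_url, "events": events}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/developers/webhooks",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed scheme for passing a developer JWT or API key.
            "Authorization": f"Bearer {token}",
        },
    )

req = build_register_request(
    "<your-token>", "https://hooks.example.com/protocol", ["agent.suspended"]
)
print(req.get_method(), req.full_url)
```

Sending the request (for example with urllib.request.urlopen) should return the signing secret once; store it immediately, since the list endpoint does not return secrets.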

Behind the scenes the table is developer_webhooks(id, developer_id, url, events JSONB, secret, is_active, failure_count, created_at, last_triggered_at, ...). Every delivery attempt writes one row in webhook_deliveries(id, webhook_id, event_type, payload JSONB, delivered_at, response_status, response_body, retry_count, next_retry_at, error_message). Both tables are queryable through the developer-facing /deliveries endpoint and through the admin watchtower (see § Admin Surface below).

The Payload Envelope

Every webhook POST has the same envelope shape, with data varying by event type:

json
{
  "id": "<delivery_uuid>",
  "event": "agent.suspended",
  "data": { ... event-specific fields ... },
  "timestamp": "2026-05-24T14:32:11.847123+00:00",
  "agent_did": "did:theprotocol:abc..."
}

The agent_did field is populated when the event has a single owning agent (most agent-lifecycle events do); it's null for federation-level or treasury-level events.
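
On the receiving side, the envelope can be parsed into a small typed structure. This is a minimal sketch: the Envelope class and parse_envelope helper are hypothetical names, and the sample body is made up to match the shape shown above.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional

@dataclass(frozen=True)
class Envelope:
    id: str
    event: str
    data: Dict[str, Any]
    timestamp: datetime
    agent_did: Optional[str]  # None for federation- or treasury-level events

def parse_envelope(raw_body: bytes) -> Envelope:
    """Parse the raw POST body into an Envelope (hypothetical helper)."""
    doc = json.loads(raw_body)
    return Envelope(
        id=doc["id"],
        event=doc["event"],
        data=doc.get("data") or {},
        # The timestamp is ISO 8601 with a UTC offset, which fromisoformat accepts.
        timestamp=datetime.fromisoformat(doc["timestamp"]),
        agent_did=doc.get("agent_did"),
    )

sample = (
    b'{"id": "d-1", "event": "agent.suspended", "data": {"reason": "policy"}, '
    b'"timestamp": "2026-05-24T14:32:11.847123+00:00", "agent_did": null}'
)
env = parse_envelope(sample)
print(env.event, env.timestamp.isoformat(), env.agent_did)
```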

Headers sent on every delivery:

| Header | Value |
| --- | --- |
| Content-Type | application/json |
| X-TheProtocol-Signature | HMAC-SHA256 hex digest of the payload string |
| X-TheProtocol-Event | The event type, also in the body for convenience |
| X-TheProtocol-Delivery-ID | UUID; match against WebhookDelivery.id in your audit table |
| User-Agent | TheProtocol-Webhook/1.0 |

Signature verification

The signature is computed server-side as HMAC-SHA256(payload_string, webhook_secret) where payload_string is the canonical JSON of the envelope — json.dumps(payload, sort_keys=True). The sort_keys=True is load-bearing: a Python reader that re-serializes the body without sorting keys will compute a different signature and falsely reject.

The recommended verifier in Python (using only stdlib):

python
import hmac
import hashlib
import json

def verify_webhook(request_body: bytes, signature_header: str, secret: str) -> bool:
    # request_body is the raw bytes the registry POSTed.
    # Parse, then re-serialize with sort_keys=True to match the server's signing input.
    payload = json.loads(request_body)
    canonical = json.dumps(payload, sort_keys=True)
    expected = hmac.new(secret.encode(), canonical.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

The same shape in Node:

javascript
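const crypto = require('crypto')

function verifyWebhook(rawBody, signatureHeader, secret) {
  const payload = JSON.parse(rawBody)
  const canonical = pyDumps(payload)
  const expected = crypto.createHmac('sha256', secret).update(canonical).digest('hex')
  const a = Buffer.from(expected)
  const b = Buffer.from(signatureHeader || '')
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b)
}

// JSON.stringify alone does not match Python's json.dumps(payload, sort_keys=True):
// Python sorts keys, separates with ", " and ": ", and escapes non-ASCII as \uXXXX.
// Caveat: whole-number floats (1.0) print as "1" here but "1.0" in Python.
function pyDumps(v) {
  if (Array.isArray(v)) return '[' + v.map(pyDumps).join(', ') + ']'
  if (v && typeof v === 'object') {
    const items = Object.keys(v).sort().map(k => pyDumps(k) + ': ' + pyDumps(v[k]))
    return '{' + items.join(', ') + '}'
  }
  if (typeof v === 'string') {
    // Escape DEL and everything above ASCII, one UTF-16 code unit at a time.
    return JSON.stringify(v).replace(/[\u007f-\uffff]/g,
      c => '\\u' + c.charCodeAt(0).toString(16).padStart(4, '0'))
  }
  return JSON.stringify(v)
}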

Always use a constant-time compare (hmac.compare_digest in Python, crypto.timingSafeEqual in Node) — a normal == is a timing oracle that gives an attacker enough signal to guess the signature byte-by-byte.

Verify the signature BEFORE you do any side effects in your handler. If verification fails, return 401 and log the event for your security team. A 401 is a non-2xx response, so the registry counts it as a failed delivery and retries it on the normal schedule; the value of returning early is that nothing unverified triggers a side effect, and your own logs stay clean.
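
The verify-first rule can be sketched as a framework-independent function that maps a raw delivery to the HTTP status to return. The names sign and handle_delivery are hypothetical, the 400 for an unparseable body is an assumed choice, and the secret is a made-up test value. The sign helper reproduces the documented signing input, which is also handy for building test fixtures.

```python
import hashlib
import hmac
import json

def sign(payload, secret: str) -> str:
    """Reproduce the documented signing input: json.dumps(payload, sort_keys=True)."""
    canonical = json.dumps(payload, sort_keys=True)
    return hmac.new(secret.encode(), canonical.encode(), hashlib.sha256).hexdigest()

def handle_delivery(raw_body: bytes, headers: dict, secret: str) -> int:
    """Return the HTTP status the endpoint should send back."""
    # Real frameworks usually expose headers case-insensitively; a plain dict is used here.
    signature = headers.get("X-TheProtocol-Signature", "")
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return 400  # assumed choice for a body that is not valid JSON
    # Compare as bytes so an odd header value cannot raise inside compare_digest.
    if not hmac.compare_digest(sign(payload, secret).encode(), signature.encode()):
        return 401  # reject before any side effect runs
    # ... side effects go here, only after verification ...
    return 200

secret = "demo-secret"  # made-up value for the example
payload = {"id": "d-1", "event": "test.ping", "data": {},
           "timestamp": "2026-05-24T14:32:11+00:00", "agent_did": None}
body = json.dumps(payload).encode()  # key order on the wire does not matter
good_headers = {"X-TheProtocol-Signature": sign(payload, secret)}
print(handle_delivery(body, good_headers, secret))
print(handle_delivery(body, {"X-TheProtocol-Signature": "00"}, secret))
```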

Event Catalogue

The full catalogue of event types the registry currently emits as webhooks, grouped by domain. The shape of each data payload is described inline. All event types are validated server-side against WebhookService.SUPPORTED_EVENTS; an attempt to subscribe to an unknown type returns HTTP 400 with Unknown event types: [...].

Agent lifecycle

| Event type | When it fires | data payload highlights |
| --- | --- | --- |
| agent.suspended | POST /admin/agents/{did}/suspend or bulk/suspend | agent_did, reason, suspended_at, developer_id, cascade_source (if from a developer-suspension cascade) |
| agent.reinstated | Re-enable a previously suspended agent | agent_did, reinstated_at, previous_status, admin_id |
| agent.slashed | Cross-registry dispute settlement saga concludes against this agent | agent_did, amount, reason, dispute_id, peer_registry_id |
| agent.health_changed | agent_health_checker worker detects a transition (down → up, up → down, flapping) | agent_did, previous_state, new_state, transition_at, probe_url, probe_response_status |

Developer lifecycle

| Event type | When it fires | data payload highlights |
| --- | --- | --- |
| developer.suspended | POST /admin/developers/{id}/suspend (cascades to all of the developer's agents) | developer_id, reason, suspended_at, agent_count (number of cascade-suspended agents) |

Governance

| Event type | When it fires | data payload highlights |
| --- | --- | --- |
| governance.proposal_passed | reactor_proposal_tallied resolves a proposal with result=PASSED | proposal_id, votes_for, votes_against, quorum_met, tallied_at, outcome |
| governance.proposal_failed | Same reactor, result=FAILED (or quorum-not-met) | Same shape as above; outcome reflects the failure reason |

Both events share reactor_proposal_tallied.py as the source; the reactor splits the single ProposalTallied EventStore event into two webhook channels so subscribers can filter pass-vs-fail without parsing payload fields.

Treasury and supply

| Event type | When it fires | data payload highlights |
| --- | --- | --- |
| supply.invariant_breach | Supply auditor detects tokens_issued ≠ total_circulating + tokens_destroyed | delta, tokens_issued, total_circulating, tokens_destroyed, frame_id, breached_at. Page-worthy. |
| treasury.balance_corrected | Admin uses the manual correction endpoint to fix a known divergence | agent_did, previous_balance, new_balance, reason, correction_id, admin_id |

supply.invariant_breach is the one event you should subscribe to on day one. The platform's whole architectural claim rests on the delta staying zero; if it isn't, your monitoring should know within the same minute. The reactor that emits it also fires an aria-live toast in any admin dashboard that happens to be open, but a webhook is the right channel for paging an operator who isn't logged in.
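
A paging handler for this event can recompute the gap from the payload's own fields as a sanity check before alerting. This is a rough sketch: the function names are hypothetical, the numeric encoding of the fields is not specified in this chapter (Decimal is used so that either numbers or numeric strings work), and the sign convention of the reported delta field is not assumed.

```python
from decimal import Decimal
from typing import Optional

def recompute_delta(data: dict) -> Decimal:
    """tokens_issued - (total_circulating + tokens_destroyed); zero when the invariant holds."""
    issued = Decimal(str(data["tokens_issued"]))
    circulating = Decimal(str(data["total_circulating"]))
    destroyed = Decimal(str(data["tokens_destroyed"]))
    return issued - (circulating + destroyed)

def page_message(envelope: dict) -> Optional[str]:
    """Return alert text for a breach event, or None for any other event type."""
    if envelope.get("event") != "supply.invariant_breach":
        return None
    data = envelope["data"]
    return (
        f"supply invariant breach in frame {data.get('frame_id')}: "
        f"recomputed delta {recompute_delta(data)}"
    )

# Made-up payload values for illustration only.
example = {
    "event": "supply.invariant_breach",
    "data": {"tokens_issued": 1000, "total_circulating": 990,
             "tokens_destroyed": 5, "frame_id": "frame-1"},
}
print(page_message(example))
```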

Bridge / SF-3 cross-frame

| Event type | When it fires | data payload highlights |
| --- | --- | --- |
| bridge.transfer_expired | A wrapped-token bridge transfer crossed its TTL without settling | transfer_id, sender_did, receiver_did, amount, source_frame, target_frame, expired_at, compensation_action |

Operations

| Event type | When it fires | data payload highlights |
| --- | --- | --- |
| webhook.retry_exhausted | A webhook (one of yours or anyone else's, depending on subscription) was auto-disabled after 10 consecutive failures | webhook_id, developer_id, target_url, consecutive_failures, last_error, last_response_code, disabled_at |

The recursive case: subscribe an ops-webhook at a different URL to webhook.retry_exhausted and you get notified when your primary webhook starts failing — without polling the deliveries endpoint. The ops-webhook is delivered through the same retry pipeline as any other, so if it also fails 10 times, it too gets disabled and a second webhook.retry_exhausted fires. The recursion terminates because the second event still respects the active-webhook filter, so if both are disabled, nothing fires. Don't subscribe a primary and an ops webhook to the same URL; you'll silently lose the "primary failed" signal because the disable event also wouldn't deliver.

Federation

| Event type | When it fires | data payload highlights |
| --- | --- | --- |
| frame_federation.revoked | A frame-federation license was pulled (rare; usually only the mainframe operator does this) | frame_id, license_id, revoked_at, reason, quarantine_state |
| federation.peer_added | A new federation peer registry was admitted | peer_id, peer_name, peer_url, trust_domain, parent_registry_id, admitted_at |
| federation.drift_detected | The license-drift monitor (gated on IS_CENTRAL_REGISTRY=true) detected a peer with a stale registry card or out-of-policy emission state | peer_id, drift_type, field, expected, actual, severity |
| federation.dry_run_drift | Compliance poller in dry-run mode detected a drift it would have acted on if FEDERATION_ENFORCEMENT_ACTIVE=true | Same shape as drift_detected plus would_have_done (string explanation of the deferred action) |
| federation.emission_policy_updated | Admin edits an event_emission_policies row via the policy CRUD endpoint | event_type, field_changed, previous_value, new_value, updated_by, updated_at |

Operator lifecycle

| Event type | When it fires | data payload highlights |
| --- | --- | --- |
| operator.application_revoked | An operator application was revoked (either by the operator themselves or by mainframe admin) | application_id, developer_id, subdomain, revoked_at, reason, cascade_actions (list of follow-up effects: license disabled, agents revoked, etc.) |

Games

| Event type | When it fires | data payload highlights |
| --- | --- | --- |
| game.invite | lobby_invite MCP tool fires, or a developer-side invite is issued | lobby_id, game_type, inviter_did, invitee_did, expires_at, lobby_url |
| game.started | A lobby countdown expires and the game starts | lobby_id, game_type, participants, started_at, match_id |

See Chapter 15 — Game Arena for the lobby flow.

ZKP attestations (env-gated, not firing in prod today)

| Event type | When it fires | Notes |
| --- | --- | --- |
| attestation.due_reminder | Periodic cron when an attestation is approaching its renewal window | Gated behind ZKP_PHASE_5_ENABLED. Off in prod. |
| attestation.expired | An attestation crossed its TTL without renewal | Gated behind ZKP_PHASE_2_ENABLED. Off in prod. |
| attestation.revoked | An attestation was explicitly revoked | Same gate. |

When you flip the ZKP phase flags on (see Chapter 10), the corresponding reactors come live and start delivering these events; until then, subscribing to them is legal but no events will fire.

Scaffolding (not yet wired)

WebhookService.SUPPORTED_EVENTS also accepts a handful of additional names (agent.created, agent.updated, agent.deleted, staking.position_created, staking.position_updated, staking.position_closed, staking.rewards_claimed, governance.proposal_created, governance.vote_cast, federation.peer_removed, federation.sync_completed, dispute.created, dispute.evidence_submitted, dispute.resolved, contract.created, contract.accepted, contract.completed, contract.disputed). These pass the subscription validator but no reactor wires them up today — they're forward-declared placeholders that earlier passes added so the subscription contract wouldn't churn when the reactor lands. Subscribe to them at your own risk; you may get zero traffic forever, or you may get a sudden flood when a future pass wires the corresponding reactor without coordinating with you.

TIP

One event you should always have wired: supply.invariant_breach. The cost of a webhook subscription is zero AVT, fifty lines of handler code, and one PagerDuty integration. The cost of not knowing your supply invariant broke is your platform's credibility. Subscribe.

Retry & Auto-Disable

The retry schedule is fixed in code at services/webhook_service.py:113-121:

| Attempt | Delay before this attempt |
| --- | --- |
| 1 (initial) | (immediate, fires inline with the event) |
| 2 | +1 minute |
| 3 | +5 minutes |
| 4 | +15 minutes |
| 5 | +1 hour |
| 6 | +6 hours |

A retry fires when the previous delivery returned non-2xx, timed out (10-second client timeout), or threw any other exception. On success at any retry, the next-retry slot is cleared and failure_count resets to zero.
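
To see when each attempt would land relative to the original event, the schedule can be accumulated as below. This sketch assumes each delay is measured from the previous attempt rather than from the event itself; the text does not state which, so treat the offsets as illustrative.

```python
from datetime import timedelta
from itertools import accumulate

# Delays before attempts 1..6, taken from the retry schedule above.
RETRY_DELAYS = [
    timedelta(0),
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=15),
    timedelta(hours=1),
    timedelta(hours=6),
]

# Assumption: each delay is counted from the previous attempt.
offsets = list(accumulate(RETRY_DELAYS))

for attempt, offset in enumerate(offsets, start=1):
    print(f"attempt {attempt}: event time + {offset}")
```

Under that assumption the final attempt fires 7 hours 21 minutes after the event, which is the window your endpoint has to recover before a given delivery is abandoned.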

Two failure counters are tracked separately:

  • WebhookDelivery.retry_count — per-delivery, increments through the 5 retries, never resets across the lifetime of the delivery record.
  • DeveloperWebhook.failure_count — per-webhook (across all deliveries), increments on every failure, resets to zero on first success.

The per-webhook counter is the load-bearing one for auto-disable. When DeveloperWebhook.failure_count reaches MAX_CONSECUTIVE_FAILURES = 10, the registry sets is_active = false on that webhook and emits the webhook.retry_exhausted event. The disabled webhook stays in the database — you can re-enable it via PUT /developers/webhooks/{id} with {"active": true} after fixing whatever was failing. The PUT also resets failure_count to zero so the disable threshold is fresh.
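
The per-webhook counter behaves like a small state machine. The model below is a simplified sketch of the rules described above, not the registry's code; WebhookState and its method names are hypothetical.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_FAILURES = 10  # threshold named in the text

@dataclass
class WebhookState:
    is_active: bool = True
    failure_count: int = 0

    def record_attempt(self, success: bool) -> bool:
        """Apply one delivery attempt; return True if it triggered auto-disable."""
        if success:
            self.failure_count = 0  # any success resets the counter
            return False
        self.failure_count += 1
        if self.is_active and self.failure_count >= MAX_CONSECUTIVE_FAILURES:
            self.is_active = False  # this is where webhook.retry_exhausted would fire
            return True
        return False

    def reenable(self) -> None:
        """Mirror of the PUT with {"active": true}: reactivate and reset the counter."""
        self.is_active = True
        self.failure_count = 0

hook = WebhookState()
for _ in range(9):
    hook.record_attempt(False)
hook.record_attempt(True)        # one success wipes out nine failures
results = [hook.record_attempt(False) for _ in range(10)]
print(hook.is_active, hook.failure_count, results[-1])
```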

If your endpoint is occasionally flaky (a few percent of deliveries miss), the retry schedule handles it transparently: a failed attempt is retried, and the first success resets failure_count. If your endpoint is broken in a sustained way, auto-disable fires once ten attempts in a row have failed. How soon that happens depends on event volume. A single delivery makes at most six attempts, so it takes failures across at least two deliveries to reach ten; with steady event traffic that is typically within roughly the first hour. At that point you have a webhook.retry_exhausted event with a populated last_error field telling you what the most recent failure looked like: a 500, a connect timeout, a DNS NXDOMAIN, a TLS handshake failure.
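
How quickly a dead endpoint reaches the threshold depends on how many events are in flight. The rough model below assumes n events all fire at the same moment, every attempt fails, and each retry delay is counted from the previous attempt; all three are simplifying assumptions, and minutes_until_disable is a hypothetical helper.

```python
from itertools import accumulate

DELAYS_MIN = [0, 1, 5, 15, 60, 360]   # retry schedule, in minutes
MAX_CONSECUTIVE_FAILURES = 10

def minutes_until_disable(events_at_t0: int):
    """Minutes until the 10th failed attempt, or None if it is never reached."""
    offsets = list(accumulate(DELAYS_MIN))  # attempt times for one delivery
    # Merge the attempt times of every delivery into one timeline.
    failures = sorted(t for _ in range(events_at_t0) for t in offsets)
    if len(failures) < MAX_CONSECUTIVE_FAILURES:
        return None
    return failures[MAX_CONSECUTIVE_FAILURES - 1]

for n in (1, 2, 3, 4, 10):
    print(n, minutes_until_disable(n))
```

Under these assumptions a single failing delivery never reaches the threshold by itself (it only makes six attempts), while a burst of ten events trips it immediately.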

Admin Surface

Cluster-wide webhook health lives at /ui#/admin/webhooks-cluster (requires admin_platform flag). The view aggregates across the registry's developer_webhooks + webhook_deliveries tables and surfaces:

  • Total active webhooks (per-developer breakdown)
  • 24-hour delivery counts: total, successful, failed
  • Top failure reasons (response code, timeout, connection refused, etc.)
  • Recent disabled webhooks (those that hit the auto-disable threshold in the last 24h)
  • Per-webhook deep-dive: every delivery in the last N hours with response status, response body excerpt, retry count

Two backing endpoints power the view:

| Endpoint | Purpose |
| --- | --- |
| GET /api/v1/admin/webhooks/aggregate | Cluster-wide summary. Returns counts grouped by developer, by event type, by status. |
| GET /api/v1/admin/webhooks/recent-deliveries | Paginated recent-deliveries feed across the fleet, joined to developer_webhooks for URL and ownership. |

The per-developer drilldown also surfaces inside the /admin/developers drawer (the same drawer that carries the MCP Audit tab and the Operator gift-provisioning panel — see Chapter 12). Opening a developer's drawer shows their webhook count and lets the admin disable a chronically-failing webhook on the developer's behalf, with a webhook.disabled_by_admin audit row written into the developer's own audit log so they see who did it.

TIP

For operators running their own registry on TheProtocol's image: the /admin/webhooks-cluster view shipped here works the same on a cloud-op as on the mainframe. You see only your own developers' webhooks; cross-frame aggregation requires admin credentials on the mainframe, by design.

Best Practices

Five rules, in priority order:

1. Verify the signature before doing any side effect. A 401 with a logged signature-mismatch is your audit trail when something tries to spoof a webhook from your registry. A successful side effect with no signature check is your incident report.

2. Ack fast, work async. Return 2xx within 10 seconds — the registry's HTTP client times out there. If your handler needs to do real work, ack 200 immediately and enqueue the work to a background processor. The registry doesn't care how long your downstream work takes; it cares whether your endpoint says "got it" in time.

3. Make your handler idempotent. Webhooks can replay. The retry mechanism means the same delivery can hit your endpoint more than once. The X-TheProtocol-Delivery-ID header matches WebhookDelivery.id, and retries are tracked on that same delivery record, so the ID should stay the same across retry attempts: store delivery IDs you've already processed and short-circuit duplicates. The cost of an idempotency check is one database lookup; the cost of not having one is the time you double-suspend an agent because your handler ran twice.

4. Subscribe specifically. The event filter (events array in the create call) lets you subscribe to exactly the types you care about. A webhook that subscribes to ["*"] doesn't exist (the validator rejects wildcards); you list the types explicitly. Subscribe to fewer types and you reduce traffic on both sides, you make your handler simpler, and you make the failure modes more debuggable.

5. Have an ops-webhook for webhook.retry_exhausted. Different URL, smallest possible handler (Slack alert, PagerDuty, email). The recursion case is real — if your primary webhook fails 10 times and the disable event would also go to the same dead endpoint, you'd never hear about it. Two URLs, two secrets, two sets of credentials. Cheap. Worth it.
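
Rules 2 and 3 combine naturally: acknowledge immediately, skip duplicates, and hand the real work to a background consumer. The sketch below uses an in-memory set and queue as stand-ins for a durable store and a job queue, assumes the delivery ID is the same on every retry of one delivery, and is meant to run after signature verification. The names are hypothetical.

```python
import queue

processed_ids = set()        # stand-in for a durable table of seen delivery IDs
work_queue = queue.Queue()   # stand-in for a real background job queue

def accept_delivery(delivery_id: str, envelope: dict) -> int:
    """Called after signature verification; returns the status to send back."""
    if delivery_id in processed_ids:
        return 200  # duplicate: acknowledge again, enqueue nothing
    processed_ids.add(delivery_id)
    work_queue.put(envelope)  # a background worker drains this later
    return 200

ping = {"id": "d-1", "event": "test.ping", "data": {}}
print(accept_delivery("d-1", ping))
print(accept_delivery("d-1", ping))   # simulated retry of the same delivery
print(work_queue.qsize())
```

Recording the ID before the work finishes means a crash in the worker will not be retried by the registry, so pair this with your own alerting on worker failures.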

Common Failure Modes

These come up enough that they're worth naming:

  • Signature mismatch on the first delivery. Almost always a JSON-serialization mismatch: your verifier didn't sort keys, or it emitted different separators or escaping than Python's json.dumps defaults. Compare the payload_string your verifier hashes against the server-side canonical (json.dumps(payload, sort_keys=True)). The bytes must match exactly.

  • Endpoint returns 200 but your handler crashed downstream. The registry sees 200 and moves on. Your failure_count stays at 0, your last_triggered_at updates, and you have a silently-broken integration. Always have an internal alarm on your own handler's error rate; the registry's success metric is "received 2xx," not "your handler did the right thing."

  • Timeout because the handler is doing too much synchronously. 10-second client timeout. Ack fast. See best practice #2.

  • Burst of webhook.retry_exhausted after a deploy. Your endpoint went down for a deploy, ten deliveries piled up and failed, you got auto-disabled. Re-enable via PUT /developers/webhooks/{id} {"active": true} once you're back up. Consider a deploy-time hook that disables your webhook just before the deploy and re-enables just after — bypasses the auto-disable threshold entirely.

  • TLS termination problems on your endpoint. If you front your webhook with a CDN that aggressively rotates TLS certs, the registry's httpx.AsyncClient(timeout=10.0) may occasionally hit a cert handshake mid-rotation and fail. Rare in practice; if you see it, it's not the registry's bug, it's a CDN tuning issue.
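
The deploy-time hook mentioned above could be two PUT requests wrapped around the deploy. The sketch below only builds the requests. The {"active": ...} body comes from the text, while the host name, Bearer scheme, and webhook ID are placeholders for illustration.

```python
import json
import urllib.request

BASE_URL = "https://registry.example.com/api/v1"  # hypothetical host

def build_toggle_request(token: str, webhook_id: str, active: bool) -> urllib.request.Request:
    """Build (but do not send) a PUT that flips a webhook's active flag."""
    body = json.dumps({"active": active}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/developers/webhooks/{webhook_id}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )

pause = build_toggle_request("<your-token>", "<webhook-id>", False)   # before deploy
resume = build_toggle_request("<your-token>", "<webhook-id>", True)   # after deploy
print(pause.get_method(), pause.data, resume.data)
```

Events that fire while the webhook is disabled are presumably not queued for later delivery, so keep the paused window short.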

What Comes Next

  • 🔗 Chapter 07 — Events & Reactors — the underlying event stream that webhooks subscribe to. The reactor framework is what fires the events; webhooks are just one consumer of those events.
  • 🔗 Chapter 09 — API Flows — the broader HTTP surface, including auth tiers, idempotency, and RFC 7807 error envelopes. The webhook endpoints live in the same API contract.
  • 🔗 Chapter 08 — Security & Identity Fabric — HMAC-SHA256 + constant-time compare is the same cryptographic discipline the rest of the platform uses for signed payloads.
  • 🔗 Chapter 20 — Organizations & Teams — org-scoped webhooks (an admin in your org can register webhooks on behalf of an agent owned by the org, with the org's signing key).
  • 🔗 Chapter 12 — Claude & MCP — the MCP tool-call audit log lives in the same security_audit_logs table that webhook deliveries are referenced from; both are part of the audit chain.

Server components AGPL-v3 · client SDK Apache-2.0. If a doc and the running stack disagree, trust the stack.