
Observability Reference

The Gambi hub emits an operational baseline for inference activity. This page documents the public contract.

Observability is consumed through the management SSE stream:

GET /v1/rooms/:code/events

Or through the SDK:

for await (const event of client.events.watchRoom({ roomCode })) {
  // ...
}
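For consumers that read the endpoint directly rather than through the SDK, the response body has to be split into SSE frames first. The sketch below parses the standard `event:` / `data:` framing. `parseSseFrames` and `SseFrame` are hypothetical names, and whether the hub carries the event type in the `event:` field or inside the JSON body is an assumption to check against the API Reference.

```typescript
// Minimal SSE frame parser (a sketch, not the SDK's implementation).
// The framing rules follow the SSE wire format; how Gambi populates
// the fields is an assumption.
interface SseFrame {
  event: string; // value of the "event:" field, "message" when absent
  data: string;  // all "data:" lines of the frame joined with "\n"
}

function parseSseFrames(buffer: string): { frames: SseFrame[]; rest: string } {
  // Frames are separated by a blank line; the last piece may be incomplete.
  const parts = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = parts.pop() ?? "";
  const frames: SseFrame[] = [];

  for (const part of parts) {
    let event = "message";
    const data: string[] = [];
    for (const line of part.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // comment or keep-alive
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Frames without data (for example pure comments) are skipped.
    if (data.length > 0) frames.push({ event, data: data.join("\n") });
  }
  return { frames, rest };
}
```

A caller would keep `rest` between reads and pass `rest + nextChunk` on the next call, so that a frame split across network chunks is only parsed once it is complete.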

The hub emits three inference-related event types for requests that reach routing:

| Event | When it fires |
| --- | --- |
| llm.request | routing has selected a participant and the hub is about to send the tunnel request |
| llm.complete | the participant returned a final response or the stream ended cleanly |
| llm.error | the request failed in any stage: routing, tunnel transport, or provider |

Management-level events (participant.joined, participant.updated, participant.left, participant.offline, room.created) are documented in the API Reference and SDK Reference.

llm.request payload:

| Field | Type | Description |
| --- | --- | --- |
| requestId | string | correlation identifier shared across llm.request, llm.complete, and llm.error |
| participantId | string | participant selected by routing |
| model | string | model name as seen by the hub |
| protocol | "openResponses" \| "chatCompletions" | surface the request used against the hub |

llm.complete payload:

| Field | Type | Description |
| --- | --- | --- |
| requestId | string | same as in llm.request |
| participantId | string | participant that produced the response |
| model | string | model name |
| protocol | "openResponses" \| "chatCompletions" | surface of the request |
| metrics | Metrics | see below |

llm.error payload:

| Field | Type | Description |
| --- | --- | --- |
| requestId | string | correlation identifier |
| participantId | string \| null | participant, when one was selected |
| nickname | string \| null | participant nickname, when known |
| endpoint | string \| null | participant-local provider endpoint, when known |
| model | string \| null | model name, when known |
| protocol | "openResponses" \| "chatCompletions" | surface of the request |
| stage | string | where the failure happened (routing, tunnel, provider, etc.) |
| error | string | human-readable failure message |
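Because requestId is shared across the three event types, a consumer can track which requests are still in flight. The following is a minimal sketch: the field names mirror the payload fields listed above, but the `{ type, data }` envelope, the `InFlightTracker` class, and the modelling of absent metrics as optional properties are assumptions, not the SDK's published types.

```typescript
type Protocol = "openResponses" | "chatCompletions";

interface LlmRequest {
  requestId: string;
  participantId: string;
  model: string;
  protocol: Protocol;
}

interface LlmComplete extends LlmRequest {
  // Only the two hub-observed fields are modelled here.
  metrics: { ttftMs: number; durationMs: number };
}

interface LlmError {
  requestId: string;
  participantId: string | null;
  nickname: string | null;
  endpoint: string | null;
  model: string | null;
  protocol: Protocol;
  stage: string;
  error: string;
}

// Hypothetical envelope: the real event shape may differ.
type InferenceEvent =
  | { type: "llm.request"; data: LlmRequest }
  | { type: "llm.complete"; data: LlmComplete }
  | { type: "llm.error"; data: LlmError };

class InFlightTracker {
  private open = new Map<string, LlmRequest>();

  apply(event: InferenceEvent): void {
    switch (event.type) {
      case "llm.request":
        this.open.set(event.data.requestId, event.data);
        break;
      case "llm.complete":
      case "llm.error":
        // Either outcome closes the request; unknown ids are ignored.
        this.open.delete(event.data.requestId);
        break;
    }
  }

  // Request ids that have started but not yet completed or failed.
  pending(): string[] {
    return Array.from(this.open.keys());
  }
}
```

Deleting on llm.error without requiring a prior llm.request keeps the tracker safe for failures reported with a null participantId, which may never have produced an llm.request event.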

llm.complete.metrics carries six fields:

| Field | Unit | Source | Notes |
| --- | --- | --- | --- |
| ttftMs | milliseconds | hub-observed | time to first token (streaming) or first byte (non-streaming) |
| durationMs | milliseconds | hub-observed | total request time |
| inputTokens | tokens | provider usage | may be absent when the upstream provider does not expose token counts |
| outputTokens | tokens | provider usage | may be absent when streaming without usage reporting |
| totalTokens | tokens | provider usage or derived | falls back to inputTokens + outputTokens when both are available |
| tokensPerSecond | tokens/second | derived | outputTokens / (durationMs / 1000); only present when outputTokens is known |

  • ttftMs and durationMs are always present for successful requests, because the hub observes them directly.
  • Token counts depend on the upstream provider. Streaming endpoints that do not include a usage object will leave them unset.
  • Metrics are hub-observed. They do not include latency experienced on the client side of the HTTP request, and they do not replace end-to-end distributed tracing.
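Since the token fields may be missing, consumers usually need to guard every read. The sketch below fills in the two derived fields on the client when the hub left them unset. `fillDerived` is a hypothetical helper, absent fields are assumed to arrive as missing or null, and the throughput divides by the duration in seconds so that the result matches the tokens/second unit.

```typescript
interface Metrics {
  ttftMs: number;
  durationMs: number;
  inputTokens?: number | null;
  outputTokens?: number | null;
  totalTokens?: number | null;
  tokensPerSecond?: number | null;
}

// Returns a copy with totalTokens and tokensPerSecond filled in when
// they are absent and the inputs needed to derive them are known.
function fillDerived(m: Metrics): Metrics {
  const out: Metrics = { ...m };

  if (out.totalTokens == null && m.inputTokens != null && m.outputTokens != null) {
    out.totalTokens = m.inputTokens + m.outputTokens;
  }

  if (out.tokensPerSecond == null && m.outputTokens != null && m.durationMs > 0) {
    // durationMs is in milliseconds, so convert to seconds first.
    out.tokensPerSecond = m.outputTokens / (m.durationMs / 1000);
  }

  return out;
}
```

Values the hub did report are never overwritten, so the helper can be applied to every llm.complete event without checking which fields are present.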

Every management payload that includes a participant exposes a connection block:

| Field | Type | Description |
| --- | --- | --- |
| kind | "tunnel" | transport in use |
| connected | boolean | whether the tunnel is currently open |
| lastTunnelSeenAt | string \| null | ISO timestamp of the most recent tunnel activity |

This appears in:

  • PUT /v1/rooms/:code/participants/:id responses
  • GET /v1/rooms/:code/participants list entries
  • participant.joined / participant.updated SSE payloads
  • ParticipantSummary returned by the SDK

Combine connection.connected with the participant’s status field to distinguish “registered but offline” from “live and ready to handle a request”.
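A minimal sketch of that check follows. This page does not list the possible status values, so the caller supplies the set it treats as ready; `ParticipantLike`, `isLive`, `tunnelIdleMs`, and the "online" value in the usage note are all hypothetical.

```typescript
interface Connection {
  kind: "tunnel";
  connected: boolean;
  lastTunnelSeenAt: string | null;
}

// Only the two fields this check needs; real payloads carry more.
interface ParticipantLike {
  status: string;
  connection: Connection;
}

// True only when the tunnel is open AND the status is one the caller
// considers ready. The status values are not defined on this page.
function isLive(p: ParticipantLike, readyStatuses: ReadonlySet<string>): boolean {
  return p.connection.connected && readyStatuses.has(p.status);
}

// Milliseconds since the last tunnel activity, or null when the hub
// has never seen the tunnel or the timestamp cannot be parsed.
function tunnelIdleMs(p: ParticipantLike, nowMs: number): number | null {
  if (p.connection.lastTunnelSeenAt === null) return null;
  const seen = Date.parse(p.connection.lastTunnelSeenAt);
  return Number.isNaN(seen) ? null : nowMs - seen;
}
```

For example, `isLive(participant, new Set(["online"]))` would treat a participant as usable only if its tunnel is open and its status equals the assumed "online" value.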

The hub also emits structured console logs parallel to the SSE events:

  • [gambi] llm.request
  • [gambi] llm.complete
  • [gambi] llm.error

These are intended for the operator running the hub; the SSE stream is the canonical surface for programmatic consumers.

This baseline is intentionally narrow. The following are not provided by the hub today:

  • persistent storage or replay of past events
  • aggregated dashboards (p50/p95 latency, error rate over time)
  • sampling or export pipelines (OpenTelemetry, Prometheus)
  • end-to-end tracing across client, hub, and participant

You can build any of these on top of the SSE stream — the event contract is stable enough for that. Treat this reference as the floor, not the ceiling.
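As one illustration of building on the stream, the sketch below keeps a rolling window of durationMs values and reports p50/p95 latency plus an error rate. The event shape, the `RollingStats` class, and the nearest-rank percentile method are assumptions chosen for brevity, not part of the hub or SDK.

```typescript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number | undefined {
  if (sorted.length === 0) return undefined;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical minimal event shape: only the fields used here.
interface StatsEvent {
  type: string;
  data: { metrics?: { durationMs: number } };
}

class RollingStats {
  private durations: number[] = [];
  private completed = 0;
  private failed = 0;
  private windowSize: number;

  constructor(windowSize = 1000) {
    this.windowSize = windowSize;
  }

  record(event: StatsEvent): void {
    if (event.type === "llm.complete" && event.data.metrics) {
      this.completed++;
      this.durations.push(event.data.metrics.durationMs);
      // Keep only the most recent windowSize durations.
      if (this.durations.length > this.windowSize) this.durations.shift();
    } else if (event.type === "llm.error") {
      this.failed++;
    }
  }

  // Latency percentiles cover the window; errorRate covers all
  // completed and failed requests recorded so far.
  snapshot(): { p50?: number; p95?: number; errorRate: number } {
    const sorted = this.durations.slice().sort((a, b) => a - b);
    const total = this.completed + this.failed;
    return {
      p50: percentile(sorted, 50),
      p95: percentile(sorted, 95),
      errorRate: total === 0 ? 0 : this.failed / total,
    };
  }
}
```

Because the hub does not store or replay events, an aggregator like this only sees what arrives while it is connected; persisting snapshots is left to the consumer.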