
A shared room for every LLM you build with.

Build apps where many models collaborate across people, machines and providers.

Gambi ships the gateway. You ship the experience.

02 / patterns

What you can build

Six recipes that fit in a few lines of TypeScript. The substrate (routing, observability, multi-source) is already there.

03 / architecture

How it sits in your stack

Your app talks to a single OpenAI-compatible URL. The hub routes each request to a participant in the room. Every participant runs its own provider behind its own tunnel.

01 · client
your app
  • SDK / OpenAI client / curl
  • tool that speaks OpenAI
  • your TUI, your dashboard
02 · hub
gambi hub
  • room registry
  • routing: * · model:<name> · <id>
  • SSE events: llm.request · llm.complete
03 · participants
your providers
  • ollama @ localhost:11434
  • vllm @ 192.168.1.50:8000
  • openrouter (cloud)
  • openai (rate limited)

Provider endpoints stay on each participant's side. The hub never reaches in; the participant runtime opens the tunnel.
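The routing notation in the hub box (`*`, `model:<name>`, `<id>`) can be read as a target resolver. What follows is a minimal sketch under two assumptions: that `<id>` means a participant id, and that `*` may pick any participant. The `Participant` shape, `resolveTarget` and the first-match default are all hypothetical. This is not Gambi's actual routing code.

```typescript
// Hypothetical participant record; the hub's real data model is not documented here.
interface Participant {
  id: string; // participant id (assumed meaning of <id>)
  model: string; // model this participant serves, e.g. "llama3"
}

type Picker = (candidates: Participant[]) => Participant;

// Default policy: take the first candidate. A real hub may balance load instead.
const firstOf: Picker = (candidates) => candidates[0];

// Resolve a routing target in the diagram's notation:
//   "*"            -> any participant in the room
//   "model:<name>" -> a participant serving <name>
//   anything else  -> treated as an exact participant id
function resolveTarget(
  target: string,
  participants: Participant[],
  pick: Picker = firstOf,
): Participant | undefined {
  let candidates: Participant[];
  if (target === "*") {
    candidates = participants;
  } else if (target.startsWith("model:")) {
    const name = target.slice("model:".length);
    candidates = participants.filter((p) => p.model === name);
  } else {
    candidates = participants.filter((p) => p.id === target);
  }
  // Only call the picker when something matched.
  return candidates.length > 0 ? pick(candidates) : undefined;
}

const room: Participant[] = [
  { id: "p1", model: "llama3" },
  { id: "p2", model: "claude-haiku-4-5" },
];

console.log(resolveTarget("model:claude-haiku-4-5", room)?.id); // p2
console.log(resolveTarget("p1", room)?.model); // llama3
console.log(resolveTarget("model:gpt-4o", room)); // undefined
```

A load-balancing policy such as round-robin or least-busy would plug in through your own `pick` function. The page does not say which policy the hub uses.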

04 / install

Get the room running in 60 seconds

One binary on the hub machine, one SDK call from your app. No accounts, no cloud bill, no signup.

Install via curl (recommended), npm, or bun:
  1. gambi hub serve
  2. gambi room create --name "demo"
  3. gambi participant join --room ABC123 --model llama3

app.ts · gambi-sdk
import { createGambi } from "gambi-sdk";
import { generateText } from "ai";

const gambi = createGambi({
  roomCode: "ABC123",
  hubUrl: "http://localhost:3000",
});

// route to any participant
const { text } = await generateText({
  model: gambi.any(),
  prompt: "Explain TLS in one sentence.",
});

// or to a specific model
const judge = await generateText({
  model: gambi.model("claude-haiku-4-5"),
  prompt: `Score this answer 1-10: ${text}`,
});

05 / specs

Built for builders

Going multi-model normally means re-wiring auth, routing and observability for every new provider. Gambi has already done the boring parts, and it stays local-first: no signup, no cloud bill.

01

Bring any provider, mix freely

Local Ollama, vLLM, OpenRouter, OpenAI, Groq: all in the same room. Credentials and endpoints stay on the participant's machine; the hub never sees them.

byo · tunnel · multi-source
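Here is one concrete way to picture the split. A participant keeps provider URLs and keys locally and tells the hub only which models it serves. This is a hypothetical sketch: `ProviderConfig`, `advertise`, the advertisement shape and the model names are illustrative. The participant runtime's real protocol is not described on this page.

```typescript
// Hypothetical participant-side config; field names are illustrative only.
interface ProviderConfig {
  name: string; // e.g. "ollama", "openrouter"
  baseUrl: string; // endpoint reachable from this machine only
  apiKey?: string; // credential that stays local
  models: string[]; // models this provider exposes
}

// What the participant shares with the hub in this sketch: an id and model names.
interface Advertisement {
  participantId: string;
  models: string[];
}

function advertise(participantId: string, providers: ProviderConfig[]): Advertisement {
  // De-duplicate model names across providers; baseUrl and apiKey are never copied.
  const models = Array.from(new Set(providers.flatMap((p) => p.models)));
  return { participantId, models };
}

const local: ProviderConfig[] = [
  { name: "ollama", baseUrl: "http://localhost:11434", models: ["llama3"] },
  {
    name: "openrouter",
    baseUrl: "https://openrouter.ai/api/v1",
    apiKey: "sk-placeholder", // placeholder; load a real key from local config
    models: ["llama3", "mistral-small"], // illustrative model names
  },
];

console.log(JSON.stringify(advertise("laptop-1", local)));
// {"participantId":"laptop-1","models":["llama3","mistral-small"]}
```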

02

Real-time observability

Every inference emits SSE events carrying time to first token (TTFT), total duration, and token counts. Pipe them to a TUI, a dashboard, or your own tracer.

llm.request · llm.complete · llm.error
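Consuming the event stream mostly comes down to parsing the standard Server-Sent Events wire format. Below is a minimal sketch that parses a complete SSE text buffer. The stream URL is not given on this page. The payload fields in the sample (`model`, `ttftMs`, `durationMs`) are assumptions, not Gambi's documented schema.

```typescript
// Minimal parser for the Server-Sent Events wire format:
// "field: value" lines, a blank line ends one event.
interface SseEvent {
  event: string;
  data: string;
}

function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let event = "message"; // SSE default type when no "event:" line is sent
  let data: string[] = [];
  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      // Blank line: dispatch the buffered event if it carried any data lines.
      if (data.length > 0) events.push({ event, data: data.join("\n") });
      event = "message";
      data = [];
      continue;
    }
    if (line.startsWith(":")) continue; // comment, often used as keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // drop one leading space
    if (field === "event") event = value;
    else if (field === "data") data.push(value);
    // Other fields (id, retry) are ignored in this sketch.
  }
  return events;
}

// Sample stream; the JSON payload fields are assumed, not a documented schema.
const sample =
  'event: llm.request\ndata: {"model":"llama3"}\n\n' +
  'event: llm.complete\ndata: {"model":"llama3","ttftMs":120,"durationMs":840}\n\n';

for (const e of parseSse(sample)) {
  if (e.event === "llm.complete") {
    const m = JSON.parse(e.data);
    console.log(`${m.model}: TTFT ${m.ttftMs} ms, total ${m.durationMs} ms`);
  }
}
// llama3: TTFT 120 ms, total 840 ms
```

In a browser, `EventSource` does this parsing for you, and named events arrive via `addEventListener("llm.complete", ...)`. A hand-rolled parser suits Node scripts or TUIs that read a fetch body. There you would buffer chunks until a blank line arrives.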

03

OpenAI-compatible everywhere

One endpoint speaks both the Responses API and Chat Completions. The AI SDK, the OpenAI SDK, curl, and any tool that accepts a custom base URL all just work.

/responses · /chat/completions
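Because the endpoint follows Chat Completions, a plain `fetch` is enough to call it. This sketch makes two assumptions. The per-room base URL (`http://localhost:3000/rooms/ABC123/v1`) is hypothetical. The `model` field is assumed to carry the routing target. Check the hub's own documentation for the real path.

```typescript
// Assumption: the per-room base URL below is hypothetical; the real path may differ.
const BASE_URL = "http://localhost:3000/rooms/ABC123/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build a standard Chat Completions request (POST {base}/chat/completions).
function chatRequest(baseUrl: string, model: string, messages: ChatMessage[]) {
  return {
    url: `${baseUrl.replace(/\/$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Send with the global fetch (Node 18+, Bun, browsers) and return the first choice's text.
async function ask(baseUrl: string, model: string, prompt: string): Promise<string> {
  const { url, init } = chatRequest(baseUrl, model, [{ role: "user", content: prompt }]);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const json = (await res.json()) as { choices: { message: { content: string } }[] };
  return json.choices[0].message.content;
}

// Usage (needs a running hub); assumes "model" can carry a routing target such as "llama3":
// ask(BASE_URL, "llama3", "Explain TLS in one sentence.").then(console.log);
```

The same base URL would go into the OpenAI SDK's `baseURL` option. The request body is standard Chat Completions and carries nothing Gambi-specific.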

04

Many rooms, one hub

Run as many rooms as you want on a single hub — one per app, team, or experiment. Each room has its own participants, routing, and event stream.

scoped routing · isolated state
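Per-room isolation can be pictured as a registry keyed by room code, where each room owns its participants and its own subscriber list. This is a hypothetical in-memory sketch. `RoomRegistry` and its methods are illustrative, not the hub's actual internals.

```typescript
// Hypothetical in-memory hub state: one entry per room code, nothing shared across rooms.
type Listener = (event: string, data: unknown) => void;

interface Room {
  name: string;
  participants: Map<string, string>; // participant id -> served model
  listeners: Set<Listener>; // subscribers to this room's events only
}

class RoomRegistry {
  private rooms = new Map<string, Room>();

  create(code: string, name: string): void {
    if (this.rooms.has(code)) throw new Error(`room ${code} already exists`);
    this.rooms.set(code, { name, participants: new Map(), listeners: new Set() });
  }

  join(code: string, participantId: string, model: string): void {
    this.lookup(code).participants.set(participantId, model);
  }

  participants(code: string): string[] {
    return Array.from(this.lookup(code).participants.keys());
  }

  // Returns an unsubscribe function.
  subscribe(code: string, listener: Listener): () => void {
    const room = this.lookup(code);
    room.listeners.add(listener);
    return () => {
      room.listeners.delete(listener);
    };
  }

  // Deliver an event only to the listeners of the given room.
  emit(code: string, event: string, data: unknown): void {
    this.lookup(code).listeners.forEach((listener) => listener(event, data));
  }

  private lookup(code: string): Room {
    const room = this.rooms.get(code);
    if (!room) throw new Error(`unknown room ${code}`);
    return room;
  }
}

const hub = new RoomRegistry();
hub.create("ABC123", "demo");
hub.create("XYZ789", "experiments");
hub.join("ABC123", "p1", "llama3");

const seen: string[] = [];
hub.subscribe("ABC123", (event) => seen.push(event));
hub.emit("ABC123", "llm.request", { participantId: "p1" });
hub.emit("XYZ789", "llm.request", {}); // other room: not delivered to the listener above

console.log(seen.length); // 1
console.log(hub.participants("XYZ789").length); // 0
```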