██████╗  █████╗ ███╗   ███╗██████╗ ██╗
██╔════╝ ██╔══██╗████╗ ████║██╔══██╗██║
██║  ███╗███████║██╔████╔██║██████╔╝██║
██║   ██║██╔══██║██║╚██╔╝██║██╔══██╗██║
╚██████╔╝██║  ██║██║ ╚═╝ ██║██████╔╝██║
 ╚═════╝ ╚═╝  ╚═╝╚═╝     ╚═╝╚═════╝ ╚═╝

A shared room for every LLM you build with.

Build apps where many models collaborate across people, machines and providers.

Gambi ships the gateway. You ship the experience.


02 / patterns

What you can build

Six recipes that each fit in a few lines of TypeScript. The substrate (routing, observability, multi-source) is already there.

Model arena

Same prompt, every model in the room. Spot differences in tone, accuracy, and speed at a glance.

→ pattern: Round-robin via model:"*", N requests in parallel, render side by side.
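A minimal sketch of the arena fan-out, assuming the hub hands each `"*"` request to the next participant in the room. `Ask`, `runArena` and `renderSideBySide` are hypothetical names, not part of gambi-sdk; `ask` stands in for whatever client call your app already uses.

```typescript
// Hypothetical helper type: one chat call through the hub. It resolves with the
// reply text and the name of whichever model ended up answering.
type Ask = (model: string, prompt: string) => Promise<{ answeredBy: string; text: string }>;

interface ArenaEntry {
  answeredBy: string;
  text: string;
  ms: number; // wall-clock latency measured by the caller
}

// Send the same prompt n times in parallel, always addressed to "*".
async function runArena(prompt: string, n: number, ask: Ask): Promise<ArenaEntry[]> {
  const calls = Array.from({ length: n }, async () => {
    const started = Date.now();
    const reply = await ask("*", prompt);
    return { ...reply, ms: Date.now() - started };
  });
  return Promise.all(calls);
}

// Render one plain-text block per answer, ready to place side by side in a UI.
function renderSideBySide(entries: ArenaEntry[]): string {
  return entries.map((e) => `[${e.answeredBy}] ${e.ms} ms\n${e.text}`).join("\n\n");
}
```

`Promise.all` keeps the answers in request order, so column positions stay stable between runs even when the models finish at different times.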


Jury / judge panel

Generate the answer once. Fan it out to other models for verdicts and scoring. The G-Eval pattern, off the shelf.

→ pattern: 1 generator participant + N judge participants. Aggregate the votes.
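The "aggregate the votes" step could look roughly like this. It is a sketch that assumes each judge replies in free text containing a 1-10 score; `parseScore` and `aggregate` are hypothetical helpers, and mean plus median is just one reasonable way to combine verdicts.

```typescript
interface Verdict {
  judge: string; // participant that produced the score
  score: number; // 1..10
}

// Pull the first standalone integer 1..10 out of a judge's reply; null if none.
function parseScore(reply: string): number | null {
  const match = reply.match(/\b(10|[1-9])\b/);
  return match ? Number(match[1]) : null;
}

// Combine the judges' scores into a mean and a median.
function aggregate(verdicts: Verdict[]): { mean: number; median: number; count: number } {
  if (verdicts.length === 0) throw new Error("no verdicts to aggregate");
  const scores = verdicts.map((v) => v.score).sort((a, b) => a - b);
  const mean = scores.reduce((sum, x) => sum + x, 0) / scores.length;
  const mid = Math.floor(scores.length / 2);
  const median = scores.length % 2 === 1 ? scores[mid] : (scores[mid - 1] + scores[mid]) / 2;
  return { mean, median, count: scores.length };
}
```

The median is worth keeping next to the mean: one judge that misreads the rubric moves the mean but rarely the median.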


Draft → critique → polish

Cheap-then-strong. Small model drafts, bigger one critiques, third polishes. Quality without paying frontier prices end to end.

→ pattern: 3 chained calls — output of step N feeds prompt N+1.
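One way to express "output of step N feeds prompt N+1" is a small sequential runner. `Step` and `runChain` are hypothetical names; each step would typically wrap one model call, which is stubbed out here so the sketch stays self-contained.

```typescript
// A step takes the previous output and resolves with the next one.
// In an app, a step might wrap something like
//   generateText({ model: gambi.model("..."), prompt: `Critique this draft: ${input}` })
// and return its text.
type Step = (input: string) => Promise<string>;

// Run the steps strictly in order, keeping every intermediate output.
async function runChain(steps: Step[], input: string): Promise<string[]> {
  const outputs: string[] = [];
  let current = input;
  for (const step of steps) {
    current = await step(current);
    outputs.push(current);
  }
  return outputs;
}
```

Returning every intermediate output, not only the last, lets the UI show the draft and the critique alongside the polished result.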


Debate club

Two models argue, a third moderates. Stream every turn through SSE for a live show.

→ pattern: Loop between participant IDs with conflicting system prompts.
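A sketch of the turn loop, assuming two participant IDs and a `speak` function that wraps the actual model call. `Debater`, `Speak` and `runDebate` are hypothetical; the moderator is left out and could be one more call over the finished transcript.

```typescript
interface Turn {
  speaker: string; // participant ID
  text: string;
}

interface Debater {
  id: string;
  systemPrompt: string; // the stance this side argues
}

// Hypothetical call: ask one participant for its next turn, given the debate so far.
type Speak = (participantId: string, systemPrompt: string, transcript: Turn[]) => Promise<string>;

// Alternate between the two debaters for the given number of rounds.
async function runDebate(debaters: [Debater, Debater], rounds: number, speak: Speak): Promise<Turn[]> {
  const transcript: Turn[] = [];
  for (let turn = 0; turn < rounds * 2; turn++) {
    const debater = debaters[turn % 2];
    // Pass a copy so a speaker cannot mutate the shared history.
    const text = await speak(debater.id, debater.systemPrompt, [...transcript]);
    transcript.push({ speaker: debater.id, text });
  }
  return transcript;
}
```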


Multi-persona NPCs

A game where each character has its own brain. Different model, different system prompt, different attitude.

→ pattern: One participant per persona. Route by ID at the moment of speech.
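Routing by ID at the moment of speech might be sketched as a lookup from character name to participant. This assumes the participant ID goes in the `model` field of a Chat Completions style request; `Persona`, `buildNpcRequest` and the IDs below are made up for illustration.

```typescript
interface Persona {
  participantId: string; // hypothetical ID of the participant backing this character
  systemPrompt: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

// Build the request for whichever character is about to speak.
function buildNpcRequest(cast: Record<string, Persona>, npc: string, playerLine: string): ChatRequest {
  const persona = cast[npc];
  if (!persona) throw new Error(`unknown NPC: ${npc}`);
  return {
    model: persona.participantId,
    messages: [
      { role: "system", content: persona.systemPrompt },
      { role: "user", content: playerLine },
    ],
  };
}
```

Keeping the cast in plain data means swapping a character's brain is a one-line change to `participantId`.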


LAN debate club

Bring friends or students. Each plugs their own LLM into the room. You build the UI; the models stay theirs.

→ pattern: Multi-person registration. App reads the participant list and fans out.


03 / architecture

How it sits in your stack

Your app talks to a single OpenAI-compatible URL. The hub routes inside the room. Every participant runs its own provider, in its own tunnel.

01 · client

your app

SDK / OpenAI client / curl
any tool that speaks the OpenAI API
your TUI, your dashboard

02 · hub

gambi hub

room registry
routing: * · model:<name> · <id>
SSE events: llm.request · llm.complete

03 · participants

your providers

ollama @ localhost:11434
vllm @ 192.168.1.50:8000
openrouter (cloud)
openai (rate limited)

Provider endpoints stay on the participant's own machine or network. The hub never reaches in; the participant runtime opens the tunnel.
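The three routing forms listed for the hub can be modelled as a small union type. This sketch takes the notation literally (`*`, `model:<name>`, `<id>`) and assumes the resulting string is what a raw OpenAI-style client would send as its model; `RouteTarget` and `toRouteString` are hypothetical names, and the SDK's `gambi.any()` and `gambi.model()` presumably cover the same ground.

```typescript
// One variant per routing form shown in the hub box above.
type RouteTarget =
  | { kind: "any" } // any participant in the room
  | { kind: "model"; name: string } // any participant serving this model
  | { kind: "participant"; id: string }; // one specific participant

// Assumption: the literal string forms are "*", "model:<name>" and "<id>".
function toRouteString(target: RouteTarget): string {
  switch (target.kind) {
    case "any":
      return "*";
    case "model":
      return `model:${target.name}`;
    case "participant":
      return target.id;
  }
}
```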

04 / install

Get the room running in 60 seconds

One binary on the hub machine, one SDK call from your app. No accounts, no cloud bill, no signup.

install via: curl (recommended) · npm · bun

i. gambi hub serve
ii. gambi room create --name "demo"
iii. gambi participant join --room ABC123 --model llama3

app.ts · gambi-sdk

import { createGambi } from "gambi-sdk";
import { generateText } from "ai";

const gambi = createGambi({
  roomCode: "ABC123",
  hubUrl: "http://localhost:3000",
});

// route to any participant
const { text } = await generateText({
  model: gambi.any(),
  prompt: "Explain TLS in one sentence.",
});

// or to a specific model
const judge = await generateText({
  model: gambi.model("claude-haiku-4-5"),
  prompt: `Score this answer 1-10: ${text}`,
});

05 / specs

Built for builders

Going multi-model normally means re-wiring auth, routing and observability for every new provider. Gambi already did the boring parts — local-first, no signup, no cloud bill.

Bring any provider, mix freely

Local Ollama, vLLM, OpenRouter, OpenAI, Groq — all in the same room. Credentials and endpoints stay on the participant machine; the hub never sees them.

byo · tunnel · multi-source

Real-time observability

Every inference emits an SSE event with TTFT, total duration, and token counts. Pipe it to a TUI, a dashboard, or your own tracer.

llm.request · llm.complete · llm.error
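For a custom tracer, the event stream can be read with a few lines of parsing. The sketch below handles the standard `event:` and `data:` fields of a text/event-stream body; the payload field names (`ttftMs`, `durationMs`, `totalTokens`) are assumptions for illustration, not Gambi's documented schema.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Hypothetical payload shape for llm.complete; the real field names may differ.
interface CompleteMetrics {
  ttftMs: number;
  durationMs: number;
  totalTokens: number;
}

// Split a text/event-stream body into events. Blocks without data are skipped.
function parseSse(stream: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of stream.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type when no event: field is present
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

// Keep only completed inferences and decode their JSON payloads.
function completions(events: SseEvent[]): CompleteMetrics[] {
  return events
    .filter((e) => e.event === "llm.complete")
    .map((e) => JSON.parse(e.data) as CompleteMetrics);
}
```

In a browser, `EventSource` does this parsing for you; the manual version is handy in a TUI or a server-side tracer reading the raw stream.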

OpenAI-compatible everywhere

One endpoint speaks both the Responses API and Chat Completions. AI SDK, OpenAI SDK, curl, any tool with a custom base URL: they all just work.

/responses · /chat/completions
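Without any SDK, a Chat Completions call is a single POST. The sketch below only builds the request, so it runs without a hub. `baseUrl` is an assumption: use whatever base URL your hub exposes for the room, since the exact prefix is not shown here.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build a Chat Completions request against a custom base URL.
function buildChatCompletion(baseUrl: string, model: string, messages: ChatMessage[]): PreparedRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`, // trim trailing slashes first
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

Sending it is then `fetch(req.url, req.init)`, where `req` is the value returned by `buildChatCompletion`.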

Many rooms, one hub

Run as many rooms as you want on a single hub — one per app, team, or experiment. Each room has its own participants, routing, and event stream.

scoped routing · isolated state