llm-router

v0.2.1

One front door + provider protocol in front of every LLM provider.

macOS: arm64 · x64
Linux: arm64 · armv7 · x64
Windows: arm64 · x64 · x86

install

$ iii worker add llm-router@0.2.1

llm-router

One front door for every LLM provider. The router owns routing, the provider registry, credential resolution, the model catalog, streaming relay, retries, and a single failure contract — consumers call one chat surface and never talk to a provider directly.

llm-router is a standalone iii worker. Providers plug in as separate workers at runtime through a self-registration protocol (iii worker add provider-<name>); the router never compiles against a provider, and removing a provider worker removes the provider.

Install

iii worker add llm-router

Quickstart

A consumer streams a turn by creating an iii channel, handing the router the channel's write endpoint, and reading frames from the read endpoint while router::chat runs. Any SDK works; Node shown:

import { createChannel } from 'iii-sdk';

// `iii` is an already-connected iii-sdk client; connection setup is omitted here.
const { reader, writerRef } = await createChannel(iii);
reader.onMessage((frame) => {
  const event = JSON.parse(frame); // AssistantMessageEvent
  if (event.type === 'text_delta') process.stdout.write(event.delta);
});

const res = await iii.trigger('router::chat', {
  writer_ref: writerRef, // direction "write"
  model: 'claude-sonnet-4',
  messages: [{ role: 'user', content: [{ type: 'text', text: 'Hello' }], timestamp: Date.now() }],
}, { timeout_ms: 320_000 }); // outer timeout ≥ the router's 300s stream budget
// res: { ok, provider, model, stop_reason, usage }

The streaming contract: every stream ends with exactly one terminal frame (done or error). When the router has to kill a stream itself (idle timeout, provider crash), it synthesizes the terminal frame and attaches the partial content, so consumers never hang on a half-open stream.
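On the consumer side, the one-terminal-frame guarantee means a frame handler can be a small state machine. The sketch below is one way to write it, not part of iii-sdk: `createTurnCollector` and its fields are hypothetical names, and only `text_delta` / `delta` and the terminal types `done` / `error` are taken from this README.

```javascript
// Sketch of a consumer-side frame handler. Only `text_delta`/`delta` and the
// terminal types `done`/`error` come from the README; everything else is assumed.
function createTurnCollector() {
  const state = { text: '', terminal: null, ignored: 0 };

  return {
    // Feed one raw frame (a JSON string) as delivered by reader.onMessage.
    onFrame(frame) {
      if (state.terminal) {
        // The contract promises exactly one terminal frame; count anything after it.
        state.ignored += 1;
        return;
      }
      const event = JSON.parse(frame);
      if (event.type === 'text_delta') {
        state.text += event.delta;
      } else if (event.type === 'done' || event.type === 'error') {
        state.terminal = event; // a router-synthesized terminal frame may carry partial content
      }
      // Other event types are skipped in this sketch.
    },
    isFinished: () => state.terminal !== null,
    snapshot: () => ({ ...state }),
  };
}

// Usage with simulated frames.
const collector = createTurnCollector();
[
  { type: 'text_delta', delta: 'Hel' },
  { type: 'text_delta', delta: 'lo' },
  { type: 'done' },
].forEach((e) => collector.onFrame(JSON.stringify(e)));
console.log(collector.snapshot().text); // prints "Hello"
```

With a real channel, `reader.onMessage(collector.onFrame)` would replace the simulated frames.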

Functions

Consumer surface

| Function | Purpose |
| --- | --- |
| `router::chat` | Stream a turn into the caller's channel; returns the turn summary. |
| `router::complete` | Non-streaming convenience over the same pipeline; returns the final message. |
| `router::abort` | Cancel an in-flight turn by `request_id`. |
| `router::route` | Read-only routing preview: `{model, provider?}` → `{provider, candidates}`, same rules and error codes as `router::chat`. Pin the result as the explicit `provider` on the chat call when you need the provider before streaming. |
| `router::models::list` | List catalog models, filterable by `provider` / `capability`. |
| `router::models::get` | Fetch one model record (`null` when unknown). |
| `router::models::supports` | Check one capability flag for one model. |
| `router::provider::list` | Registered providers with `configured` / `available` status. |

Agent exposure is restricted per iii-permissions.yaml to the read surface (router::models::*, router::provider::list).
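The route-then-pin flow described for `router::route` can be sketched as below. This is an illustration under assumptions: `chatWithPinnedProvider` is a hypothetical helper, `iii` is any client exposing the `trigger(functionId, payload, options)` shape from the Quickstart, and the options argument is assumed to be optional on the route call.

```javascript
// Sketch: preview routing, then pin the chosen provider on the chat call.
// `iii` is passed in so the flow can be exercised against a stub client.
async function chatWithPinnedProvider(iii, { writerRef, model, messages }) {
  // router::route takes { model, provider? } and returns { provider, candidates }.
  // Assumption: the options argument may be omitted for this short call.
  const route = await iii.trigger('router::route', { model });

  // Pin the previewed provider explicitly so the stream goes where we expect.
  const summary = await iii.trigger(
    'router::chat',
    { writer_ref: writerRef, model, provider: route.provider, messages },
    { timeout_ms: 320_000 }, // same outer timeout as the Quickstart
  );

  return { pinned: route.provider, summary };
}
```

Because `router::route` uses the same rules and error codes as `router::chat`, a routing failure surfaces on the preview call, before any channel frames are expected.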

Provider protocol

The protocol is token-gated after the first declaration: the response to router::provider::register carries a registration token, and every later protocol call must present it.

| Function | Purpose |
| --- | --- |
| `router::provider::register` | Self-declaration at attach time; idempotent re-declare with the token. |
| `router::provider::resolve` | Per-request credential + endpoint resolution (config > env > none). |
| `router::provider::update_credential` | Persist a refreshed credential (OAuth write-back). |
| `router::models::reconcile` | Replace the provider's catalog slice in one write. |

The provider worker itself exposes provider::<id>::stream and, when it supports model discovery, provider::<id>::refresh_models.
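The token gate can be pictured with a toy in-memory model. This is not the router's code: `createRegistry`, the `token` field name, and the error string are hypothetical, and the default token generator is not cryptographically secure.

```javascript
// Toy model of token-gated registration (illustrative only).
function createRegistry(makeToken = () => Math.random().toString(36).slice(2)) {
  const tokens = new Map(); // provider id -> registration token

  return {
    register(id, token) {
      const known = tokens.get(id);
      if (known === undefined) {
        // First declare: issue a token the provider must keep.
        const issued = makeToken();
        tokens.set(id, issued);
        return { ok: true, token: issued };
      }
      if (token !== known) {
        // Re-registering an id without its original token is rejected.
        return { ok: false, error: 'token_mismatch' };
      }
      return { ok: true, token: known }; // idempotent re-declare
    },

    // Gate for later protocol calls (resolve, update_credential, reconcile).
    authorize(id, token) {
      return tokens.has(id) && tokens.get(id) === token;
    },
  };
}

// Usage with a fixed token so the outcome is predictable.
const registry = createRegistry(() => 'tok-1');
const first = registry.register('anthropic');
console.log(first.token); // prints "tok-1"
console.log(registry.register('anthropic', 'wrong').ok); // prints false
```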

Configuration

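All operator configuration lives in the engine's llm-router configuration entry; the router reads no settings from env vars or a config file. (The one env-based input is the credential fallback described under Operational notes.) The entry schema is composed at runtime from each registered provider's declaration: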

{
  "default_provider": "anthropic",
  "providers": {
    "anthropic": { "api_key": "sk-…", "api_url": "https://api.anthropic.com/v1/messages", "max_tokens": 8192 }
  },
  "routing_heuristics": [{ "pattern": "^gpt-", "provider": "openai" }],
  "settings": {
    "stream_timeout_ms": 300000,
    "idle_timeout_ms": 120000,
    "retry_max": 2,
    "output_token_max": 32000
  }
}

| Setting | Default | Meaning |
| --- | --- | --- |
| `stream_timeout_ms` | `300000` | Hard budget for one streamed turn. |
| `idle_timeout_ms` | `120000` | Max silence between provider frames before the attempt is cut. |
| `retry_max` | `2` | Retries per turn for retryable failures before the first forwarded frame. |
| `output_token_max` | `32000` | Ceiling on `max_output_tokens` forwarded to providers. |
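For callers, two of these settings matter directly: the token ceiling and the stream budget that the outer timeout has to exceed. The helpers below are a sketch with hypothetical names; that a partial `settings` object is merged over the defaults is an assumption, not something this README states.

```javascript
// Defaults copied from the settings table above.
const DEFAULT_SETTINGS = Object.freeze({
  stream_timeout_ms: 300000,
  idle_timeout_ms: 120000,
  retry_max: 2,
  output_token_max: 32000,
});

// Merge an operator's partial `settings` object over the defaults (assumed behaviour).
function effectiveSettings(overrides = {}) {
  return { ...DEFAULT_SETTINGS, ...overrides };
}

// Apply the output_token_max ceiling to a requested max_output_tokens value.
function clampOutputTokens(requested, settings) {
  return Math.min(requested, settings.output_token_max);
}

// Caller-side timeout for router::chat: stream budget plus a margin.
// The 20 s default margin mirrors the Quickstart (320 s against a 300 s budget).
function outerTimeoutMs(settings, marginMs = 20000) {
  return settings.stream_timeout_ms + marginMs;
}

const settings = effectiveSettings({ output_token_max: 16000 });
console.log(clampOutputTokens(50000, settings)); // prints 16000
console.log(outerTimeoutMs(settings)); // prints 320000
```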

Pasting a key into a provider's slice is the whole onboarding flow. The router diffs the changed slice, debounces for about 2 s, and kicks off that provider's provider::<id>::refresh_models discovery. Discovered models land in the catalog via router::models::reconcile and show up in router::models::list within seconds, with no restart.
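The diff-then-debounce step can be sketched as follows. This is an illustration of the technique, not the router's implementation: `changedSlices` and `createSliceDebouncer` are hypothetical names, and the timer functions are injectable only so the behaviour can be tested without waiting.

```javascript
// Which provider slices differ between two versions of the `providers` object?
// JSON.stringify is key-order sensitive, which is good enough for a sketch.
function changedSlices(prev = {}, next = {}) {
  const ids = new Set([...Object.keys(prev), ...Object.keys(next)]);
  return [...ids].filter((id) => JSON.stringify(prev[id]) !== JSON.stringify(next[id]));
}

// Per-provider debounce: repeated changes to one slice within the window
// collapse into a single callback, while other providers are unaffected.
function createSliceDebouncer(onSettled, { delayMs = 2000, timers = globalThis } = {}) {
  const pending = new Map(); // provider id -> timer handle

  return function sliceChanged(providerId) {
    if (pending.has(providerId)) timers.clearTimeout(pending.get(providerId));
    const handle = timers.setTimeout(() => {
      pending.delete(providerId);
      onSettled(providerId); // e.g. trigger that provider's refresh_models function
    }, delayMs);
    pending.set(providerId, handle);
  };
}
```

A caller would run `changedSlices(oldProviders, newProviders).forEach(sliceChanged)` on every configuration update, so several quick edits to one key produce a single discovery run.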

Operational notes

Env-var credential fallback resolves in the router's process. A provider's credential_env_var (e.g. ANTHROPIC_API_KEY) is read by the llm-router binary, not by the provider worker — launch the router with those variables set, or put keys in the entry. A key present only in another worker's environment shows up as configured: false.
Registration-token recovery. Re-registering a provider id without its original token is rejected (anti-takeover). If a provider durably lost its token, delete the router's registry state (iii-state scope llm-router, key registry) and restart the affected providers to re-bind; pasted credentials in the configuration entry are unaffected.
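The `config > env > none` precedence, and why a key in another worker's environment reads as `configured: false`, can be modelled with a small function. This is a sketch: `resolveCredential` and the `source` label are hypothetical, and only `api_key`, `credential_env_var`, and `configured` come from this README.

```javascript
// Precedence sketch for "config > env > none".
// `env` defaults to process.env of whichever process runs this function; in the
// real system that is the router's process, not the provider worker's.
function resolveCredential(slice = {}, envVarName, env = process.env) {
  if (slice.api_key) {
    return { configured: true, source: 'config', api_key: slice.api_key };
  }
  const fromEnv = envVarName ? env[envVarName] : undefined;
  if (fromEnv) {
    return { configured: true, source: 'env', api_key: fromEnv };
  }
  return { configured: false, source: 'none', api_key: null };
}

// A key that exists only in some other process's environment is invisible here.
const routerEnv = {}; // the router was launched without ANTHROPIC_API_KEY
console.log(resolveCredential({}, 'ANTHROPIC_API_KEY', routerEnv).configured); // prints false
```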

Events

The router publishes three events over the engine's iii-pubsub worker. Bind an iii function to a topic with the engine's subscribe trigger type; the handler receives the payload verbatim (no envelope).

| Topic | Fires when | Payload |
| --- | --- | --- |
| `router::models::changed` | a provider reconciles its catalog slice | `{ "provider": "<id>", "count": <n> }` |
| `router::provider::changed` | the registry changes (declare / availability flip) | `{ "provider": "<id>", "op": "register" \| "available" \| "unavailable" }` |
| `router::ready` | the router finishes booting; providers re-declare on it | `{}` |

iii.registerFunction({ id: 'my-worker::onModelsChanged' }, async (payload) => {
  console.log('catalog changed:', payload); // { provider, count }
  return {};
});

iii.registerTrigger({
  type: 'subscribe',
  function_id: 'my-worker::onModelsChanged',
  config: { topic: 'router::models::changed' },
});
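A subscriber to `router::provider::changed` will often want a local view of provider availability. The reducer below is a sketch with a hypothetical name; treating `register` as "known, availability not yet reported" is an assumption about what the op implies.

```javascript
// Fold router::provider::changed payloads into a local availability map.
// Returns a new Map each time so earlier snapshots stay valid.
function applyProviderChange(view, payload) {
  const next = new Map(view);
  switch (payload.op) {
    case 'register':
      // Assumption: a fresh declaration says nothing about availability yet.
      if (!next.has(payload.provider)) next.set(payload.provider, { available: null });
      break;
    case 'available':
      next.set(payload.provider, { available: true });
      break;
    case 'unavailable':
      next.set(payload.provider, { available: false });
      break;
    default:
      break; // unknown op: leave the view untouched
  }
  return next;
}

// Usage with simulated payloads.
const view = [
  { provider: 'anthropic', op: 'register' },
  { provider: 'anthropic', op: 'available' },
  { provider: 'openai', op: 'register' },
].reduce(applyProviderChange, new Map());
console.log(view.get('anthropic')); // prints { available: true }
```

Inside a handler registered as shown above, the same function would be applied to each incoming payload.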

Writing a provider worker

A provider worker must:

Register provider::<id>::stream honouring the channel-writer contract: forward upstream output as AssistantMessageEvent frames into the writer_ref it receives, ending with one terminal frame.
Declare itself at startup via router::provider::register — retrying with backoff until acknowledged (covers provider-before-router boot order) — and re-declare on the router::ready event after a router restart.
Resolve credentials per request via router::provider::resolve; never read keys directly.
Treat closure of its stream channel as cancellation: abort the upstream request and stop writing frames.
Map upstream failures to the shared ErrorKind taxonomy on its error frames. Transport retries (429 / 5xx / connect) are the router's job, not the provider's.
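The declare-with-backoff requirement in the list above could look like the sketch below. Everything specific here is an assumption: `declareWithBackoff` is a hypothetical helper, the declaration payload is left opaque, the delay values are arbitrary, and the register call is assumed to reject while the router is not yet reachable.

```javascript
// Sketch: declare to the router with exponential backoff until acknowledged.
// `iii.trigger` follows the call shape used in the Quickstart.
async function declareWithBackoff(iii, declaration, opts = {}) {
  const {
    baseDelayMs = 250,
    maxDelayMs = 10000,
    maxAttempts = Infinity,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = opts;

  let delay = baseDelayMs;
  for (let attempt = 1; ; attempt += 1) {
    try {
      // Resolves once the router acknowledges the declaration.
      return await iii.trigger('router::provider::register', declaration);
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up only if a limit was set
      await sleep(delay);
      delay = Math.min(delay * 2, maxDelayMs); // exponential growth with a cap
    }
  }
}
```

Calling the same helper from a function bound to router::ready, with the stored registration token included in the declaration, would cover the re-declare after a router restart.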

The first real provider implementing this protocol is provider-anthropic/ — useful as a reference implementation alongside the scripted provider in the integration tests. provider-openai/ follows the same structure for the OpenAI Chat Completions API (native structured output, reasoning_effort).

Local development & testing

cargo test                       # unit suite, no engine needed
cargo test --test integration    # engine-backed suite; self-skips without an engine

The integration suite spawns a throwaway engine per test when iii is on PATH (or III_ENGINE_BIN points at a binary) and covers the chat relay, cancellation, abort, restart recovery, registration token gating, paste-a-key discovery, and event delivery end to end.

To run the worker locally against an engine:

cargo run -- --url ws://127.0.0.1:49134

--url defaults to ws://127.0.0.1:49134 and honours the III_WS_URL environment variable when the flag is not set. --config is accepted per the standard worker CLI but ignored with a warning — operator config lives in the engine's llm-router configuration entry (see Configuration above).