llm-router

v0.2.1

One front door + provider protocol in front of every LLM provider.

  • macOS: arm64 · x64
  • Linux: arm64 · armv7 · x64
  • Windows: arm64 · x64 · x86

install

$ iii worker add llm-router@0.2.1

llm-router

One front door for every LLM provider. The router owns routing, the provider registry, credential resolution, the model catalog, streaming relay, retries, and a single failure contract — consumers call one chat surface and never talk to a provider directly.

llm-router is a standalone iii worker. Providers plug in as separate workers at runtime through a self-registration protocol (iii worker add provider-<name>); the router never compiles against a provider, and removing a provider worker removes the provider.

Install

iii worker add llm-router

Quickstart

A consumer streams a turn by creating an iii channel, handing the router the channel's write endpoint, and reading frames from the read endpoint while router::chat runs. Any SDK works; Node shown:

import { createChannel } from 'iii-sdk';

// `iii` is an already-connected iii SDK client; its setup is not shown here.
const { reader, writerRef } = await createChannel(iii);
reader.onMessage((frame) => {
  const event = JSON.parse(frame); // AssistantMessageEvent
  if (event.type === 'text_delta') process.stdout.write(event.delta);
});

const res = await iii.trigger('router::chat', {
  writer_ref: writerRef, // direction "write"
  model: 'claude-sonnet-4',
  messages: [{ role: 'user', content: [{ type: 'text', text: 'Hello' }], timestamp: Date.now() }],
}, { timeout_ms: 320_000 }); // outer timeout ≥ the router's 300s stream budget
// res: { ok, provider, model, stop_reason, usage }

The streaming contract: every stream ends with exactly one terminal frame (done or error). When the router has to kill a stream itself (idle timeout, provider crash), it synthesizes the terminal frame and attaches the partial content, so consumers never hang on a half-open stream.
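
On the consumer side, the contract above can be tracked with a small accumulator. This is a minimal sketch, not part of the router: it assumes each frame is a JSON string whose `type` field is `text_delta`, `done`, or `error`, and `createTurnCollector` is a hypothetical helper name.

```javascript
// Sketch only: assumes frames are JSON strings with a `type` discriminator.
// Event types other than text_delta / done / error are ignored here.
function createTurnCollector() {
  const state = { text: '', terminal: null };
  return {
    state,
    // Feed one raw frame; returns true once the terminal frame has arrived.
    push(frame) {
      if (state.terminal !== null) {
        // The contract promises exactly one terminal frame per stream.
        throw new Error('frame received after terminal frame');
      }
      const event = JSON.parse(frame);
      if (event.type === 'text_delta') {
        state.text += event.delta;
      } else if (event.type === 'done' || event.type === 'error') {
        state.terminal = event.type;
      }
      return state.terminal !== null;
    },
  };
}
```

In the Quickstart, `reader.onMessage` could feed `collector.push(frame)` and stop waiting as soon as it returns true, whether the stream ended with `done` or `error`.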

Functions

Consumer surface

| Function | Purpose |
| --- | --- |
| router::chat | Stream a turn into the caller's channel; returns the turn summary. |
| router::complete | Non-streaming convenience over the same pipeline; returns the final message. |
| router::abort | Cancel an in-flight turn by request_id. |
| router::route | Read-only routing preview: takes {model, provider?} and returns {provider, candidates}, with the same rules and error codes as router::chat. Pin the result as the explicit provider on the chat call when you need the provider before streaming. |
| router::models::list | List catalog models, filterable by provider / capability. |
| router::models::get | Fetch one model record (null when unknown). |
| router::models::supports | Check one capability flag for one model. |
| router::provider::list | Registered providers with configured / available status. |

Agent exposure is restricted per iii-permissions.yaml to the read surface (router::models::*, router::provider::list).
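
The "preview, then pin" flow described for router::route might look like the sketch below. It assumes `iii` is a connected SDK client exposing `trigger()` as in the Quickstart, and that the chat payload accepts the pinned provider under a `provider` field; `chatWithPinnedProvider` is a hypothetical helper name.

```javascript
// Sketch only. Confirm the `provider` field name on router::chat against the
// router's schema before relying on it.
async function chatWithPinnedProvider(iii, writerRef, model, messages) {
  // Read-only preview: which provider would serve this model?
  const route = await iii.trigger('router::route', { model });

  // Pin the previewed provider explicitly on the streaming call.
  const summary = await iii.trigger(
    'router::chat',
    { writer_ref: writerRef, model, provider: route.provider, messages },
    { timeout_ms: 320_000 }, // outer timeout above the 300s stream budget
  );
  return { route, summary };
}
```

Because the preview applies the same rules and error codes as router::chat, a routing failure surfaces before any channel traffic starts.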

Provider protocol

Token-gated after the first declare: the response to register carries a registration token, and every later protocol call must present it.

| Function | Purpose |
| --- | --- |
| router::provider::register | Self-declaration at attach time; idempotent re-declare with the token. |
| router::provider::resolve | Per-request credential + endpoint resolution (config > env > none). |
| router::provider::update_credential | Persist a refreshed credential (OAuth write-back). |
| router::models::reconcile | Replace the provider's catalog slice in one write. |

The provider worker itself exposes provider::<id>::stream and, when it supports model discovery, provider::<id>::refresh_models.
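
A provider worker could wrap the token handling in a small session object, as in the sketch below. The payload field names `id` and `token` are assumptions made for illustration, as is the helper name `createProviderSession`; the real register response and protocol payloads may name these differently.

```javascript
// Sketch only: `id` and `token` are assumed field names, not the documented schema.
function createProviderSession(iii, providerId) {
  let token = null;
  return {
    // First call declares the provider; later calls re-declare with the token.
    async register(declaration) {
      const payload = { id: providerId, ...declaration };
      if (token) payload.token = token;
      const res = await iii.trigger('router::provider::register', payload);
      if (res && res.token) token = res.token;
      return res;
    },
    // Every later protocol call presents the registration token.
    async call(functionId, payload) {
      if (!token) throw new Error('register before calling ' + functionId);
      return iii.trigger(functionId, { ...payload, token });
    },
  };
}
```

A real worker would also persist the token durably, since re-registering a provider id without it is rejected.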

Configuration

All operator configuration lives in the engine's llm-router configuration entry — no env vars, no config file. The entry schema is composed at runtime from each registered provider's declaration:

{
  "default_provider": "anthropic",
  "providers": {
    "anthropic": { "api_key": "sk-…", "api_url": "https://api.anthropic.com/v1/messages", "max_tokens": 8192 }
  },
  "routing_heuristics": [{ "pattern": "^gpt-", "provider": "openai" }],
  "settings": {
    "stream_timeout_ms": 300000,
    "idle_timeout_ms": 120000,
    "retry_max": 2,
    "output_token_max": 32000
  }
}

| Setting | Default | Meaning |
| --- | --- | --- |
| stream_timeout_ms | 300000 | Hard budget for one streamed turn. |
| idle_timeout_ms | 120000 | Max silence between provider frames before the attempt is cut. |
| retry_max | 2 | Retries per turn for retryable failures before the first forwarded frame. |
| output_token_max | 32000 | Ceiling on max_output_tokens forwarded to providers. |
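
To make the entry easier to reason about, the sketch below shows one plausible reading of two of its fields. It assumes `pattern` is a regular expression that JavaScript can also parse, that the first matching heuristic wins, and that `default_provider` is the fallback; the router's real precedence may differ. Both function names are hypothetical.

```javascript
// Sketch only: first-match-wins over routing_heuristics, then default_provider.
function previewProvider(model, config) {
  for (const rule of config.routing_heuristics ?? []) {
    if (new RegExp(rule.pattern).test(model)) return rule.provider;
  }
  return config.default_provider ?? null;
}

// Applies the output_token_max ceiling to a requested max_output_tokens value.
function clampOutputTokens(requested, settings) {
  const ceiling = settings.output_token_max ?? 32000; // documented default
  return Math.min(requested, ceiling);
}
```

With the entry shown above, a model id such as `gpt-4o` would match the `^gpt-` heuristic, while `claude-sonnet-4` would fall through to the default provider.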

Pasting a key into a provider's slice is the whole onboarding flow: the router diffs the changed slice, debounces ~2 s, and kicks that provider's provider::<id>::refresh_models discovery; discovered models land in the catalog via router::models::reconcile and show up in router::models::list within seconds, with no restart.
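
The diff-and-debounce step can be pictured with the sketch below. It is an illustration of the idea, not the router's implementation (the router is a separate binary); the helper names are hypothetical, and slices are compared by their JSON text, which is sensitive to key order.

```javascript
// Sketch only: returns the ids of provider slices that were added, removed,
// or edited between two versions of the configuration entry.
function changedProviders(prev, next) {
  const before = prev.providers ?? {};
  const after = next.providers ?? {};
  const ids = new Set([...Object.keys(before), ...Object.keys(after)]);
  return [...ids]
    .filter((id) => JSON.stringify(before[id]) !== JSON.stringify(after[id]))
    .sort();
}

// Collects changed ids and fires `onSettled` once no further change has
// arrived for `waitMs` (about 2 s in the flow described above).
function createDebouncedRefresh(onSettled, waitMs = 2000) {
  const pending = new Set();
  let timer = null;
  return (ids) => {
    ids.forEach((id) => pending.add(id));
    clearTimeout(timer);
    timer = setTimeout(() => {
      const batch = [...pending];
      pending.clear();
      onSettled(batch);
    }, waitMs);
  };
}
```

Debouncing means several quick edits to the same slice trigger a single discovery run rather than one per keystroke.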

Operational notes

  • Env-var credential fallback resolves in the router's process. A provider's credential_env_var (e.g. ANTHROPIC_API_KEY) is read by the llm-router binary, not by the provider worker — launch the router with those variables set, or put keys in the entry. A key present only in another worker's environment shows up as configured: false.
  • Registration-token recovery. Re-registering a provider id without its original token is rejected (anti-takeover). If a provider durably lost its token, delete the router's registry state (iii-state scope llm-router, key registry) and restart the affected providers to re-bind; pasted credentials in the configuration entry are unaffected.
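
The credential precedence (config > env > none) and the env-var note above can be summarised in a few lines. This is a sketch of the documented ordering only; the function name and the `{ source, api_key }` return shape are hypothetical, and `env` stands for the environment of the router process, not the provider worker.

```javascript
// Sketch only: mirrors the documented precedence, config > env > none.
// `slice` is the provider's object under "providers" in the entry;
// `envVarName` is the provider's declared credential_env_var.
function resolveCredential(slice, envVarName, env) {
  if (slice && slice.api_key) {
    return { source: 'config', api_key: slice.api_key };
  }
  const fromEnv = envVarName ? env[envVarName] : undefined;
  if (fromEnv) {
    return { source: 'env', api_key: fromEnv };
  }
  // Nothing found in the router's own view: the provider is not configured.
  return { source: 'none', api_key: null };
}
```

Called with the provider worker's environment instead of the router's, the same lookup would give a misleading answer, which is why a key set only in another worker's environment appears as configured: false.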

Events

The router publishes three events over the engine's iii-pubsub worker. Bind an iii function to a topic with the engine's subscribe trigger type; the handler receives the payload verbatim (no envelope).

| Topic | Fires when | Payload |
| --- | --- | --- |
| router::models::changed | a provider reconciles its catalog slice | { "provider": "<id>", "count": <n> } |
| router::provider::changed | the registry changes (declare / availability flip) | { "provider": "<id>", "op": <op> }, where <op> is "register", "available", or "unavailable" |
| router::ready | the router finishes booting; providers re-declare on it | {} |

iii.registerFunction({ id: 'my-worker::onModelsChanged' }, async (payload) => {
  console.log('catalog changed:', payload); // { provider, count }
  return {};
});

iii.registerTrigger({
  type: 'subscribe',
  function_id: 'my-worker::onModelsChanged',
  config: { topic: 'router::models::changed' },
});
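
A handler bound to router::provider::changed could keep a local availability map with a reducer like the sketch below. Treating "register" as "known but not yet available" is an assumption of this sketch, and `applyProviderChange` is a hypothetical helper name.

```javascript
// Sketch only: folds one router::provider::changed payload into a
// { [providerId]: boolean } availability map and returns the new map.
function applyProviderChange(availability, payload) {
  const next = { ...availability };
  switch (payload.op) {
    case 'register':
      // Assumption: a newly declared provider starts as not yet available.
      if (!(payload.provider in next)) next[payload.provider] = false;
      break;
    case 'available':
      next[payload.provider] = true;
      break;
    case 'unavailable':
      next[payload.provider] = false;
      break;
    default:
      break; // unknown ops are ignored
  }
  return next;
}
```

Since the handler receives the payload verbatim, it can pass its argument straight to the reducer.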

Writing a provider worker

A provider worker must:

  1. Register provider::<id>::stream honouring the channel-writer contract: forward upstream output as AssistantMessageEvent frames into the writer_ref it receives, ending with one terminal frame.
  2. Declare itself at startup via router::provider::register — retrying with backoff until acknowledged (covers provider-before-router boot order) — and re-declare on the router::ready event after a router restart.
  3. Resolve credentials per request via router::provider::resolve; never read keys directly.
  4. Treat closure of its stream channel as cancellation: abort the upstream request and stop writing frames.
  5. Map upstream failures to the shared ErrorKind taxonomy on its error frames. Transport retries (429 / 5xx / connect) are the router's job, not the provider's.
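
Step 2 above (declare at startup, retrying with backoff until acknowledged) might be sketched as follows. It assumes `iii.trigger` rejects while the router is not yet reachable; the delay values and both function names are illustrative choices, not values taken from the router.

```javascript
// Exponential backoff: baseMs * 2^attempt, capped at capMs.
function backoffDelayMs(attempt, baseMs = 250, capMs = 10000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Sketch only. Retries the declaration until the router acknowledges it.
// `sleep` is injectable so the loop can run in tests without real timers.
async function registerWithBackoff(iii, declaration, options = {}) {
  const sleep = options.sleep ?? ((ms) => new Promise((r) => setTimeout(r, ms)));
  const maxAttempts = options.maxAttempts ?? Infinity;
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await iii.trigger('router::provider::register', declaration);
    } catch (err) {
      // Give up only when a finite attempt budget is exhausted.
      if (attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

The same function can be called again from a router::ready handler to re-declare after a router restart.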

The first real provider implementing this protocol is provider-anthropic/ — useful as a reference implementation alongside the scripted provider in the integration tests. provider-openai/ follows the same structure for the OpenAI Chat Completions API (native structured output, reasoning_effort).

Local development & testing

cargo test                       # unit suite, no engine needed
cargo test --test integration    # engine-backed suite; self-skips without an engine

The integration suite spawns a throwaway engine per test when iii is on PATH (or III_ENGINE_BIN points at a binary) and covers the chat relay, cancellation, abort, restart recovery, registration token gating, paste-a-key discovery, and event delivery end to end.

To run the worker locally against an engine:

cargo run -- --url ws://127.0.0.1:49134

--url defaults to ws://127.0.0.1:49134 and honours the III_WS_URL environment variable when the flag is not set. --config is accepted per the standard worker CLI but ignored with a warning — operator config lives in the engine's llm-router configuration entry (see Configuration above).