LLM Routing

Model Routing API

One endpoint, every frontier LLM. Route requests across Claude Fable 5, Claude 4.8, Claude 4.7, Claude 4.6, Claude 4.5, GPT, Deepseek, and open-weight models with automatic fallback, cost controls, and unified billing.

View all supported models API Dashboard

Overview

One key, every model

Most teams start with one model and realize within weeks that no single provider wins every situation. GPT-4o is fast. Claude Sonnet reasons well. Deepseek V3 is cheap. Qwen handles Chinese better. Switching between them means managing multiple SDKs, keys, billing accounts, and rate-limit strategies.

One key, every model

ImaRouter collapses all of that into one API surface. You send a standard chat completion request with your preferred model or let the router choose automatically. We handle provider auth, rate limits, retries, and logging behind the single endpoint.

One API key for all providers — OpenAI, Anthropic, Deepseek, Qwen, and more
Automatic fallback when a provider is rate-limited or returns an error
Latency-aware routing: pick the fastest available model within your budget
Cost caps per route or per project
Unified request and token logging across all providers

How routing works

When a request arrives, ImaRouter evaluates your routing policy: preferred model, cost ceiling, latency target, and fallback chain. If the primary model is unavailable or exceeds your threshold, the next model in the chain is tried automatically — with no change to your application code.

You can define routing policies per endpoint, per user segment, or at the account level. A single integration handles a chatbot that needs GPT-4o for paying users and Deepseek V3 for free-tier users, without branching your codebase.

Priority-ordered fallback chains you define
Policy modes: cheapest, fastest, or quality-first
Per-route overrides via request headers or policy API
Streaming support across all providers with unified SSE format

Capabilities

Supported models

ImaRouter gives you access to the full frontier model catalog through one integration. New models are added as they launch — no SDK update required on your end.

Supported models

ImaRouter gives you access to the full frontier model catalog through one integration. New models are added as they launch — no SDK update required on your end.

Claude Fable 5, Claude 4.8, Claude 4.7, Claude 4.6, Claude 4.5 (Anthropic)
GPT-4o, GPT-4o mini, o3, o4-mini (OpenAI)
Gemini 2.0 Flash, Gemini 1.5 Pro (Google)
Deepseek V3, Deepseek R1
Qwen 2.5, Qwen 2.5 Coder
Llama 3.3, Mistral Large, and other open-weight models

Common use cases

Model routing unlocks specific product patterns that are hard to build reliably with a single-provider integration.

Multi-tier products: use a cheap model for free users and a premium model for paid users via one routing rule
Agent pipelines: use a fast model for tool calls and a reasoning model for planning steps
Resilient chatbots: automatic fallback means a provider outage doesn't take your product down
Cost management: hard caps per project prevent bill surprises during traffic spikes
A/B testing: split traffic across two models to compare output quality or cost

Getting started

ImaRouter is OpenAI SDK-compatible. Point your existing client at our base URL and add your ImaRouter API key. No library changes required.

Set model to the specific provider model ID you want, or use 'auto' to let ImaRouter route to the best available option based on your account policy. Your first 1,000 routed requests are free.

Base URL: https://api.imarouter.com/v1
Compatible with OpenAI Python SDK, JS/TS SDK, and any HTTP client
Use model: 'auto' for policy-driven routing, or specify any supported model ID
Dashboard at api.imarouter.com for usage, logs, and policy configuration

FAQ

Can I define my own fallback order?

Yes. You can specify a priority-ordered list of models in your routing policy. ImaRouter tries each in order when the preferred model is unavailable, rate-limited, or exceeds your latency or cost threshold.

Does routing add latency to my requests?

ImaRouter adds under 50ms to request routing. For most LLM use cases where model response time is 500ms–5s, this overhead is negligible. Routing decisions are made in memory using pre-evaluated policy rules.

Is the output format the same across all models?

Yes. All model responses are normalized to the OpenAI chat completion schema regardless of which provider handled the request. Your parsing and display code doesn't need to branch per provider.

Can I see which model handled each request?

Every response includes an x-routed-model header and the model field in the response body reflects the actual provider model used. The dashboard logs each request with full routing trace.

What happens if all my fallback models are unavailable?

ImaRouter returns a structured error response with the reason (all fallbacks exhausted, cost cap reached, etc.). You can configure a maximum retry window and a final error behavior — fail fast or queue for retry.

Launch paths

Model Routing API

One key, every model

One key, every model

How routing works

Supported models

Supported models

Common use cases

Getting started

FAQ

Related links and launch paths