LLM Routing

Model Routing API

One endpoint, every frontier LLM. Route requests across Claude, GPT, Deepseek, and open-weight models with automatic fallback, cost controls, and unified billing.

Overview

One key, every model

Most teams start with one model and realize within weeks that no single provider wins every situation. GPT-4o is fast. Claude Sonnet reasons well. Deepseek V3 is cheap. Qwen handles Chinese better. Switching between them means managing multiple SDKs, keys, billing accounts, and rate-limit strategies.

One key, every model

Most teams start with one model and realize within weeks that no single provider wins every situation. GPT-4o is fast. Claude Sonnet reasons well. Deepseek V3 is cheap. Qwen handles Chinese better. Switching between them means managing multiple SDKs, keys, billing accounts, and rate-limit strategies.

ImaRouter collapses all of that into one API surface. You send a standard chat completion request with your preferred model or let the router choose automatically. We handle provider auth, rate limits, retries, and logging behind the single endpoint.

  • One API key for all providers — OpenAI, Anthropic, Deepseek, Qwen, and more
  • Automatic fallback when a provider is rate-limited or returns an error
  • Latency-aware routing: pick the fastest available model within your budget
  • Cost caps per route or per project
  • Unified request and token logging across all providers

How routing works

When a request arrives, ImaRouter evaluates your routing policy: preferred model, cost ceiling, latency target, and fallback chain. If the primary model is unavailable or exceeds your threshold, the next model in the chain is tried automatically — with no change to your application code.

You can define routing policies per endpoint, per user segment, or at the account level. A single integration handles a chatbot that needs GPT-4o for paying users and Deepseek V3 for free-tier users, without branching your codebase.

  • Priority-ordered fallback chains you define
  • Policy modes: cheapest, fastest, or quality-first
  • Per-route overrides via request headers or policy API
  • Streaming support across all providers with unified SSE format

Capabilities

Supported models

ImaRouter gives you access to the full frontier model catalog through one integration. New models are added as they launch — no SDK update required on your end.

03

Supported models

ImaRouter gives you access to the full frontier model catalog through one integration. New models are added as they launch — no SDK update required on your end.

  • Claude Sonnet 4.6, Claude Haiku 4.5 (Anthropic)
  • GPT-4o, GPT-4o mini, o3, o4-mini (OpenAI)
  • Gemini 2.0 Flash, Gemini 1.5 Pro (Google)
  • Deepseek V3, Deepseek R1
  • Qwen 2.5, Qwen 2.5 Coder
  • Llama 3.3, Mistral Large, and other open-weight models
04

Common use cases

Model routing unlocks specific product patterns that are hard to build reliably with a single-provider integration.

  • Multi-tier products: use a cheap model for free users and a premium model for paid users via one routing rule
  • Agent pipelines: use a fast model for tool calls and a reasoning model for planning steps
  • Resilient chatbots: automatic fallback means a provider outage doesn't take your product down
  • Cost management: hard caps per project prevent bill surprises during traffic spikes
  • A/B testing: split traffic across two models to compare output quality or cost
05

Getting started

ImaRouter is OpenAI SDK-compatible. Point your existing client at our base URL and add your ImaRouter API key. No library changes required.

Set model to the specific provider model ID you want, or use 'auto' to let ImaRouter route to the best available option based on your account policy. Your first 1,000 routed requests are free.

  • Base URL: https://api.imarouter.com/v1
  • Compatible with OpenAI Python SDK, JS/TS SDK, and any HTTP client
  • Use model: 'auto' for policy-driven routing, or specify any supported model ID
  • Dashboard at api.imarouter.com for usage, logs, and policy configuration

FAQ

FAQ

Can I define my own fallback order?

Yes. You can specify a priority-ordered list of models in your routing policy. ImaRouter tries each in order when the preferred model is unavailable, rate-limited, or exceeds your latency or cost threshold.

Does routing add latency to my requests?

ImaRouter adds under 50ms to request routing. For most LLM use cases where model response time is 500ms–5s, this overhead is negligible. Routing decisions are made in memory using pre-evaluated policy rules.

Is the output format the same across all models?

Yes. All model responses are normalized to the OpenAI chat completion schema regardless of which provider handled the request. Your parsing and display code doesn't need to branch per provider.

Can I see which model handled each request?

Every response includes an x-routed-model header and the model field in the response body reflects the actual provider model used. The dashboard logs each request with full routing trace.

What happens if all my fallback models are unavailable?

ImaRouter returns a structured error response with the reason (all fallbacks exhausted, cost cap reached, etc.). You can configure a maximum retry window and a final error behavior — fail fast or queue for retry.

Launch paths

Related links and launch paths