
Models

Explore the active model market, rendered from a local OpenRouter snapshot.

This page reads from a local JSON snapshot synced from OpenRouter, so the catalog stays fast, indexable, and stable. Use it to browse current model coverage by provider, modality, reasoning support, context window, and pricing metadata.
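For orientation, here is a minimal sketch of how a page like this can load and filter the local snapshot. The file name (`models-snapshot.json`) and the field names (`slug`, `group`, `modalities`, `contextLength`, `pricing`) are illustrative assumptions, not the actual snapshot schema.

```ts
// Minimal sketch of reading and filtering a local model snapshot.
// The schema below is assumed for illustration; the real snapshot may differ.
import { readFileSync } from "node:fs";

interface SnapshotModel {
  slug: string;                                 // e.g. "z-ai/glm-4.5v"
  provider: string;                             // e.g. "Z.ai"
  group: string;                                // e.g. "Other", "GPT", "Qwen3"
  modalities: string[];                         // e.g. ["text", "image"]
  reasoning: boolean;                           // exposes a reasoning/thinking mode
  contextLength: number;                        // context window in tokens
  pricing?: { input: number; output: number };  // USD per 1M tokens, if published
}

// Load the snapshot from disk and filter it the way the catalog filters do:
// by modality, reasoning support, and minimum context window.
const models: SnapshotModel[] = JSON.parse(
  readFileSync("models-snapshot.json", "utf8"),
);

const longContextReasoners = models.filter(
  (m) => m.reasoning && m.contextLength >= 131_072 && m.modalities.includes("text"),
);

console.log(longContextReasoners.map((m) => m.slug));
```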


Results

Showing 48 of 683 matching models

Snapshot source: OpenRouter. Synced April 21, 2026 at 8:00 AM. Page 4 of 15.

This route is built from local JSON so the catalog stays stable for browsing and SEO. If you need a specific model on ImaRouter, treat this page as a discovery reference and then contact the team for availability.

Text, Reasoning

Z.ai

Z.ai: GLM 4.5V

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses. Reasoning behavior can be toggled via the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
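As a rough illustration of that toggle, a chat completion request against an OpenRouter-compatible endpoint might look like the sketch below; treat the exact shape of the `reasoning` object as something to confirm against the linked docs rather than a guarantee.

```ts
// Sketch: toggling GLM-4.5V's thinking mode via the `reasoning.enabled` boolean
// on an OpenRouter-style chat completions endpoint.
const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "z-ai/glm-4.5v",
    messages: [{ role: "user", content: "Describe the layout of this screenshot." }],
    reasoning: { enabled: true }, // set to false for the fast, non-thinking mode
  }),
});

const completion = await response.json();
console.log(completion.choices?.[0]?.message?.content);
```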

Text, Image

Context

65.5K

Group

Other

Pricing preview

Input Price: $0.6 /M tokens

Output Price: $1.8 /M tokens

Slug

z-ai/glm-4.5v

Text

Unknown provider

AI21: Jamba Mini 1.7

Jamba Mini 1.7 is a compact and efficient member of the Jamba open model family, incorporating key improvements in grounding and instruction-following while maintaining the benefits of the SSM-Transformer hybrid architecture and 256K context window. Despite its compact size, it delivers accurate, contextually grounded responses and improved steerability.

Text

Context

256K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

ai21/jamba-mini-1.7

Text

AI21

AI21: Jamba Large 1.7

Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context window, it delivers more accurate, contextually grounded responses and better steerability than previous versions.

Text

Context

256K

Group

Other

Pricing preview

Input Price: $2 /M tokens

Output Price: $8 /M tokens

Slug

ai21/jamba-large-1.7
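To make the per-million-token figures in these pricing previews concrete, a quick back-of-the-envelope estimate looks like the sketch below; the request sizes are made-up example numbers, not measurements.

```ts
// Example cost estimate using the listed Jamba Large 1.7 pricing
// ($2 input / $8 output per 1M tokens). Token counts are hypothetical.
const inputPerMillion = 2;    // USD per 1M input tokens
const outputPerMillion = 8;   // USD per 1M output tokens

const inputTokens = 12_000;   // assumed prompt size
const outputTokens = 1_500;   // assumed completion size

const cost =
  (inputTokens / 1_000_000) * inputPerMillion +
  (outputTokens / 1_000_000) * outputPerMillion;

console.log(`~$${cost.toFixed(3)} per request`); // ~$0.036
```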

Text, Reasoning

OpenInference

OpenAI: gpt-oss-120b (free)

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
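Since the description mentions native tool use over OpenAI-compatible formats, the sketch below shows what a function-calling request typically looks like; the `get_weather` tool and its parameters are invented for the example, and the response shape should be verified with the serving provider.

```ts
// Sketch: OpenAI-compatible function calling with gpt-oss-120b.
// The get_weather tool is a made-up example; only the general tools schema is standard.
const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "openai/gpt-oss-120b",
    messages: [{ role: "user", content: "What's the weather in Berlin right now?" }],
    tools: [
      {
        type: "function",
        function: {
          name: "get_weather",
          description: "Look up current weather for a city",
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
            required: ["city"],
          },
        },
      },
    ],
  }),
});

// If the model chooses to call the tool, the call appears on the first choice.
const data = await res.json();
console.log(data.choices?.[0]?.message?.tool_calls);
```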

Text

Context

131.1K

Group

GPT

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

openai/gpt-oss-120b

Text, Reasoning

DeepInfra

OpenAI: gpt-oss-120b

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

Text

Context

131.1K

Group

GPT

Pricing preview

Input Price: $0.039 /M tokens

Output Price: $0.19 /M tokens

Slug

openai/gpt-oss-120b

Text, Reasoning

OpenInference

OpenAI: gpt-oss-20b (free)

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.

Text

Context

131.1K

Group

GPT

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

openai/gpt-oss-20b

Text, Reasoning

DeepInfra

OpenAI: gpt-oss-20b

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.

Text

Context

131.1K

Group

GPT

Pricing preview

Input Price: $0.03 /M tokens

Output Price: $0.14 /M tokens

Slug

openai/gpt-oss-20b

Text

Mistral

Mistral: Codestral 2508

Mistral's cutting-edge language model for coding, released at the end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation. [Blog Post](https://mistral.ai/news/codestral-25-08)

Text

Context

256K

Group

Mistral

Pricing preview

Input Price: $0.3 /M tokens

Output Price: $0.9 /M tokens

Slug

mistralai/codestral-2508

Text

NovitaAI

Qwen: Qwen3 Coder 30B A3B Instruct

Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the Qwen3 architecture, it supports a native context length of 256K tokens (extendable to 1M with Yarn) and performs strongly in tasks involving function calls, browser use, and structured code completion. This model is optimized for instruction-following without “thinking mode”, and integrates well with OpenAI-compatible tool-use formats.

Text

Context

160K

Group

Qwen3

Pricing preview

Input Price: $0.07 /M tokens

Output Price: $0.27 /M tokens

Slug

qwen/qwen3-coder-30b-a3b-instruct

Text

SiliconFlow

Qwen: Qwen3 30B A3B Instruct 2507

Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and agentic tool use. Post-trained on instruction data, it demonstrates competitive performance across reasoning (AIME, ZebraLogic), coding (MultiPL-E, LiveCodeBench), and alignment (IFEval, WritingBench) benchmarks. It outperforms its non-instruct variant on subjective and open-ended tasks while retaining strong factual and coding performance.

Text

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.09 /M tokens

Output Price: $0.3 /M tokens

Slug

qwen/qwen3-30b-a3b-instruct-2507

Text, Reasoning

NovitaAI

Z.ai: GLM 4.5

GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options, a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

Text

Context

131.1K

Group

Other

Pricing preview

Input Price: $0.6 /M tokens

Output Price: $2.2 /M tokens

Slug

z-ai/glm-4.5

Text, Reasoning

Z.ai

Z.ai: GLM 4.5 Air (free)

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

Text

Context

131.1K

Group

Other

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

z-ai/glm-4.5-air

Text, Reasoning

NovitaAI

Z.ai: GLM 4.5 Air

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

Text

Context

131.1K

Group

Other

Pricing preview

Input Price: $0.13 /M tokens

Output Price: $0.85 /M tokens

Slug

z-ai/glm-4.5-air

Text, Reasoning

SiliconFlow

Qwen: Qwen3 235B A22B Thinking 2507

Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, showing strong benchmark performance across AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It enforces a special reasoning mode (</think>) and is designed for high-token outputs (up to 81,920 tokens) in challenging domains. The model is instruction-tuned and excels at step-by-step reasoning, tool use, agentic workflows, and multilingual tasks. This release represents the most capable open-source variant in the Qwen3-235B series, surpassing many closed models in structured reasoning use cases.

Text

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.13 /M tokens

Output Price: $0.6 /M tokens

Slug

qwen/qwen3-235b-a22b-thinking-2507

Text

Z.ai

Z.ai: GLM 4 32B

GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It is made by the same lab behind the thudm models.

Text

Context

128K

Group

Other

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.1 /M tokens

Slug

z-ai/glm-4-32b

Text

Venice (Beta)

Qwen: Qwen3 Coder 480B A35B (free)

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts). Pricing for the Alibaba endpoints varies by context length. Once a request is greater than 128k input tokens, the higher pricing is used.

Text

Context

262K

Group

Qwen3

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

qwen/qwen3-coder

Text

DeepInfra (Turbo)

Qwen: Qwen3 Coder 480B A35B

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts). Pricing for the Alibaba endpoints varies by context length. Once a request is greater than 128k input tokens, the higher pricing is used.

Text

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.22 /M tokens

Output Price: $1 /M tokens

Slug

qwen/qwen3-coder

Text

Parasail

ByteDance: UI-TARS 7B

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

Text, Image

Context

128K

Group

Other

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.2 /M tokens

Slug

bytedance/ui-tars-1.5-7b

Text, Reasoning

Google Vertex

Google: Gemini 2.5 Flash Lite

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence.

Text, Image, File, Audio, Video

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.4 /M tokens

Slug

google/gemini-2.5-flash-lite

Text

DeepInfra

Qwen: Qwen3 235B A22B Instruct 2507

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench.

Text

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.071 /M tokens

Output Price: $0.1 /M tokens

Slug

qwen/qwen3-235b-a22b-2507

Text

NovitaAI

MoonshotAI: Kimi K2 0711

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.

Text

Context

131.1K

Group

Other

Pricing preview

Input Price: $0.57 /M tokens

Output Price: $2.3 /M tokens

Slug

moonshotai/kimi-k2

Text, Reasoning

Unknown provider

THUDM: GLM 4.1V 9B Thinking

GLM-4.1V-9B-Thinking is a 9B parameter vision-language model developed by THUDM, based on the GLM-4-9B foundation. It introduces a reasoning-centric "thinking paradigm" enhanced with reinforcement learning to improve multimodal reasoning, long-context understanding (up to 64K tokens), and complex problem solving. It achieves state-of-the-art performance among models in its class, outperforming even larger models like Qwen-2.5-VL-72B on a majority of benchmark tasks.

Text, Image

Context

65.5K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

thudm/glm-4.1v-9b-thinking

Text

Mistral

Mistral: Devstral Medium

Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Positioned as a step up from Devstral Small, it achieves 61.6% on SWE-Bench Verified, placing it ahead of Gemini 2.5 Pro and GPT-4.1 in code-related tasks, at a fraction of the cost. It is designed for generalization across prompt styles and tool use in code agents and frameworks. Devstral Medium is available via API only (not open-weight), and supports enterprise deployment on private infrastructure, with optional fine-tuning capabilities.

Text

Context

131.1K

Group

Mistral

Pricing preview

Input Price: $0.4 /M tokens

Output Price: $2 /M tokens

Slug

mistralai/devstral-medium

Text

Mistral

Mistral: Devstral Small 1.1

Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and released under the Apache 2.0 license, it features a 128k token context window and supports both Mistral-style function calling and XML output formats. Designed for agentic coding workflows, Devstral Small 1.1 is optimized for tasks such as codebase exploration, multi-file edits, and integration into autonomous development agents like OpenHands and Cline. It achieves 53.6% on SWE-Bench Verified, surpassing all other open models on this benchmark, while remaining lightweight enough to run on a single 4090 GPU or Apple silicon machine. The model uses a Tekken tokenizer with a 131k vocabulary and is deployable via vLLM, Transformers, Ollama, LM Studio, and other OpenAI-compatible runtimes.

Text

Context

131.1K

Group

Mistral

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.3 /M tokens

Slug

mistralai/devstral-small

Text

Venice

Venice: Uncensored (free)

Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving user control over alignment, system prompts, and behavior. Intended for advanced and unrestricted use cases, Venice Uncensored emphasizes steerability and transparent behavior, removing default safety and alignment layers typically found in mainstream assistant models.

Text

Context

32.8K

Group

Other

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

cognitivecomputations/dolphin-mistral-24b-venice-edition

Text

Google AI Studio

Google: Gemma 3n 2B (free)

Gemma 3n E2B IT is a multimodal, instruction-tuned model developed by Google DeepMind, designed to operate efficiently at an effective parameter size of 2B while leveraging a 6B architecture. Based on the MatFormer architecture, it supports nested submodels and modular composition via the Mix-and-Match framework. Gemma 3n models are optimized for low-resource deployment, offering 32K context length and strong multilingual and reasoning performance across common benchmarks. This variant is trained on a diverse corpus including code, math, web, and multimodal data.

Text

Context

8.2K

Group

Other

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

google/gemma-3n-e2b-it

Text, Reasoning

SiliconFlow

Tencent: Hunyuan A13B Instruct

Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark performance across mathematics, science, coding, and multi-turn reasoning tasks, while maintaining high inference efficiency via Grouped Query Attention (GQA) and quantization support (FP8, GPTQ, etc.).

Text

Context

131.1K

Group

Other

Pricing preview

Input Price: $0.14 /M tokens

Output Price: $0.57 /M tokens

Slug

tencent/hunyuan-a13b-instruct

Text, Reasoning

Chutes

TNG: DeepSeek R1T2 Chimera

DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20% faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60k tokens in standard use (tested to ~130k) and maintains consistent <think> token behaviour, making it suitable for long-context analysis, dialogue and other open-ended generation tasks.

Text

Context

163.8K

Group

DeepSeek

Pricing preview

Input Price: $0.3 /M tokens

Output Price: $1.1 /M tokens

Slug

tngtech/deepseek-r1t2-chimera

Text, Reasoning

NovitaAI

Baidu: ERNIE 4.5 VL 424B A47B

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data using a heterogeneous MoE architecture and modality-isolated routing to enable high-fidelity cross-modal reasoning, image understanding, and long-context generation (up to 131k tokens). Fine-tuned with techniques like SFT, DPO, UPO, and RLVR, this model supports both “thinking” and non-thinking inference modes. Designed for vision-language tasks in English and Chinese, it is optimized for efficient scaling and can operate under 4-bit/8-bit quantization.

Text, Image

Context

123K

Group

Other

Pricing preview

Input Price: $0.42 /M tokens

Output Price: $1.25 /M tokens

Slug

baidu/ernie-4.5-vl-424b-a47b

Text

NovitaAI

Baidu: ERNIE 4.5 300B A47B

ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters per token and supports text generation in both English and Chinese. Optimized for high-throughput inference and efficient scaling, it uses a heterogeneous MoE structure with advanced routing and quantization strategies, including FP8 and 2-bit formats. This version is fine-tuned for language-only tasks and supports reasoning, tool parameters, and extended context lengths up to 131k tokens. Suitable for general-purpose LLM applications with high reasoning and throughput demands.

Text

Context

123K

Group

Other

Pricing preview

Input Price: $0.28 /M tokens

Output Price: $1.1 /M tokens

Slug

baidu/ernie-4.5-300b-a47b

Text

Unknown provider

Inception: Mercury

Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed-optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to provide responsive user experiences, including with voice agents, search interfaces, and chatbots. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury).

Text

Context

128K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

inception/mercury

Text

DeepInfra

Mistral: Mistral Small 3.2 24B

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on WildBench and Arena Hard, reduces infinite generations, and delivers gains in tool use and structured output tasks. It supports image and text inputs with structured outputs, function/tool calling, and strong performance across coding (HumanEval+, MBPP), STEM (MMLU, MATH, GPQA), and vision benchmarks (ChartQA, DocVQA).

Text, Image

Context

128K

Group

Mistral

Pricing preview

Input Price: $0.075 /M tokens

Output Price: $0.2 /M tokens

Slug

mistralai/mistral-small-3.2-24b-instruct

Text, Reasoning

MiniMax

MiniMax: MiniMax M1

MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (MoE) architecture paired with a custom "lightning attention" mechanism, allowing it to process long sequences—up to 1 million tokens—while maintaining competitive FLOP efficiency. With 456 billion total parameters and 45.9B active per token, this variant is optimized for complex, multi-step reasoning tasks. Trained via a custom reinforcement learning pipeline (CISPO), M1 excels in long-context understanding, software engineering, agentic tool use, and mathematical reasoning. Benchmarks show strong performance across FullStackBench, SWE-bench, MATH, GPQA, and TAU-Bench, often outperforming other open models like DeepSeek R1 and Qwen3-235B.

Text

Context

1M

Group

Other

Pricing preview

Input Price: $0.4 /M tokens

Output Price: $2.2 /M tokens

Slug

minimax/minimax-m1

Text, Reasoning

Google Vertex (Global)

Google: Gemini 2.5 Flash

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in [the documentation](https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).
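As a sketch of that knob, a request that caps the thinking budget might look like the example below; the exact parameter shape (`reasoning.max_tokens`) is an assumption to double-check against the reasoning-tokens documentation linked above.

```ts
// Sketch: capping Gemini 2.5 Flash's thinking budget on an OpenRouter-style endpoint.
// Confirm the parameter shape against the reasoning-tokens docs before relying on it.
const resp = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "google/gemini-2.5-flash",
    messages: [{ role: "user", content: "Outline a 3-step plan to migrate a REST API to gRPC." }],
    reasoning: { max_tokens: 2048 }, // bounded thinking: trade some latency for accuracy
  }),
});

console.log((await resp.json()).choices?.[0]?.message?.content);
```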

Text, File, Image, Audio, Video

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.3 /M tokens

Output Price: $2.5 /M tokens

Slug

google/gemini-2.5-flash

Text, Reasoning

Unknown provider

MoonshotAI: Kimi Dev 72B

Kimi-Dev-72B is an open-source large language model fine-tuned for software engineering and issue resolution tasks. Based on Qwen2.5-72B, it is optimized using large-scale reinforcement learning that applies code patches in real repositories and validates them via full test suite execution—rewarding only correct, robust completions. The model achieves 60.4% on SWE-bench Verified, setting a new benchmark among open-source models for software bug fixing and code reasoning.

Text

Context

131.1K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

moonshotai/kimi-dev-72b

Text, Reasoning

xAI

xAI: Grok 3 Mini

A lightweight model that thinks before responding. Fast, smart, and great for logic-based tasks that do not require deep domain knowledge. The raw thinking traces are accessible.

Text

Context

131.1K

Group

Grok

Pricing preview

Input Price: $0.3 /M tokens

Output Price: $0.5 /M tokens

Slug

x-ai/grok-3-mini

Text

xAI

xAI: Grok 3

Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. It possesses deep domain knowledge in finance, healthcare, law, and science.

Text

Context

131.1K

Group

Grok

Pricing preview

Input Price: $3 /M tokens

Output Price: $15 /M tokens

Slug

x-ai/grok-3

Text, Reasoning

Unknown provider

Mistral: Magistral Small 2506

Magistral Small is a 24B parameter instruction-tuned model based on Mistral-Small-3.1 (2503), enhanced through supervised fine-tuning on traces from Magistral Medium and further refined via reinforcement learning. It is optimized for reasoning and supports a wide multilingual range, including over 20 languages.

Text

Context

40K

Group

Mistral

Pricing preview

No display pricing published in the current snapshot.

Slug

mistralai/magistral-small-2506

Text, Reasoning

Unknown provider

Mistral: Magistral Medium 2506

Magistral is Mistral's first reasoning model. It is ideal for general purpose use requiring longer thought processing and better accuracy than with non-reasoning LLMs. From legal research and financial forecasting to software development and creative storytelling — this model solves multi-step challenges where transparency and precision are critical.

Text

Context

41K

Group

Mistral

Pricing preview

No display pricing published in the current snapshot.

Slug

mistralai/magistral-medium-2506

Text, Reasoning

Google Vertex

Google: Gemini 2.5 Pro Preview 06-05

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.

Text, File, Image, Audio

Context

1M

Group

Gemini

Pricing preview

Input Price: $1.25 /M tokens

Output Price: $10 /M tokens

Slug

google/gemini-2.5-pro-preview

Text

Unknown provider

SentientAGI: Dobby Mini Plus Llama 3.1 8B

Dobby-Mini-Leashed-Llama-3.1-8B and Dobby-Mini-Unhinged-Llama-3.1-8B are language models fine-tuned from Llama-3.1-8B-Instruct. Dobby models have a strong conviction towards personal freedom, decentralization, and all things crypto — even when coerced to speak otherwise. Dobby-Mini-Leashed-Llama-3.1-8B and Dobby-Mini-Unhinged-Llama-3.1-8B have their own unique, uhh, personalities. The two versions are being released to be improved using the community’s feedback, which will steer the development of a 70B model.

Text

Context

131.1K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

sentientagi/dobby-mini-unhinged-plus-llama-3.1-8b

Text, Reasoning

Unknown provider

DeepSeek: R1 Distill Qwen 7B

DeepSeek-R1-Distill-Qwen-7B is a 7 billion parameter dense language model distilled from DeepSeek-R1, leveraging reinforcement learning-enhanced reasoning data generated by DeepSeek's larger models. The distillation process transfers advanced reasoning, math, and code capabilities into a smaller, more efficient model architecture based on Qwen2.5-Math-7B. This model demonstrates strong performance across mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), achieving competitive accuracy relative to larger models while maintaining smaller inference costs.

Text

Context

131.1K

Group

Qwen

Pricing preview

No display pricing published in the current snapshot.

Slug

deepseek/deepseek-r1-distill-qwen-7b

Text, Reasoning

Unknown provider

DeepSeek: DeepSeek R1 0528 Qwen3 8B

DeepSeek-R1-0528 is a lightly upgraded release of DeepSeek R1 that taps more compute and smarter post-training tricks, pushing its reasoning and inference close to flagship models like O3 and Gemini 2.5 Pro. It now tops math, programming, and logic leaderboards, showcasing a step-change in depth of thought. The distilled variant, DeepSeek-R1-0528-Qwen3-8B, transfers this chain-of-thought into an 8B-parameter form, beating standard Qwen3 8B by +10 pp and tying the 235B “thinking” giant on AIME 2024.

Text

Context

131.1K

Group

Qwen

Pricing preview

No display pricing published in the current snapshot.

Slug

deepseek/deepseek-r1-0528-qwen3-8b

Text

Unknown provider

Google: Gemma 1 2B

Gemma 1 2B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).

Text

Context

8.2K

Group

Gemini

Pricing preview

No display pricing published in the current snapshot.

Slug

google/gemma-2b-it

Text, Reasoning

DeepInfra

DeepSeek: R1 0528

May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance is on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully open-source model.

Text

Context

163.8K

Group

DeepSeek

Pricing preview

Input Price: $0.5 /M tokens

Output Price: $2.15 /M tokens

Slug

deepseek/deepseek-r1-0528

Text

Unknown provider

Sarvam AI: Sarvam-M

Sarvam-M is a 24B-parameter, instruction-tuned derivative of Mistral-Small-3.1-24B-Base-2503, post-trained on English plus eleven major Indic languages (bn, hi, kn, gu, mr, ml, or, pa, ta, te). The model introduces a dual-mode interface: “non-think” for low-latency chat and an optional “think” phase that exposes chain-of-thought tokens for more demanding reasoning, math, and coding tasks. Benchmark reports show solid gains versus similarly sized open models on Indic-language QA, GSM-8K math, and SWE-Bench coding, making Sarvam-M a practical general-purpose choice for multilingual conversational agents as well as analytical workloads that mix English, native Indic scripts, or romanized text.

Text

Context

32.8K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

sarvamai/sarvam-m

Text, Reasoning

Unknown provider

TheDrummer: Valkyrie 49B V1

Built on top of NVIDIA's Llama 3.3 Nemotron Super 49B, Valkyrie is TheDrummer's newest model drop for creative writing.

Text

Context

131.1K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

thedrummer/valkyrie-49b-v1

Text

Unknown provider

Mistral: Devstral Small 2505

Devstral-Small-2505 is a 24B parameter agentic LLM fine-tuned from Mistral-Small-3.1, jointly developed by Mistral AI and All Hands AI for advanced software engineering tasks. It is optimized for codebase exploration, multi-file editing, and integration into coding agents, achieving state-of-the-art results on SWE-Bench Verified (46.8%). Devstral supports a 128k context window and uses a custom Tekken tokenizer. It is text-only, with the vision encoder removed, and is suitable for local deployment on high-end consumer hardware (e.g., RTX 4090, 32GB RAM Macs). Devstral is best used in agentic workflows via the OpenHands scaffold and is compatible with inference frameworks like vLLM, Transformers, and Ollama. It is released under the Apache 2.0 license.

Text

Context

131.1K

Group

Mistral

Pricing preview

No display pricing published in the current snapshot.

Slug

mistralai/devstral-small-2505


Need a model request?

Use the market snapshot for discovery, then ask ImaRouter for rollout.

If a model matters for your product, send the slug, expected traffic, target region, and latency expectations. The team can confirm support status, onboarding priority, or a migration path to an equivalent route on ImaRouter.

Contact

support@imarouter.com

Best for model availability questions, onboarding priority, routing strategy, and enterprise rollout planning.
