Video Generation API is now live!

Models

Explore the active model market,from a local OpenRouter snapshot.

This page reads from a local JSON snapshot synced from OpenRouter, so the catalog stays fast, indexable, and stable. Use it to browse current model coverage by provider, modality, reasoning support, context window, and pricing metadata.

Reset

Results

Showing 48 of 226 matching models

Snapshot source: OpenRouter. Synced April 21, 2026 at 8:00 AM. Page 2 of 5.

This route is built from local JSON so the catalog stays stable for browsing and SEO. If you need a specific model on ImaRouter, treat this page as a discovery reference and then contact the team for availability.

TextReasoning

OpenAI

OpenAI: o3

o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images.

TextImageFile

Context

200K

Group

GPT

Pricing preview

Input Price: $2 /M tokens

Output Price: $8 /M tokens

Slug

openai/o3

TextReasoning

OpenAI

OpenAI: o4 Mini

OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains. Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute.

TextImageFile

Context

200K

Group

GPT

Pricing preview

Input Price: $1.1 /M tokens

Output Price: $4.4 /M tokens

Slug

openai/o4-mini

Text

Azure

OpenAI: GPT-4.1

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.

TextImageFile

Context

1M

Group

GPT

Pricing preview

Input Price: $2 /M tokens

Output Price: $8 /M tokens

Slug

openai/gpt-4.1

Text

OpenAI

OpenAI: GPT-4.1 Mini

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider’s polyglot diff benchmark) and vision understanding, making it suitable for interactive applications with tight performance constraints.

TextImageFile

Context

1M

Group

GPT

Pricing preview

Input Price: $0.4 /M tokens

Output Price: $1.6 /M tokens

Slug

openai/gpt-4.1-mini

Text

OpenAI

OpenAI: GPT-4.1 Nano

For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding – even higher than GPT‑4o mini. It’s ideal for tasks like classification or autocompletion.

TextImageFile

Context

1M

Group

GPT

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.4 /M tokens

Slug

openai/gpt-4.1-nano

TextReasoning

OpenAI

OpenAI: o1-pro

The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers.

TextImageFile

Context

200K

Group

GPT

Pricing preview

Input Price: $15 /M tokens

Output Price: $6 /M tokens

Slug

openai/o1-pro

Text

Unknown provider

OpenAI: GPT-4.5 (Preview)

GPT-4.5 (Preview) is a research preview of OpenAI’s latest language model, designed to advance capabilities in reasoning, creativity, and multi-turn conversation. It builds on previous iterations with improvements in world knowledge, contextual coherence, and the ability to follow user intent more effectively. The model demonstrates enhanced performance in tasks that require open-ended thinking, problem-solving, and communication. Early testing suggests it is better at generating nuanced responses, maintaining long-context coherence, and reducing hallucinations compared to earlier versions. This research preview is intended to help evaluate GPT-4.5’s strengths and limitations in real-world use cases as OpenAI continues to refine and develop future models. Read more at the [blog post here.](https://openai.com/index/introducing-gpt-4-5/)

TextImage

Context

128K

Group

GPT

Pricing preview

No display pricing published in the current snapshot.

Slug

openai/gpt-4.5-preview

TextReasoning

OpenAI

OpenAI: o1

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).

TextImageFile

Context

200K

Group

GPT

Pricing preview

Input Price: $15 /M tokens

Output Price: $60 /M tokens

Slug

openai/o1

Text

OpenAI

OpenAI: GPT-4o (2024-11-20)

The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded files, providing deeper insights & more thorough responses. GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

TextImageFile

Context

128K

Group

GPT

Pricing preview

Input Price: $2.5 /M tokens

Output Price: $10 /M tokens

Slug

openai/gpt-4o-2024-11-20

TextReasoning

Google Vertex

Anthropic: Claude Opus 4.1

Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for tasks involving research, data analysis, and tool-assisted reasoning.

TextImageFile

Context

200K

Group

Claude

Pricing preview

Input Price: $15 /M tokens

Output Price: $75 /M tokens

Slug

anthropic/claude-opus-4.1

TextReasoning

Google Vertex (Global)

Anthropic: Claude Sonnet 4

Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%), Sonnet 4 balances capability and computational efficiency, making it suitable for a broad range of applications from routine coding tasks to complex software development projects. Key enhancements include improved autonomous codebase navigation, reduced error rates in agent-driven workflows, and increased reliability in following intricate instructions. Sonnet 4 is optimized for practical everyday use, providing advanced reasoning capabilities while maintaining efficiency and responsiveness in diverse internal and external scenarios. Read more at the [blog post here](https://www.anthropic.com/news/claude-4)

TextImageFile

Context

1M

Group

Claude

Pricing preview

Input Price: $3 /M tokens

Output Price: $15 /M tokens

Slug

anthropic/claude-sonnet-4

TextReasoning

Amazon Bedrock

Anthropic: Claude 3.7 Sonnet

Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes. Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks. Read more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet)

TextImageFile

Context

200K

Group

Claude

Pricing preview

Input Price: $3 /M tokens

Output Price: $15 /M tokens

Slug

anthropic/claude-3.7-sonnet

TextReasoning

Google Vertex

Anthropic: Claude 3.7 Sonnet (thinking)

Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes. Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks. Read more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet)

TextImageFile

Context

200K

Group

Claude

Pricing preview

Input Price: $3 /M tokens

Output Price: $15 /M tokens

Slug

anthropic/claude-3.7-sonnet

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3.6 Plus

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers major gains in agentic coding, front-end development, and overall reasoning, with a significantly improved “vibe coding” experience. The model excels at complex tasks such as 3D scenes, games, and repository-level problem solving, achieving a 78.8 score on SWE-bench Verified. It represents a substantial leap in both pure-text and multimodal capabilities, performing at the level of leading state-of-the-art models.

TextImageVideo

Context

1M

Group

Qwen3

Pricing preview

Input Price: $0.325 /M tokens

Output Price: $1.95 /M tokens

Slug

qwen/qwen3.6-plus

TextReasoning

Unknown provider

xAI: Grok 4.20 Beta

Grok 4.20 Beta is xAI's newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherance, delivering consistently precise and truthful responses. Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens)

TextImageFile

Context

2M

Group

Grok

Pricing preview

No display pricing published in the current snapshot.

Slug

x-ai/grok-4.20-beta

TextReasoning

xAI

xAI: Grok 4.20

Grok 4.20 is xAI's newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherance, delivering consistently precise and truthful responses. Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens)

TextImageFile

Context

2M

Group

Grok

Pricing preview

Input Price: $2 /M tokens

Output Price: $6 /M tokens

Slug

x-ai/grok-4.20

TextReasoning

Google Vertex

Anthropic: Claude Sonnet 4.5

Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking. Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use.

TextImageFile

Context

1M

Group

Claude

Pricing preview

Input Price: $3 /M tokens

Output Price: $15 /M tokens

Slug

anthropic/claude-sonnet-4.5

TextReasoning

Google AI Studio

Google: Gemma 4 26B A4B (free)

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at a fraction of the compute cost. Supports multimodal input including text, images, and video (up to 60s at 1fps). Features a 256K token context window, native function calling, configurable thinking/reasoning mode, and structured output support. Released under Apache 2.0.

TextImageVideo

Context

262.1K

Group

Gemma

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

google/gemma-4-26b-a4b-it

TextReasoning

DeepInfra

Google: Gemma 4 26B A4B

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at a fraction of the compute cost. Supports multimodal input including text, images, and video (up to 60s at 1fps). Features a 256K token context window, native function calling, configurable thinking/reasoning mode, and structured output support. Released under Apache 2.0.

TextImageVideo

Context

262.1K

Group

Gemma

Pricing preview

Input Price: $0.08 /M tokens

Output Price: $0.35 /M tokens

Slug

google/gemma-4-26b-a4b-it

TextReasoning

Google AI Studio

Google: Gemma 4 31B (free)

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages. Strong on coding, reasoning, and document understanding tasks. Apache 2.0 license.

TextImageVideo

Context

262.1K

Group

Gemma

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

google/gemma-4-31b-it

TextReasoning

DeepInfra

Google: Gemma 4 31B

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages. Strong on coding, reasoning, and document understanding tasks. Apache 2.0 license.

TextImageVideo

Context

262.1K

Group

Gemma

Pricing preview

Input Price: $0.13 /M tokens

Output Price: $0.38 /M tokens

Slug

google/gemma-4-31b-it

TextReasoning

Z.ai

Z.ai: GLM 5V Turbo

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding, and task execution, and works seamlessly with agents to complete the full loop of “perceive → plan → execute“.

TextImageVideo

Context

202.8K

Group

Other

Pricing preview

Input Price: $1.2 /M tokens

Output Price: $4 /M tokens

Slug

z-ai/glm-5v-turbo

TextReasoning

Reka AI

Reka Edge

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding, video analysis, object detection, and agentic tool-use.

TextImageVideo

Context

16.4K

Group

Other

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.1 /M tokens

Slug

rekaai/reka-edge

TextReasoning

Unknown provider

xAI: Grok 4.20 Multi-Agent Beta

Grok 4.20 Multi-Agent Beta is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information across complex tasks. Reasoning effort behavior: - low / medium: 4 agents - high / xhigh: 16 agents

TextImageFile

Context

2M

Group

Grok

Pricing preview

No display pricing published in the current snapshot.

Slug

x-ai/grok-4.20-multi-agent-beta

TextReasoning

xAI

xAI: Grok 4.20 Multi-Agent

Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information across complex tasks. Reasoning effort behavior: - low / medium: 4 agents - high / xhigh: 16 agents

TextImageFile

Context

2M

Group

Grok

Pricing preview

Input Price: $2 /M tokens

Output Price: $6 /M tokens

Slug

x-ai/grok-4.20-multi-agent

TextReasoning

xAI

xAI: Grok 4

Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note that reasoning is not exposed, reasoning cannot be disabled, and the reasoning effort cannot be specified. Pricing increases once the total tokens in a given request is greater than 128k tokens. See more details on the [xAI docs](https://docs.x.ai/docs/models/grok-4-0709)

TextImageFile

Context

256K

Group

Grok

Pricing preview

Input Price: $3 /M tokens

Output Price: $15 /M tokens

Slug

x-ai/grok-4

TextReasoning

xAI

xAI: Grok 4 Fast

Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning. Read more about the model on xAI's [news post](http://x.ai/news/grok-4-fast). Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens)

TextImageFile

Context

2M

Group

Grok

Pricing preview

Input Price: $0.2 /M tokens

Output Price: $0.5 /M tokens

Slug

x-ai/grok-4-fast

TextReasoning

xAI

xAI: Grok 4.1 Fast

Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research. 2M context window. Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens)

TextImageFile

Context

2M

Group

Grok

Pricing preview

Input Price: $0.2 /M tokens

Output Price: $0.5 /M tokens

Slug

x-ai/grok-4.1-fast

Text

Google AI Studio

Google: Lyria 3 Pro Preview

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz stereo audio from text prompts or from images. These models deliver structural coherence, including vocals, timed lyrics, and full instrumental arrangements. Lyria 3 Pro can generate full-length songs with verses, choruses, bridges.

TextAudioImage

Context

1M

Group

Other

Pricing preview

Song Generation: $0.08 per song

Slug

google/lyria-3-pro-preview

Text

Google AI Studio

Google: Lyria 3 Clip Preview

30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz stereo audio from text prompts or from images. These models deliver structural coherence, including vocals, timed lyrics, and full instrumental arrangements. Lyria 3 Clip can generate short clips, loops, previews.

TextAudioImage

Context

1M

Group

Other

Pricing preview

Song Generation: $0.04 per song

Slug

google/lyria-3-clip-preview

TextReasoning

Xiaomi

Xiaomi: MiMo-V2-Omni

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step planning, tool use, and code execution - making it well-suited for complex real-world tasks that span modalities, 256K context window.

TextAudioImageVideo

Context

262.1K

Group

Other

Pricing preview

Input Price: $0.4 /M tokens

Output Price: $2 /M tokens

Slug

xiaomi/mimo-v2-omni

Image

Google AI Studio

Google: Nano Banana (Gemini 2.5 Flash Image)

Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations. Aspect ratios can be controlled with the [image_config API Parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-configuration)

ImageText

Context

32.8K

Group

Gemini

Pricing preview

Input Price: $0.3 /M tokens

Output Price: $2.5 /M tokens

Slug

google/gemini-2.5-flash-image

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3 VL 30B A3B Thinking

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research.

TextImage

Context

131.1K

Group

Qwen3

Pricing preview

Input Price: $0.13 /M tokens

Output Price: $1.56 /M tokens

Slug

qwen/qwen3-vl-30b-a3b-thinking

Text

Alibaba Cloud Int.

Qwen: Qwen3 VL 30B A3B Instruct

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research.

TextImage

Context

131.1K

Group

Qwen3

Pricing preview

Input Price: $0.13 /M tokens

Output Price: $0.52 /M tokens

Slug

qwen/qwen3-vl-30b-a3b-instruct

TextReasoning

Unknown provider

Google: Gemini 2.5 Flash Preview 09-2025

Gemini 2.5 Flash Preview September 2025 Checkpoint is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).

TextImageFileAudioVideo

Context

1M

Group

Gemini

Pricing preview

No display pricing published in the current snapshot.

Slug

google/gemini-2.5-flash-preview-09-2025

TextReasoning

Google Vertex

Google: Gemini 2.5 Flash Lite Preview 09-2025

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence.

TextImageFileAudioVideo

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.4 /M tokens

Slug

google/gemini-2.5-flash-lite-preview-09-2025

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3 VL 235B A22B Thinking

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

TextImage

Context

131.1K

Group

Qwen3

Pricing preview

Input Price: $0.26 /M tokens

Output Price: $2.6 /M tokens

Slug

qwen/qwen3-vl-235b-a22b-thinking

Text

DeepInfra

Qwen: Qwen3 VL 235B A22B Instruct

Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

TextImage

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.2 /M tokens

Output Price: $0.88 /M tokens

Slug

qwen/qwen3-vl-235b-a22b-instruct

Text

Unknown provider

OpenGVLab: InternVL3 78B

The InternVL3 series is an advanced multimodal large language model (MLLM). Compared to InternVL 2.5, InternVL3 demonstrates stronger multimodal perception and reasoning capabilities. In addition, InternVL3 is benchmarked against the Qwen2.5 Chat models, whose pre-trained base models serve as the initialization for its language component. Benefiting from Native Multimodal Pre-Training, the InternVL3 series surpasses the Qwen2.5 series in overall text performance.

TextImage

Context

N/A

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

opengvlab/internvl3-78b

TextReasoning

Unknown provider

Cogito V2 Preview Llama 109B

An instruction-tuned, hybrid-reasoning Mixture-of-Experts model built on Llama-4-Scout-17B-16E. Cogito v2 can answer directly or engage an extended “thinking” phase, with alignment guided by Iterated Distillation & Amplification (IDA). It targets coding, STEM, instruction following, and general helpfulness, with stronger multilingual, tool-calling, and reasoning performance than size-equivalent baselines. The model supports long-context use (up to 10M tokens) and standard Transformers workflows. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

TextImage

Context

131.1K

Group

Llama4

Pricing preview

No display pricing published in the current snapshot.

Slug

deepcogito/cogito-v2-preview-llama-109b-moe

TextReasoning

Unknown provider

StepFun: Step3

Step3 is a cutting-edge multimodal reasoning model—built on a Mixture-of-Experts architecture with 321B total parameters and 38B active. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision–language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators.

TextImage

Context

65.5K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

stepfun-ai/step3

Image

Unknown provider

Google: Gemini 2.5 Flash Image Preview (Nano Banana)

Gemini 2.5 Flash Image Preview, a.k.a. "Nano Banana," is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations.

ImageText

Context

32.8K

Group

Gemini

Pricing preview

No display pricing published in the current snapshot.

Slug

google/gemini-2.5-flash-image-preview

Text

Mistral

Mistral: Mistral Medium 3.1

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8Ă— lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments.

TextImage

Context

131.1K

Group

Mistral

Pricing preview

Input Price: $0.4 /M tokens

Output Price: $2 /M tokens

Slug

mistralai/mistral-medium-3.1

TextReasoning

NovitaAI

Baidu: ERNIE 4.5 VL 28B A3B

A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, delivering exceptional text and vision understanding through its innovative heterogeneous MoE structure with modality-isolated routing. Built with scaling-efficient infrastructure for high-throughput training and inference, the model leverages advanced post-training techniques including SFT, DPO, and UPO for optimized performance, while supporting an impressive 131K context length and RLVR alignment for superior cross-modal reasoning and generation capabilities.

TextImage

Context

30K

Group

Other

Pricing preview

Input Price: $0.14 /M tokens

Output Price: $0.56 /M tokens

Slug

baidu/ernie-4.5-vl-28b-a3b

TextReasoning

Z.ai

Z.ai: GLM 4.5V

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses. Reasoning behavior can be toggled via the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

TextImage

Context

65.5K

Group

Other

Pricing preview

Input Price: $0.6 /M tokens

Output Price: $1.8 /M tokens

Slug

z-ai/glm-4.5v

Text

Parasail

ByteDance: UI-TARS 7B

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

TextImage

Context

128K

Group

Other

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.2 /M tokens

Slug

bytedance/ui-tars-1.5-7b

TextReasoning

Google Vertex

Google: Gemini 2.5 Flash Lite

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence.

TextImageFileAudioVideo

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.4 /M tokens

Slug

google/gemini-2.5-flash-lite

TextReasoning

Unknown provider

THUDM: GLM 4.1V 9B Thinking

GLM-4.1V-9B-Thinking is a 9B parameter vision-language model developed by THUDM, based on the GLM-4-9B foundation. It introduces a reasoning-centric "thinking paradigm" enhanced with reinforcement learning to improve multimodal reasoning, long-context understanding (up to 64K tokens), and complex problem solving. It achieves state-of-the-art performance among models in its class, outperforming even larger models like Qwen-2.5-VL-72B on a majority of benchmark tasks.

TextImage

Context

65.5K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

thudm/glm-4.1v-9b-thinking

Page 2 of 5

Need a model request?

Use the market snapshot for discovery, then ask ImaRouter for rollout.

If a model matters for your product, send the slug, expected traffic, target region, and latency expectations. The team can confirm support status, onboarding priority, or a migration path to an equivalent route on ImaRouter.

Contact

support@imarouter.com

Best for model availability questions, onboarding priority, routing strategy, and enterprise rollout planning.

Models | ImaRouter