Video Generation API is now live!

Models

Explore the active model catalog, built from a local OpenRouter snapshot.

This page reads from a local JSON snapshot synced from OpenRouter, so the catalog stays fast, indexable, and stable. Use it to browse current model coverage by provider, modality, reasoning support, context window, and pricing metadata.

Results

Showing 47 of 47 matching models

Snapshot source: OpenRouter. Synced April 21, 2026 at 8:00 AM. Page 1 of 1.

This route is built from local JSON so the catalog stays stable for browsing and SEO. If you need a specific model on ImaRouter, treat this page as a discovery reference and then contact the team for availability.
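Because the catalog is plain JSON, filtering it locally is a few lines of code. The sketch below is illustrative only: the two-entry snapshot and the `slug`/`output_modalities` keys are assumptions, not the actual snapshot schema.

```python
import json

# Hypothetical two-entry snapshot; the real file holds the full catalog
# and may use different field names.
snapshot = json.loads("""
[
  {"slug": "openai/sora-2-pro",       "output_modalities": ["video"]},
  {"slug": "google/gemini-2.5-flash", "output_modalities": ["text"]}
]
""")

# Keep only models that can produce video output.
video_models = [m["slug"] for m in snapshot if "video" in m["output_modalities"]]
print(video_models)  # ['openai/sora-2-pro']
```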

Video

AtlasCloud

Kling: Video O1

Kling Video O1 is a video generation model from Kuaishou. It supports text and image inputs with video output, enabling text-to-video and image-to-video workflows. It is suited for cinematic content production, with first-frame and last-frame control for precise scene composition. It generates 5 or 10 second clips in 16:9, 9:16, or 1:1 aspect ratios.

VideoTextImage

Context

N/A

Group

Other

Pricing preview

Video Output: $0.0896 per second

Slug

kwaivgi/kling-video-o1

Video

Seed

ByteDance: Seedance 1.5 Pro

ByteDance's next-generation audio-visual generation model with a 4.5B parameter Dual-Branch Diffusion Transformer architecture. Seedance 1.5 Pro generates video and audio simultaneously in a single unified pass — eliminating the timing issues of sequential audio dubbing. Supports multi-language lip-sync (English, Mandarin, Japanese, Korean, Spanish, and more), cinematic camera control (pan, tilt, zoom, orbit), multi-character dialogue, and character consistency across shots. Produces clips from 4–12 seconds at up to 1080p. The token count is (output height * output width * duration in seconds * 24) / 1024.

VideoTextImage

Context

N/A

Group

Other

Pricing preview

Video Tokens (with audio): $2.4 /M tokens

Video Tokens (no audio): $1.2 /M tokens

Slug

bytedance/seedance-1-5-pro
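The token formula in the Seedance descriptions translates directly into a quick cost estimator. A minimal sketch (function names are illustrative; the rate comes from the listed per-million-token pricing):

```python
def seedance_tokens(height: int, width: int, seconds: float) -> int:
    # Tokens per the listed formula: (height * width * duration * 24) / 1024,
    # where 24 is the output frame rate and duration is in seconds.
    return int(height * width * seconds * 24 / 1024)

def clip_cost_usd(tokens: int, price_per_m_tokens: float) -> float:
    # Convert a token count to dollars at a per-million-token rate.
    return tokens / 1_000_000 * price_per_m_tokens

# A 5-second 1280x720 clip with audio at $2.4 /M tokens:
tokens = seedance_tokens(720, 1280, 5)   # 108,000 tokens
cost = clip_cost_usd(tokens, 2.4)        # roughly $0.26
```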

Video

Seed

ByteDance: Seedance 2.0

Seedance 2.0 is a video generation model from ByteDance. It supports text-to-video, image-to-video with first and last frame control, and multimodal reference-to-video. It is particularly strong at preserving character consistency, visual style, and camera movement from reference material. The token count is (output height * output width * duration in seconds * 24) / 1024.

VideoTextImage

Context

N/A

Group

Other

Pricing preview

Video Tokens (with audio): $7 /M tokens

Video Tokens (no audio): $7 /M tokens

Slug

bytedance/seedance-2.0

Video

Seed

ByteDance: Seedance 2.0 Fast

Seedance 2.0 Fast is a video generation model from ByteDance. It supports text-to-video, image-to-video with first and last frame control, and multimodal reference-to-video. It prioritizes generation speed and lower cost over maximum output quality. The token count is (output height * output width * duration in seconds * 24) / 1024.

VideoTextImage

Context

N/A

Group

Other

Pricing preview

Video Tokens (with audio): $5.6 /M tokens

Video Tokens (no audio): $5.6 /M tokens

Slug

bytedance/seedance-2.0-fast

Video

OpenAI

OpenAI: Sora 2 Pro

OpenAI's flagship video generation model, delivering production-quality video with physics-accurate motion, synchronized audio, and world-state persistence across shots. Sora 2 Pro follows intricate multi-shot instructions while maintaining consistent spatial relationships — objects don't disappear or change shape between cuts. Supports text-to-video and image-to-video, with synchronized background soundscapes, speech, and sound effects. Includes advanced content safety with C2PA metadata provenance and SynthID-style watermarking.

VideoTextImage

Context

N/A

Group

Other

Pricing preview

Video Output: $0.3 per second

Slug

openai/sora-2-pro

Video

AtlasCloud

Alibaba: Wan 2.7

Wan 2.7 is a video generation model from Alibaba. It supports text-to-video, image-to-video with first and last frame control, and reference-to-video, where multiple reference images guide the style and content of the generated scene.

VideoTextImage

Context

N/A

Group

Other

Pricing preview

Video Output: $0.1 per second

Slug

alibaba/wan-2.7

TextReasoning

Google Vertex (Global)

Google: Gemini 2.5 Pro

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.

TextImageFileAudioVideo

Context

1M

Group

Gemini

Pricing preview

Input Price: $1.25 /M tokens

Output Price: $10 /M tokens

Slug

google/gemini-2.5-pro

Video

AtlasCloud

Alibaba: Wan 2.6

Alibaba's most advanced video generation model, supporting over 10 visual creation capabilities in a unified system. Wan 2.6 generates 1080p video at 24fps from text, images, reference videos, or audio, with native audio-visual synchronization and precise lip-sync. Key features include reference-to-video (insert a character's appearance and voice into new scenes), multi-shot storytelling from simple prompts, synchronized sound effects and music, and support for 16:9, 9:16, and 1:1 aspect ratios with clips up to 15 seconds.

VideoTextImage

Context

N/A

Group

Other

Pricing preview

Text to Video: $0.04 per second

Image to Video: $0.1 per second

Slug

alibaba/wan-2.6

Video

Google Vertex

Google: Veo 3.1

Google's state-of-the-art video generation model, built for maximum visual fidelity in final production cuts. Veo 3.1 generates high-quality 1080p video from text or image prompts with native synchronized audio — including dialogue, ambient effects, and background sound. Supports scene extension (up to 20 chained clips for 140+ second narratives), frames-to-video transitions between two images, vertical video for Shorts, and 4K upscaling.

VideoTextImage

Context

N/A

Group

Other

Pricing preview

Video (with audio): $0.4 per second

Video (no audio): $0.2 per second

Slug

google/veo-3.1

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3.6 Plus

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers major gains in agentic coding, front-end development, and overall reasoning, with a significantly improved “vibe coding” experience. The model excels at complex tasks such as 3D scenes, games, and repository-level problem solving, achieving a 78.8 score on SWE-bench Verified. It represents a substantial leap in both pure-text and multimodal capabilities, performing at the level of leading state-of-the-art models.

TextImageVideo

Context

1M

Group

Qwen3

Pricing preview

Input Price: $0.325 /M tokens

Output Price: $1.95 /M tokens

Slug

qwen/qwen3.6-plus

TextReasoning

Google AI Studio

Google: Gemma 4 26B A4B (free)

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at a fraction of the compute cost. Supports multimodal input including text, images, and video (up to 60s at 1fps). Features a 256K token context window, native function calling, configurable thinking/reasoning mode, and structured output support. Released under Apache 2.0.

TextImageVideo

Context

262.1K

Group

Gemma

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

google/gemma-4-26b-a4b-it

TextReasoning

DeepInfra

Google: Gemma 4 26B A4B

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at a fraction of the compute cost. Supports multimodal input including text, images, and video (up to 60s at 1fps). Features a 256K token context window, native function calling, configurable thinking/reasoning mode, and structured output support. Released under Apache 2.0.

TextImageVideo

Context

262.1K

Group

Gemma

Pricing preview

Input Price: $0.08 /M tokens

Output Price: $0.35 /M tokens

Slug

google/gemma-4-26b-a4b-it

TextReasoning

Google AI Studio

Google: Gemma 4 31B (free)

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages. Strong on coding, reasoning, and document understanding tasks. Apache 2.0 license.

TextImageVideo

Context

262.1K

Group

Gemma

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

google/gemma-4-31b-it

TextReasoning

DeepInfra

Google: Gemma 4 31B

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages. Strong on coding, reasoning, and document understanding tasks. Apache 2.0 license.

TextImageVideo

Context

262.1K

Group

Gemma

Pricing preview

Input Price: $0.13 /M tokens

Output Price: $0.38 /M tokens

Slug

google/gemma-4-31b-it

TextReasoning

Z.ai

Z.ai: GLM 5V Turbo

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding, and task execution, and works seamlessly with agents to complete the full loop of “perceive → plan → execute”.

TextImageVideo

Context

202.8K

Group

Other

Pricing preview

Input Price: $1.2 /M tokens

Output Price: $4 /M tokens

Slug

z-ai/glm-5v-turbo

TextReasoning

Reka AI

Reka Edge

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding, video analysis, object detection, and agentic tool-use.

TextImageVideo

Context

16.4K

Group

Other

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.1 /M tokens

Slug

rekaai/reka-edge

TextReasoning

Xiaomi

Xiaomi: MiMo-V2-Omni

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capabilities (visual grounding, multi-step planning, tool use, and code execution), making it well suited for complex real-world tasks that span modalities, and offers a 256K context window.

TextAudioImageVideo

Context

262.1K

Group

Other

Pricing preview

Input Price: $0.4 /M tokens

Output Price: $2 /M tokens

Slug

xiaomi/mimo-v2-omni

TextReasoning

Unknown provider

Google: Gemini 2.5 Flash Preview 09-2025

Gemini 2.5 Flash Preview September 2025 Checkpoint is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).

TextImageFileAudioVideo

Context

1M

Group

Gemini

Pricing preview

No display pricing published in the current snapshot.

Slug

google/gemini-2.5-flash-preview-09-2025

TextReasoning

Google Vertex

Google: Gemini 2.5 Flash Lite Preview 09-2025

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence.

TextImageFileAudioVideo

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.4 /M tokens

Slug

google/gemini-2.5-flash-lite-preview-09-2025
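The opt-in described above can be sketched as a request payload. This is a minimal sketch assuming the field shape from the linked reasoning-tokens docs; treat the exact names as assumptions, not a verified schema.

```python
# Thinking is off by default on Flash-Lite; opt in per request by
# attaching a reasoning budget. Field names are assumptions based on
# the linked reasoning-tokens documentation.
payload = {
    "model": "google/gemini-2.5-flash-lite-preview-09-2025",
    "messages": [{"role": "user", "content": "Plan a 3-step refactor."}],
    "reasoning": {"max_tokens": 2048},  # selectively trade cost for intelligence
}
```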

TextReasoning

Google Vertex

Google: Gemini 2.5 Flash Lite

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence.

TextImageFileAudioVideo

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.4 /M tokens

Slug

google/gemini-2.5-flash-lite

TextReasoning

Google Vertex (Global)

Google: Gemini 2.5 Flash

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).

TextFileImageAudioVideo

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.3 /M tokens

Output Price: $2.5 /M tokens

Slug

google/gemini-2.5-flash

TextReasoning

Google Vertex

Google: Gemini 2.5 Pro Preview 05-06

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.

TextImageFileAudioVideo

Context

1M

Group

Gemini

Pricing preview

Input Price: $1.25 /M tokens

Output Price: $10 /M tokens

Slug

google/gemini-2.5-pro-preview-05-06

Text

Google Vertex

Google: Gemini 2.0 Flash Lite

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5), all at extremely economical token prices.

TextImageFileAudioVideo

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.075 /M tokens

Output Price: $0.3 /M tokens

Slug

google/gemini-2.0-flash-lite-001

Text

Google Vertex

Google: Gemini 2.0 Flash

Gemini 2.0 Flash offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.

TextImageFileAudioVideo

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.4 /M tokens

Slug

google/gemini-2.0-flash-001

TextReasoning

Chutes

Qwen: Qwen3.5 397B A17B

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers state-of-the-art performance comparable to leading-edge models across a wide range of tasks, including language understanding, logical reasoning, code generation, agent-based tasks, image understanding, video understanding, and graphical user interface (GUI) interactions. With its robust code-generation and agent capabilities, the model exhibits strong generalization across diverse agentic tasks.

TextImageVideo

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.39 /M tokens

Output Price: $2.34 /M tokens

Slug

qwen/qwen3.5-397b-a17b

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3.5-122B-A10B

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.

TextImageVideo

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.26 /M tokens

Output Price: $2.08 /M tokens

Slug

qwen/qwen3.5-122b-a10b

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3.5-27B

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.

TextImageVideo

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.195 /M tokens

Output Price: $1.56 /M tokens

Slug

qwen/qwen3.5-27b

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3.5-35B-A3B

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B.

TextImageVideo

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.1625 /M tokens

Output Price: $1.3 /M tokens

Slug

qwen/qwen3.5-35b-a3b

TextReasoning

Unknown provider

Healer Alpha

Healer Alpha is a frontier omni-modal model with vision, hearing, reasoning, and action capabilities. It brings the full power of agentic intelligence into the real world: natively perceiving visual and audio inputs, reasoning across modalities, and executing complex multi-step tasks with precision and reliability. **Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.

TextImageAudioVideo

Context

262.1K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

openrouter/healer-alpha

TextReasoning

Seed

ByteDance Seed: Seed-2.0-Lite

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across text, vision, and tools. Engineered for high-frequency visual understanding and agentic workflows, it's an ideal choice for deployment at scale with minimal latency.

TextImageVideo

Context

262.1K

Group

Other

Pricing preview

Input Price: $0.25 /M tokens

Output Price: $2 /M tokens

Slug

bytedance-seed/seed-2.0-lite

TextReasoning

Together

Qwen: Qwen3.5-9B

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design with early fusion of multimodal tokens, allowing the model to process and reason across text and images within the same context.

TextImageVideo

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.15 /M tokens

Slug

qwen/qwen3.5-9b

TextReasoning

Google AI Studio

Google: Gemini 3.1 Flash Lite Preview

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. Supports full thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash.

TextImageVideoFileAudio

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.25 /M tokens

Output Price: $1.5 /M tokens

Slug

google/gemini-3.1-flash-lite-preview

TextReasoning

NVIDIA

NVIDIA: Nemotron Nano 12B 2 VL (free)

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes.

TextImageVideo

Context

128K

Group

Other

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

nvidia/nemotron-nano-12b-v2-vl

TextReasoning

DeepInfra

NVIDIA: Nemotron Nano 12B 2 VL

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes.

TextImageVideo

Context

131.1K

Group

Other

Pricing preview

Input Price: $0.2 /M tokens

Output Price: $0.6 /M tokens

Slug

nvidia/nemotron-nano-12b-v2-vl

TextReasoning

Seed

ByteDance Seed: Seed-2.0-Mini

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding, and is optimized for lightweight tasks where cost and speed take priority.

TextImageVideo

Context

262.1K

Group

Other

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.4 /M tokens

Slug

bytedance-seed/seed-2.0-mini

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3.5-Flash

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.

TextImageVideo

Context

1M

Group

Qwen3

Pricing preview

Input Price: $0.065 /M tokens

Output Price: $0.26 /M tokens

Slug

qwen/qwen3.5-flash-02-23

TextReasoning

Google AI Studio

Google: Gemini 3.1 Pro Preview Custom Tools

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party or user-defined functions are available. This specialized preview endpoint significantly increases function calling reliability and ensures the model selects the most appropriate tool in coding agents and complex, multi-tool workflows. It retains the core strengths of Gemini 3.1 Pro, including multimodal reasoning across text, image, video, audio, and code, a 1M-token context window, and strong software engineering performance.

TextAudioImageVideoFile

Context

1M

Group

Gemini

Pricing preview

Input Price: $2 /M tokens

Output Price: $12 /M tokens

Slug

google/gemini-3.1-pro-preview-customtools

TextReasoning

Google AI Studio

Google: Gemini 3.1 Pro Preview

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation of the Gemini 3 series, it combines high-precision reasoning across text, image, video, audio, and code with a 1M-token context window. Reasoning details must be preserved when using multi-turn tool calling; see the docs: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning. The 3.1 update introduces measurable gains in SWE benchmarks and real-world coding environments, along with stronger autonomous task execution in structured domains such as finance and spreadsheet-based workflows. Designed for advanced development and agentic systems, Gemini 3.1 Pro Preview improves long-horizon stability and tool orchestration while increasing token efficiency. It introduces a new medium thinking level to better balance cost, speed, and performance. The model excels in agentic coding, structured planning, multimodal analysis, and workflow automation, making it well-suited for autonomous agents, financial modeling, spreadsheet automation, and high-context enterprise tasks.

TextAudioFileImageVideo

Context

1M

Group

Gemini

Pricing preview

Input Price: $2 /M tokens

Output Price: $12 /M tokens

Slug

google/gemini-3.1-pro-preview

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3.5 Plus 2026-02-15

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of task evaluations, the 3.5 series consistently demonstrates performance on par with state-of-the-art leading models. Compared to the 3 series, these models show a leap forward in both pure-text and multimodal capabilities.

TextImageVideo

Context

1M

Group

Qwen3

Pricing preview

Input Price: $0.26 /M tokens

Output Price: $1.56 /M tokens

Slug

qwen/qwen3.5-plus-02-15

TextReasoning

SiliconFlow

Z.ai: GLM 4.6V

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

TextImageVideo

Context

131.1K

Group

Other

Pricing preview

Input Price: $0.3 /M tokens

Output Price: $0.9 /M tokens

Slug

z-ai/glm-4.6v

Text

Unknown provider

AllenAI: Molmo2 8B

Molmo2-8B is an open vision-language model developed by the Allen Institute for AI (Ai2) as part of the Molmo2 family, supporting image, video, and multi-image understanding and grounding. It is based on Qwen3-8B and uses SigLIP 2 as its vision backbone, outperforming other open-weight, open-data models on short videos, counting, and captioning, while remaining competitive on long-video tasks.

TextImageVideo

Context

36.9K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

allenai/molmo-2-8b

Text

Unknown provider

Auto Router

Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was used, visit [Activity](/activity), or read the `model` attribute of the response. Your response will be priced at the same rate as the routed model. Learn more, including how to customize the models for routing, in our [docs](/docs/guides/routing/routers/auto-router). Requests will be routed to the following models:

- [anthropic/claude-haiku-4.5](/anthropic/claude-haiku-4.5)
- [anthropic/claude-opus-4.6](/anthropic/claude-opus-4.6)
- [anthropic/claude-sonnet-4.5](/anthropic/claude-sonnet-4.5)
- [anthropic/claude-sonnet-4.6](/anthropic/claude-sonnet-4.6)
- [deepseek/deepseek-r1](/deepseek/deepseek-r1)
- [google/gemini-2.5-flash-lite](/google/gemini-2.5-flash-lite)
- [google/gemini-3-flash-preview](/google/gemini-3-flash-preview)
- [google/gemini-3-pro-preview](/google/gemini-3-pro-preview)
- [google/gemini-3.1-pro-preview](/google/gemini-3.1-pro-preview)
- [meta-llama/llama-3.3-70b-instruct](/meta-llama/llama-3.3-70b-instruct)
- [minimax/minimax-m2.5](/minimax/minimax-m2.5)
- [mistralai/codestral-2508](/mistralai/codestral-2508)
- [mistralai/mistral-7b-instruct-v0.1](/mistralai/mistral-7b-instruct-v0.1)
- [mistralai/mistral-large](/mistralai/mistral-large)
- [mistralai/mistral-medium-3.1](/mistralai/mistral-medium-3.1)
- [mistralai/mistral-small-3.2-24b-instruct-2506](/mistralai/mistral-small-3.2-24b-instruct-2506)
- [moonshotai/kimi-k2-thinking](/moonshotai/kimi-k2-thinking)
- [openai/gpt-5](/openai/gpt-5)
- [openai/gpt-5-mini](/openai/gpt-5-mini)
- [openai/gpt-5-nano](/openai/gpt-5-nano)
- [openai/gpt-5.1](/openai/gpt-5.1)
- [openai/gpt-5.2](/openai/gpt-5.2)
- [openai/gpt-5.2-pro](/openai/gpt-5.2-pro)
- [openai/gpt-5.3-chat](/openai/gpt-5.3-chat)
- [openai/gpt-oss-120b](/openai/gpt-oss-120b)
- [perplexity/sonar](/perplexity/sonar)
- [qwen/qwen3-235b-a22b](/qwen/qwen3-235b-a22b)
- [x-ai/grok-3](/x-ai/grok-3)
- [x-ai/grok-3-mini](/x-ai/grok-3-mini)
- [x-ai/grok-4](/x-ai/grok-4)
- [x-ai/grok-4.1-fast](/x-ai/grok-4.1-fast)
- [z-ai/glm-5](/z-ai/glm-5)

TextImageAudioFileVideo

Context

2M

Group

Router

Pricing preview

No display pricing published in the current snapshot.

Slug

openrouter/auto
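Reading the routed model back from a response can be sketched as below. The response object is mocked for illustration, and the field names follow the OpenAI-compatible chat completions shape; treat them as assumptions.

```python
# Request the auto router by slug; the response's `model` field then
# reports which underlying model actually handled the request.
request = {
    "model": "openrouter/auto",
    "messages": [{"role": "user", "content": "Summarize this thread."}],
}

# Mocked response for illustration; a real call returns this shape.
response = {
    "model": "google/gemini-3-pro-preview",  # the slug billing is based on
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
}

routed_to = response["model"]
print(routed_to)  # 'google/gemini-3-pro-preview'
```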

TextReasoning

Unknown provider

Google: Gemini 3 Pro Preview

Gemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window. Reasoning details must be preserved when using multi-turn tool calling; see the docs: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks. It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses. Built for advanced development and agentic workflows, Gemini 3 Pro provides robust tool-calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing.

Text · Image · File · Audio · Video

Context

1M

Group

Gemini

Pricing preview

No display pricing published in the current snapshot.

Slug

google/gemini-3-pro-preview

Text · Reasoning

Seed

ByteDance Seed: Seed 1.6 Flash

Seed 1.6 Flash is an ultra-fast multimodal deep-thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256K context window and can generate outputs of up to 16K tokens.

Text · Image · Video

Context

262.1K

Group

Other

Pricing preview

Input Price: $0.075 /M tokens

Output Price: $0.3 /M tokens

Slug

bytedance-seed/seed-1.6-flash

Text · Reasoning
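Per-million-token prices like those above translate to request cost as (input_tokens / 1,000,000) × input price + (output_tokens / 1,000,000) × output price. A minimal sketch using the Seed 1.6 Flash rates shown on this card; the token counts in the example are hypothetical:

```python
# Rough cost estimate from the snapshot's per-million-token prices.
# Rates taken from the Seed 1.6 Flash card; actual billing may differ.
INPUT_PRICE_PER_M = 0.075   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.3    # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical request: 200k input tokens, 4k output tokens.
print(f"${estimate_cost(200_000, 4_000):.4f}")  # → $0.0162
```

The same arithmetic applies to any card on this page that lists per-million-token input and output prices.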

Seed

ByteDance Seed: Seed 1.6

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window.

Text · Image · Video

Context

262.1K

Group

Other

Pricing preview

Input Price: $0.25 /M tokens

Output Price: $2 /M tokens

Slug

bytedance-seed/seed-1.6

Text · Reasoning

Google AI Studio

Google: Gemini 3 Flash Preview

Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near Pro-level reasoning and tool-use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long-running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability. The model supports a 1M-token context window and multimodal inputs including text, images, audio, video, and PDFs, with text output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full-scale frontier models.

Text · Image · File · Audio · Video

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.5 /M tokens

Output Price: $3 /M tokens

Slug

google/gemini-3-flash-preview

Text · Reasoning
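For models that expose configurable thinking levels, the level is typically selected per request. A minimal payload sketch following OpenRouter's documented `reasoning.effort` pattern; treat the exact field names as an assumption and verify against the current API docs before relying on them:

```python
# Sketch of a chat-completions payload selecting a thinking level.
# The `reasoning.effort` field follows OpenRouter's documented pattern;
# the exact shape is an assumption, not confirmed by this catalog page.
import json

payload = {
    "model": "google/gemini-3-flash-preview",
    "messages": [
        {"role": "user", "content": "Summarize this diff in one sentence."}
    ],
    # Thinking level: e.g. "minimal", "low", "medium", or "high".
    "reasoning": {"effort": "low"},
}

print(json.dumps(payload, indent=2))
```

Lower effort levels trade reasoning depth for latency and output-token cost, which matters most in interactive loops like the agentic workflows this model targets.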

Amazon Bedrock

Amazon: Nova 2 Lite

Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads; it processes text, images, and video to generate text. Nova 2 Lite demonstrates standout capabilities in processing documents, extracting information from videos, generating code, providing accurate grounded answers, and automating multi-step agentic workflows.

Text · Image · Video · File

Context

1M

Group

Nova

Pricing preview

Input Price: $0.3 /M tokens

Output Price: $2.5 /M tokens

Slug

amazon/nova-2-lite-v1

Need a model request?

Use the market snapshot for discovery, then ask ImaRouter for rollout.

If a model matters for your product, send the slug, expected traffic, target region, and latency expectations. The team can confirm support status, onboarding priority, or a migration path to an equivalent route on ImaRouter.

Contact

support@imarouter.com

Best for model availability questions, onboarding priority, routing strategy, and enterprise rollout planning.

Models | ImaRouter