
Models

Explore the active model market from a local OpenRouter snapshot.

This page reads from a local JSON snapshot synced from OpenRouter, so the catalog stays fast, indexable, and stable. Use it to browse current model coverage by provider, modality, reasoning support, context window, and pricing metadata.
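Because the catalog is plain JSON, it can be filtered entirely client-side. A minimal sketch of that idea — the file name and field names here are hypothetical, not the actual snapshot schema:

```python
import json

# Hypothetical snapshot records, inlined for a self-contained example.
# A real snapshot would be loaded with json.load(open("models-snapshot.json"))
# and may use different field names.
snapshot = json.loads("""[
  {"slug": "openai/gpt-4o", "group": "GPT", "context": 128000, "modalities": ["text", "image"]},
  {"slug": "google/gemini-flash-1.5", "group": "Gemini", "context": 1000000, "modalities": ["text", "image"]},
  {"slug": "anthropic/claude-3-haiku", "group": "Claude", "context": 200000, "modalities": ["text", "image"]}
]""")

def filter_models(models, min_context=0, modality=None):
    """Return models meeting a context floor and supporting a given modality."""
    return [
        m for m in models
        if m["context"] >= min_context
        and (modality is None or modality in m["modalities"])
    ]

# Models with at least a 200K-token context window:
print([m["slug"] for m in filter_models(snapshot, min_context=200_000)])
```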


Results

Showing 48 of 226 matching models

Snapshot source: OpenRouter. Synced April 21, 2026 at 8:00 AM. Page 4 of 5.

This route is built from local JSON so the catalog stays stable for browsing and SEO. If you need a specific model on ImaRouter, treat this page as a discovery reference and then contact the team for availability.

Text

Unknown provider

OpenAI: ChatGPT-4o

OpenAI ChatGPT 4o is continually updated by OpenAI to point to the current version of GPT-4o used by ChatGPT. It therefore differs slightly from the API version of [GPT-4o](/models/openai/gpt-4o) in that it has additional RLHF. It is intended for research and evaluation. OpenAI notes that this model is not suited for production use cases as it may be removed or redirected to another model in the future.

TextImage

Context

128K

Group

GPT

Pricing preview

No display pricing published in the current snapshot.

Slug

openai/chatgpt-4o-latest

Text

Azure

OpenAI: GPT-4o (2024-08-06)

The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the response_format parameter. Read more [here](https://openai.com/index/introducing-structured-outputs-in-the-api/). GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209).
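Supplying a schema works by setting response_format on the request, per OpenAI's structured-outputs announcement linked above. A minimal sketch of the payload shape — the "event" schema itself is a made-up example:

```python
# Request payload for structured outputs: the model is constrained to emit
# JSON matching the supplied schema. Send it with any OpenAI-compatible client.
payload = {
    "model": "openai/gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "Extract the event details."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "event",  # hypothetical schema, for illustration only
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "date": {"type": "string"},
                },
                "required": ["title", "date"],
                "additionalProperties": False,
            },
        },
    },
}
print(payload["response_format"]["type"])
```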

TextImageFile

Context

128K

Group

GPT

Pricing preview

Input Price: $2.5 /M tokens

Output Price: $10 /M tokens

Slug

openai/gpt-4o-2024-08-06
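The per-million-token prices shown in these cards convert directly into a cost estimate for a given request. A quick sketch using this model's listed rates:

```python
def estimate_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Estimate request cost in USD from per-million-token prices."""
    return (input_tokens / 1_000_000) * input_per_m \
         + (output_tokens / 1_000_000) * output_per_m

# gpt-4o-2024-08-06 at $2.50/M input and $10/M output:
# 100K input tokens + 20K output tokens -> $0.25 + $0.20 = $0.45
print(estimate_cost(100_000, 20_000, 2.5, 10))
```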

Text

Unknown provider

01.AI: Yi Vision

Yi Vision is a model for complex visual tasks, providing high-performance understanding and analysis based on multiple images. It's ideal for scenarios that require analysis and interpretation of images and charts, such as image question answering, chart understanding, OCR, visual reasoning, education, research report understanding, or multilingual document reading.

TextImage

Context

16.4K

Group

Yi

Pricing preview

No display pricing published in the current snapshot.

Slug

01-ai/yi-vision

Text

Unknown provider

Google: Gemini 1.5 Pro Experimental

Gemini 1.5 Pro Experimental is a bleeding-edge version of the [Gemini 1.5 Pro](/models/google/gemini-pro-1.5) model. Because it's currently experimental, it will be **heavily rate-limited** by Google. Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). #multimodal

TextImage

Context

1M

Group

Gemini

Pricing preview

No display pricing published in the current snapshot.

Slug

google/gemini-pro-1.5-exp

Text

Azure

OpenAI: GPT-4o-mini

GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective. GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 in chat preferences on [common leaderboards](https://arena.lmsys.org/). Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more. #multimodal

TextImageFile

Context

128K

Group

GPT

Pricing preview

Input Price: $0.15 /M tokens

Output Price: $0.6 /M tokens

Slug

openai/gpt-4o-mini

Text

OpenAI

OpenAI: GPT-4o-mini (2024-07-18)

GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective. GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 in chat preferences on [common leaderboards](https://arena.lmsys.org/). Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more. #multimodal

TextImageFile

Context

128K

Group

GPT

Pricing preview

Input Price: $0.15 /M tokens

Output Price: $0.6 /M tokens

Slug

openai/gpt-4o-mini-2024-07-18

Text

Unknown provider

Anthropic: Claude 3.5 Sonnet (2024-06-20)

Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:

- Coding: Autonomously writes, edits, and runs code with reasoning and troubleshooting
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: Excels at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
- Agentic tasks: Exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)

For the latest version (2024-10-23), check out [Claude 3.5 Sonnet](/anthropic/claude-3.5-sonnet). #multimodal

TextImageFile

Context

200K

Group

Claude

Pricing preview

No display pricing published in the current snapshot.

Slug

anthropic/claude-3.5-sonnet-20240620

Text

Unknown provider

Google: Gemini 1.5 Flash

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter. Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). #multimodal

TextImage

Context

1M

Group

Gemini

Pricing preview

No display pricing published in the current snapshot.

Slug

google/gemini-flash-1.5

Text

Azure

OpenAI: GPT-4o

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) #multimodal

TextImageFile

Context

128K

Group

GPT

Pricing preview

Input Price: $2.5 /M tokens

Output Price: $10 /M tokens

Slug

openai/gpt-4o

Text

Azure

OpenAI: GPT-4o (2024-05-13)

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209) #multimodal

TextImageFile

Context

128K

Group

GPT

Pricing preview

Input Price: $5 /M tokens

Output Price: $15 /M tokens

Slug

openai/gpt-4o-2024-05-13

Text

Unknown provider

LLaVA v1.6 34B

LLaVA Yi 34B is an open-source model trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: [NousResearch/Nous-Hermes-2-Yi-34B](/models/nousresearch/nous-hermes-yi-34b). It was trained in December 2023.

TextImage

Context

4.1K

Group

Yi

Pricing preview

No display pricing published in the current snapshot.

Slug

liuhaotian/llava-yi-34b

Text

Unknown provider

Fireworks: FireLLaVA 13B

A blazing fast vision-language model, FireLLaVA quickly understands both text and images. It achieves impressive chat skills in tests, and was designed to mimic multimodal GPT-4. It is the first commercially permissive open-source LLaVA model, trained entirely on instruction-following data generated by open-source LLMs.

TextImage

Context

4.1K

Group

Llama2

Pricing preview

No display pricing published in the current snapshot.

Slug

fireworks/firellava-13b

Text

Unknown provider

Google: Gemini 1.5 Pro

Google's latest multimodal model, supports image and video[0] in text or chat prompts. Optimized for language tasks including:

- Code generation
- Text generation
- Text editing
- Problem solving
- Recommendations
- Information extraction
- Data extraction or generation
- AI agents

Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). [0]: Video input is not available through OpenRouter at this time.

TextImage

Context

2M

Group

Gemini

Pricing preview

No display pricing published in the current snapshot.

Slug

google/gemini-pro-1.5

Text

OpenAI

OpenAI: GPT-4 Turbo

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023.

TextImage

Context

128K

Group

GPT

Pricing preview

Input Price: $10 /M tokens

Output Price: $30 /M tokens

Slug

openai/gpt-4-turbo

Text

Amazon Bedrock

Anthropic: Claude 3 Haiku

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku) #multimodal

TextImage

Context

200K

Group

Claude

Pricing preview

Input Price: $0.25 /M tokens

Output Price: $1.25 /M tokens

Slug

anthropic/claude-3-haiku

Text

Unknown provider

Anthropic: Claude 3 Sonnet

Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal

TextImage

Context

200K

Group

Claude

Pricing preview

No display pricing published in the current snapshot.

Slug

anthropic/claude-3-sonnet

Text

Unknown provider

Anthropic: Claude 3 Opus

Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal

TextImage

Context

200K

Group

Claude

Pricing preview

No display pricing published in the current snapshot.

Slug

anthropic/claude-3-opus

Text

Unknown provider

Nous: Hermes 2 Vision 7B (alpha)

This vision-language model builds on innovations from the popular [OpenHermes-2.5](/models/teknium/openhermes-2.5-mistral-7b) model, by Teknium. It adds vision support, and is trained on a custom dataset enriched with function calling. This project is led by [qnguyen3](https://twitter.com/stablequan) and [teknium](https://twitter.com/Teknium1). #multimodal

TextImage

Context

4.1K

Group

Mistral

Pricing preview

No display pricing published in the current snapshot.

Slug

nousresearch/nous-hermes-2-vision-7b

Text

Unknown provider

LLaVA 13B

LLaVA is a large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities and setting a new state-of-the-art accuracy on Science QA. #multimodal

TextImage

Context

2K

Group

Llama2

Pricing preview

No display pricing published in the current snapshot.

Slug

liuhaotian/llava-13b

Text

Unknown provider

OpenAI: GPT-4 Vision

Ability to understand images, in addition to all other [GPT-4 Turbo capabilities](/models/openai/gpt-4-turbo). Training data: up to Apr 2023. **Note:** heavily rate limited by OpenAI while in preview. #multimodal

TextImage

Context

128K

Group

GPT

Pricing preview

No display pricing published in the current snapshot.

Slug

openai/gpt-4-vision-preview

Text

Alibaba Cloud Int.

Qwen: Qwen3 VL 32B Instruct

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding. It provides robust OCR in 32 languages and enhanced multimodal fusion through Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks.

TextImage

Context

131.1K

Group

Qwen

Pricing preview

Input Price: $0.104 /M tokens

Output Price: $0.416 /M tokens

Slug

qwen/qwen3-vl-32b-instruct

TextReasoning

Chutes

Qwen: Qwen3.5 397B A17B

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers state-of-the-art performance comparable to leading-edge models across a wide range of tasks, including language understanding, logical reasoning, code generation, agent-based tasks, image understanding, video understanding, and graphical user interface (GUI) interactions. With its robust code-generation and agent capabilities, the model exhibits strong generalization across diverse agentic scenarios.

TextImageVideo

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.39 /M tokens

Output Price: $2.34 /M tokens

Slug

qwen/qwen3.5-397b-a17b

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3.5-122B-A10B

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.

TextImageVideo

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.26 /M tokens

Output Price: $2.08 /M tokens

Slug

qwen/qwen3.5-122b-a10b

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3.5-27B

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.

TextImageVideo

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.195 /M tokens

Output Price: $1.56 /M tokens

Slug

qwen/qwen3.5-27b

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3.5-35B-A3B

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B.

TextImageVideo

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.1625 /M tokens

Output Price: $1.3 /M tokens

Slug

qwen/qwen3.5-35b-a3b

TextReasoning

Unknown provider

Healer Alpha

Healer Alpha is a frontier omni-modal model with vision, hearing, reasoning, and action capabilities. It brings the full power of agentic intelligence into the real world: natively perceiving visual and audio inputs, reasoning across modalities, and executing complex multi-step tasks with precision and reliability. **Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.

TextImageAudioVideo

Context

262.1K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

openrouter/healer-alpha

TextReasoning

Unknown provider

Hunter Alpha

Hunter Alpha is a 1 Trillion parameter + 1M token context frontier intelligence model built for agentic use. It excels at long-horizon planning, complex reasoning, and sustained multi-step task execution, with the reliability and instruction-following precision that frameworks like OpenClaw need. **Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.

TextImage

Context

1M

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

openrouter/hunter-alpha

TextReasoning

Mistral

Mistral: Mistral Small 4

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from Magistral, multimodal understanding from Pixtral, and agentic coding capabilities from Devstral, enabling one model to handle complex analysis, software development, and visual tasks within the same workflow.

TextImage

Context

262.1K

Group

Mistral

Pricing preview

Input Price: $0.15 /M tokens

Output Price: $0.6 /M tokens

Slug

mistralai/mistral-small-2603

TextReasoning

Seed

ByteDance Seed: Seed-2.0-Lite

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across text, vision, and tools. Engineered for high-frequency visual understanding and agentic workflows, it's an ideal choice for deployment at scale with minimal latency.

TextImageVideo

Context

262.1K

Group

Other

Pricing preview

Input Price: $0.25 /M tokens

Output Price: $2 /M tokens

Slug

bytedance-seed/seed-2.0-lite

TextReasoning

Together

Qwen: Qwen3.5-9B

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design with early fusion of multimodal tokens, allowing the model to process and reason across text and images within the same context.

TextImageVideo

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.15 /M tokens

Slug

qwen/qwen3.5-9b

ImageReasoning

Google AI Studio

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced contextual understanding with fast, cost-efficient inference, making complex image generation and iterative edits significantly more accessible. Aspect ratios can be controlled with the [image_config API Parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-configuration)

ImageText

Context

65.5K

Group

Gemini

Pricing preview

Input Price: $0.5 /M tokens

Output Price: $3 /M tokens

Slug

google/gemini-3.1-flash-image-preview
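Per the linked OpenRouter docs, aspect ratio for this model is controlled through the image_config request parameter. A minimal payload sketch — the prompt and aspect ratio values are illustrative:

```python
# Chat-completions payload requesting a 16:9 image from the model above.
# The image_config block follows OpenRouter's image-generation docs.
payload = {
    "model": "google/gemini-3.1-flash-image-preview",
    "messages": [
        {"role": "user", "content": "A watercolor lighthouse at dusk"}
    ],
    "modalities": ["image", "text"],  # request image output alongside text
    "image_config": {"aspect_ratio": "16:9"},
}
print(payload["image_config"])
```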

TextReasoning

Google AI Studio

Google: Gemini 3.1 Flash Lite Preview

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. Supports full thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash.

TextImageVideoFileAudio

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.25 /M tokens

Output Price: $1.5 /M tokens

Slug

google/gemini-3.1-flash-lite-preview

TextReasoning

NVIDIA

NVIDIA: Nemotron Nano 12B 2 VL (free)

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes.

TextImageVideo

Context

128K

Group

Other

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

nvidia/nemotron-nano-12b-v2-vl

TextReasoning

DeepInfra

NVIDIA: Nemotron Nano 12B 2 VL

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes.

TextImageVideo

Context

131.1K

Group

Other

Pricing preview

Input Price: $0.2 /M tokens

Output Price: $0.6 /M tokens

Slug

nvidia/nemotron-nano-12b-v2-vl

Embeddings

NVIDIA

NVIDIA: Llama Nemotron Embed VL 1B V2 (free)

The Llama Nemotron Embed VL 1B V2 embedding model is optimized for multimodal question-answering retrieval. The model can embed 'documents' in the form of image, text, or image and text combined. Documents can be retrieved given a user query in text form. The model supports images containing text, tables, charts, and infographics.

EmbeddingsTextImage

Context

131.1K

Group

Other

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

nvidia/llama-nemotron-embed-vl-1b-v2

TextReasoning

Seed

ByteDance Seed: Seed-2.0-Mini

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding, and is optimized for lightweight tasks where cost and speed take priority.

TextImageVideo

Context

262.1K

Group

Other

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.4 /M tokens

Slug

bytedance-seed/seed-2.0-mini

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3.5-Flash

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.

TextImageVideo

Context

1M

Group

Qwen3

Pricing preview

Input Price: $0.065 /M tokens

Output Price: $0.26 /M tokens

Slug

qwen/qwen3.5-flash-02-23

TextReasoning

Google AI Studio

Google: Gemini 3.1 Pro Preview Custom Tools

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party or user-defined functions are available. This specialized preview endpoint significantly increases function calling reliability and ensures the model selects the most appropriate tool in coding agents and complex, multi-tool workflows. It retains the core strengths of Gemini 3.1 Pro, including multimodal reasoning across text, image, video, audio, and code, a 1M-token context window, and strong software engineering performance.

TextAudioImageVideoFile

Context

1M

Group

Gemini

Pricing preview

Input Price: $2 /M tokens

Output Price: $12 /M tokens

Slug

google/gemini-3.1-pro-preview-customtools

TextReasoning

Google AI Studio

Google: Gemini 3.1 Pro Preview

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation of the Gemini 3 series, it combines high-precision reasoning across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning. The 3.1 update introduces measurable gains in SWE benchmarks and real-world coding environments, along with stronger autonomous task execution in structured domains such as finance and spreadsheet-based workflows. Designed for advanced development and agentic systems, Gemini 3.1 Pro Preview improves long-horizon stability and tool orchestration while increasing token efficiency. It introduces a new medium thinking level to better balance cost, speed, and performance. The model excels in agentic coding, structured planning, multimodal analysis, and workflow automation, making it well-suited for autonomous agents, financial modeling, spreadsheet automation, and high-context enterprise tasks.

TextAudioFileImageVideo

Context

1M

Group

Gemini

Pricing preview

Input Price: $2 /M tokens

Output Price: $12 /M tokens

Slug

google/gemini-3.1-pro-preview

TextReasoning

Alibaba Cloud Int.

Qwen: Qwen3.5 Plus 2026-02-15

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of task evaluations, the 3.5 series consistently demonstrates performance on par with state-of-the-art leading models. Compared to the 3 series, these models show a leap forward in both pure-text and multimodal capabilities.

TextImageVideo

Context

1M

Group

Qwen3

Pricing preview

Input Price: $0.26 /M tokens

Output Price: $1.56 /M tokens

Slug

qwen/qwen3.5-plus-02-15

Text

Unknown provider

Bert-Nebulon Alpha

This model was an early testing version of Mistral Large 3. Try the official launch of Mistral Large 3 [here](/mistralai/mistral-large-2512). This is a cloaked model provided to the community to gather feedback. A general-purpose multimodal model (text/image in, text out) designed for reliability, long-context comprehension, and adaptive logic. It is engineered for production-grade assistants, retrieval-augmented systems, science workloads, and complex agentic workflows. **Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.

TextImage

Context

256K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

openrouter/bert-nebulon-alpha

TextReasoning

SiliconFlow

Z.ai: GLM 4.6V

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

TextImageVideo

Context

131.1K

Group

Other

Pricing preview

Input Price: $0.3 /M tokens

Output Price: $0.9 /M tokens

Slug

z-ai/glm-4.6v

Image

Black Forest Labs

Black Forest Labs: FLUX.2 Pro

A high-end image generation and editing model focused on frontier-level visual quality and reliability. It delivers strong prompt adherence, stable lighting, sharp textures, and consistent character/style reproduction across multi-reference inputs. Designed for production workloads, it balances speed and quality while supporting text-to-image and image editing up to 4 MP resolution. Pricing, [per the docs](https://bfl.ai/pricing?category=flux.2): input is charged $0.015 per megapixel (i.e. reference images for editing); output is charged $0.03 for the first megapixel and $0.015 for each subsequent megapixel.

ImageText

Context

46.9K

Group

Other

Pricing preview

Output Image: $0.03 per megapixel

Slug

black-forest-labs/flux.2-pro
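The tiered megapixel pricing described above can be made concrete with a small helper — a sketch of the stated rates, not an official calculator:

```python
def flux2_pro_cost(input_mp, output_mp):
    """FLUX.2 Pro cost in USD: $0.015/MP on input (reference images);
    output is $0.03 for the first megapixel, then $0.015 for each
    subsequent megapixel."""
    input_cost = input_mp * 0.015
    output_cost = 0.0
    if output_mp > 0:
        output_cost = 0.03 + max(0, output_mp - 1) * 0.015
    return input_cost + output_cost

# e.g. two 1 MP reference images plus one 4 MP output:
# 2 * $0.015 + ($0.03 + 3 * $0.015) = $0.105
print(round(flux2_pro_cost(2, 4), 3))
```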

Image

Black Forest Labs

Black Forest Labs: FLUX.2 Flex

FLUX.2 [flex] excels at rendering complex text, typography, and fine details, and supports multi-reference editing in the same unified architecture. Pricing, [per the docs](https://bfl.ai/pricing?category=flux.2), is $0.06 per megapixel on both the input and output sides.

ImageText

Context

67.3K

Group

Other

Pricing preview

Input Image: $0.06 per megapixel

Output Image: $0.06 per megapixel

Slug

black-forest-labs/flux.2-flex

Image

Black Forest Labs

Black Forest Labs: FLUX.2 Max

FLUX.2 [max] is the new top-tier image model from Black Forest Labs, pushing image quality, prompt understanding, and editing consistency to the highest level yet. Pricing, [per the docs](https://bfl.ai/pricing?category=flux.2): input is charged $0.03 per megapixel (i.e. reference images for editing); output is charged $0.07 for the first generated megapixel and $0.03 for each subsequent megapixel.

ImageText

Context

46.9K

Group

Other

Pricing preview

Output Image: $0.07 per megapixel

Slug

black-forest-labs/flux.2-max

Image

Seed

ByteDance Seed: Seedream 4.5

Seedream 4.5 is the latest in-house image generation model developed by ByteDance. Compared with Seedream 4.0, it delivers comprehensive improvements, especially in editing consistency, including better preservation of subject details, lighting, and color tone. It also enhances portrait refinement and small-text rendering. The model’s multi-image composition capabilities have been significantly strengthened, and both reasoning performance and visual aesthetics continue to advance, enabling more accurate and artistically expressive image generation. Pricing is $0.04 per output image, regardless of size.

ImageText

Context

4.1K

Group

Other

Pricing preview

Image Output: $0.04 per image

Slug

bytedance-seed/seedream-4.5

Image

Sourceful

Sourceful: Riverflow V2 Fast Preview

Riverflow V2 Fast Preview is the fastest variant of Sourceful's Riverflow V2 preview lineup. This preview version exceeds the performance of the Riverflow 1 family and is Sourceful's first unified text-to-image and image-to-image model family. Pricing is $0.03 per output image, regardless of size. Sourceful imposes a 4.5 MB request size limit, so passing image URLs instead of Base64 data is highly recommended.

ImageText

Context

8.2K

Group

Other

Pricing preview

Image Output: $0.03 per image

Slug

sourceful/riverflow-v2-fast-preview

Image

Sourceful

Sourceful: Riverflow V2 Standard Preview

Riverflow V2 Standard Preview is the standard variant of Sourceful's Riverflow V2 preview lineup. This preview version exceeds the performance of the Riverflow 1 family and is Sourceful's first unified text-to-image and image-to-image model family. Pricing is $0.035 per output image, regardless of size. Sourceful imposes a 4.5 MB request size limit, so passing image URLs instead of Base64 data is highly recommended.

ImageText

Context

8.2K

Group

Other

Pricing preview

Image Output: $0.035 per image

Slug

sourceful/riverflow-v2-standard-preview


Need a model request?

Use the market snapshot for discovery, then ask ImaRouter for rollout.

If a model matters for your product, send the slug, expected traffic, target region, and latency expectations. The team can confirm support status, onboarding priority, or a migration path to an equivalent route on ImaRouter.

Contact

support@imarouter.com

Best for model availability questions, onboarding priority, routing strategy, and enterprise rollout planning.
