
Models

Explore the active model market from a local OpenRouter snapshot.

This page reads from a local JSON snapshot synced from OpenRouter, so the catalog stays fast, indexable, and stable. Use it to browse current model coverage by provider, modality, reasoning support, context window, and pricing metadata.


Results

Showing 48 of 683 matching models

Snapshot source: OpenRouter. Synced April 21, 2026 at 8:00 AM. Page 12 of 15.

This route is built from local JSON so the catalog stays stable for browsing and SEO. If you need a specific model on ImaRouter, treat this page as a discovery reference and then contact the team for availability.
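Because the catalog is just a local JSON snapshot, the same filters the page applies can be sketched in a few lines. The record shape below (slug, provider, modalities, context, pricing) is an assumption inferred from the fields shown on this page, not the snapshot's actual schema:

```python
import json

# Hypothetical snapshot shape, inferred from the fields shown on this page.
snapshot = json.loads("""
[
  {"slug": "openai/gpt-3.5-turbo", "provider": "OpenAI",
   "modalities": ["text"], "context": 16400,
   "pricing": {"input": 0.5, "output": 1.5}},
  {"slug": "perplexity/pplx-embed-v1-0.6b", "provider": "Perplexity",
   "modalities": ["embeddings", "text"], "context": 32000,
   "pricing": {"input": 0.004}}
]
""")

# Browse by modality, the same way the page's filters do.
embedding_models = [m for m in snapshot if "embeddings" in m["modalities"]]
print([m["slug"] for m in embedding_models])
```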

Text

OpenAI

OpenAI: GPT-3.5 Turbo

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.

Text

Context

16.4K

Group

GPT

Pricing preview

Input Price: $0.5 /M tokens

Output Price: $1.5 /M tokens

Slug

openai/gpt-3.5-turbo

Embeddings

Perplexity

Perplexity: Embed V1 0.6B

pplx-embed-v1-0.6B is one of Perplexity's state-of-the-art text embedding models built for real-world, web-scale retrieval. pplx-embed-v1 is optimized for standard dense text retrieval with the 0.6B parameter model targeting lightweight, low-latency embedding generation.

Embeddings · Text

Context

32K

Group

Other

Pricing preview

Input Price: $0.004 /M tokens

Slug

perplexity/pplx-embed-v1-0.6b

Embeddings

Perplexity

Perplexity: Embed V1 4B

pplx-embed-v1-4B is one of Perplexity's state-of-the-art text embedding models built for real-world, web-scale retrieval. pplx-embed-v1 is optimized for standard dense text retrieval with the 4B parameter model maximizing retrieval quality.

Embeddings · Text

Context

32K

Group

Other

Pricing preview

Input Price: $0.03 /M tokens

Slug

perplexity/pplx-embed-v1-4b

Text

Alibaba Cloud Int.

Qwen: Qwen3 VL 32B Instruct

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding. It offers robust OCR in 32 languages and enhanced multimodal fusion through Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks.

Text · Image

Context

131.1K

Group

Qwen

Pricing preview

Input Price: $0.104 /M tokens

Output Price: $0.416 /M tokens

Slug

qwen/qwen3-vl-32b-instruct

Text · Reasoning

Chutes

Qwen: Qwen3.5 397B A17B

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers state-of-the-art performance comparable to leading-edge models across a wide range of tasks, including language understanding, logical reasoning, code generation, agent-based tasks, image understanding, video understanding, and graphical user interface (GUI) interactions. With its robust code-generation and agent capabilities, the model exhibits strong generalization across diverse agentic tasks.

Text · Image · Video

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.39 /M tokens

Output Price: $2.34 /M tokens

Slug

qwen/qwen3.5-397b-a17b

Text · Reasoning

Alibaba Cloud Int.

Qwen: Qwen3.5-122B-A10B

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.

Text · Image · Video

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.26 /M tokens

Output Price: $2.08 /M tokens

Slug

qwen/qwen3.5-122b-a10b

Text · Reasoning

Alibaba Cloud Int.

Qwen: Qwen3.5-27B

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.

Text · Image · Video

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.195 /M tokens

Output Price: $1.56 /M tokens

Slug

qwen/qwen3.5-27b

Text · Reasoning

Alibaba Cloud Int.

Qwen: Qwen3.5-35B-A3B

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B.

Text · Image · Video

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.1625 /M tokens

Output Price: $1.3 /M tokens

Slug

qwen/qwen3.5-35b-a3b

Text · Reasoning

Unknown provider

Healer Alpha

Healer Alpha is a frontier omni-modal model with vision, hearing, reasoning, and action capabilities. It brings the full power of agentic intelligence into the real world: natively perceiving visual and audio inputs, reasoning across modalities, and executing complex multi-step tasks with precision and reliability. **Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.

Text · Image · Audio · Video

Context

262.1K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

openrouter/healer-alpha

Text · Reasoning

Unknown provider

Hunter Alpha

Hunter Alpha is a 1 Trillion parameter + 1M token context frontier intelligence model built for agentic use. It excels at long-horizon planning, complex reasoning, and sustained multi-step task execution, with the reliability and instruction-following precision that frameworks like OpenClaw need. **Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.

Text · Image

Context

1M

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

openrouter/hunter-alpha

Text · Reasoning

NVIDIA

NVIDIA: Nemotron 3 Super (free)

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere — from workstation to cloud.

Text

Context

262.1K

Group

Other

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

nvidia/nemotron-3-super-120b-a12b

Text · Reasoning

DekaLLM

NVIDIA: Nemotron 3 Super

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere — from workstation to cloud.

Text

Context

262.1K

Group

Other

Pricing preview

Input Price: $0.09 /M tokens

Output Price: $0.45 /M tokens

Slug

nvidia/nemotron-3-super-120b-a12b

Text · Reasoning

OpenInference

MiniMax: MiniMax M2.5 (free)

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1 to extend into general office work, reaching fluency in generating and operating Word, Excel, and Powerpoint files, context switching between diverse software environments, and working across different agent and human teams. Scoring 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp, M2.5 is also more token efficient than previous generations, having been trained to optimize its actions and output through planning.

Text

Context

196.6K

Group

Other

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

minimax/minimax-m2.5

Text · Reasoning

Chutes

MiniMax: MiniMax M2.5

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1 to extend into general office work, reaching fluency in generating and operating Word, Excel, and Powerpoint files, context switching between diverse software environments, and working across different agent and human teams. Scoring 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp, M2.5 is also more token efficient than previous generations, having been trained to optimize its actions and output through planning.

Text

Context

196.6K

Group

Other

Pricing preview

Input Price: $0.15 /M tokens

Output Price: $1.2 /M tokens

Slug

minimax/minimax-m2.5

Text · Reasoning

Fireworks

MiniMax: MiniMax M2.7

MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent collaboration, enabling it to plan, execute, and refine complex tasks across dynamic environments. Trained for production-grade performance, M2.7 handles workflows such as live debugging, root cause analysis, financial modeling, and full document generation across Word, Excel, and PowerPoint. It delivers strong results on benchmarks including 56.2% on SWE-Pro and 57.0% on Terminal Bench 2, while achieving a 1495 ELO on GDPval-AA, setting a new standard for multi-agent systems operating in real-world digital workflows.

Text

Context

196.6K

Group

Other

Pricing preview

Input Price: $0.3 /M tokens

Output Price: $1.2 /M tokens

Slug

minimax/minimax-m2.7

Text · Reasoning

Mistral

Mistral: Mistral Small 4

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from Magistral, multimodal understanding from Pixtral, and agentic coding capabilities from Devstral, enabling one model to handle complex analysis, software development, and visual tasks within the same workflow.

Text · Image

Context

262.1K

Group

Mistral

Pricing preview

Input Price: $0.15 /M tokens

Output Price: $0.6 /M tokens

Slug

mistralai/mistral-small-2603

Text · Reasoning

Z.ai

Z.ai: GLM 5 Turbo

GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments such as OpenClaw scenarios. It is deeply optimized for real-world agent workflows involving long execution chains, with improved complex instruction decomposition, tool use, scheduled and persistent execution, and overall stability across extended tasks.

Text

Context

202.8K

Group

Other

Pricing preview

Input Price: $1.2 /M tokens

Output Price: $4 /M tokens

Slug

z-ai/glm-5-turbo

Text · Reasoning

Seed

ByteDance Seed: Seed-2.0-Lite

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across text, vision, and tools. Engineered for high-frequency visual understanding and agentic workflows, it's an ideal choice for deployment at scale with minimal latency.

Text · Image · Video

Context

262.1K

Group

Other

Pricing preview

Input Price: $0.25 /M tokens

Output Price: $2 /M tokens

Slug

bytedance-seed/seed-2.0-lite

Text · Reasoning

Together

Qwen: Qwen3.5-9B

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design with early fusion of multimodal tokens, allowing the model to process and reason across text and images within the same context.

Text · Image · Video

Context

262.1K

Group

Qwen3

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.15 /M tokens

Slug

qwen/qwen3.5-9b

Text

Arcee (Prime Intellect)

Arcee AI: Trinity Large Preview (free)

Trinity-Large-Preview is a frontier-scale open-weight language model from Arcee, built as a 400B-parameter sparse Mixture-of-Experts with 13B active parameters per token using 4-of-256 expert routing. It excels in creative writing, storytelling, role-play, chat scenarios, and real-time voice assistance, outperforming what typical reasoning models manage in those settings. It also introduces newer agentic capabilities: it was trained to navigate agent harnesses like OpenCode, Cline, and Kilo Code, and to handle complex toolchains and long, constraint-filled prompts. The architecture natively supports very long context windows up to 512k tokens, with the Preview API currently served at 128k context using 8-bit quantization for practical deployment. Trinity-Large-Preview reflects Arcee’s efficiency-first design philosophy, offering a production-oriented frontier model with open weights and permissive licensing suitable for real-world applications and experimentation.

Text

Context

131K

Group

Other

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

arcee-ai/trinity-large-preview

Text · Reasoning

Inception

Inception: Mercury 2

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving >1,000 tokens/sec on standard GPUs. Mercury 2 is 5x+ faster than leading speed-optimized LLMs like Claude 4.5 Haiku and GPT 5 Mini, at a fraction of the cost. Mercury 2 supports tunable reasoning levels, 128K context, native tool use, and schema-aligned JSON output. Built for coding workflows where latency compounds, real-time voice/search, and agent loops. OpenAI API compatible. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury-2).

Text

Context

128K

Group

Other

Pricing preview

Input Price: $0.25 /M tokens

Output Price: $0.75 /M tokens

Slug

inception/mercury-2

Image · Reasoning

Google AI Studio

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced contextual understanding with fast, cost-efficient inference, making complex image generation and iterative edits significantly more accessible. Aspect ratios can be controlled with the [image_config API Parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-configuration)
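Per the linked docs, aspect ratio is set through the `image_config` request parameter. A minimal sketch of the request body follows; the field names are taken from the linked OpenRouter documentation and should be treated as assumptions, not a contract:

```python
import json

# Hedged sketch of an image-generation request body using image_config.
payload = {
    "model": "google/gemini-3.1-flash-image-preview",
    "messages": [
        {"role": "user", "content": "A banana-yellow sports car, studio lighting"}
    ],
    "modalities": ["image", "text"],           # request image output alongside text
    "image_config": {"aspect_ratio": "16:9"},  # controls output aspect ratio per the docs
}
print(json.dumps(payload, indent=2))
```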

Image · Text

Context

65.5K

Group

Gemini

Pricing preview

Input Price: $0.5 /M tokens

Output Price: $3 /M tokens

Slug

google/gemini-3.1-flash-image-preview

Text · Reasoning

Google AI Studio

Google: Gemini 3.1 Flash Lite Preview

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. Supports full thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash.

Text · Image · Video · File · Audio

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.25 /M tokens

Output Price: $1.5 /M tokens

Slug

google/gemini-3.1-flash-lite-preview

Text · Reasoning

Baidu Qianfan

DeepSeek: DeepSeek V3.2

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
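The `reasoning` toggle mentioned above sits in the request body. A minimal sketch, following the linked reasoning-tokens docs (the exact field shape is an assumption from that page):

```python
# Hedged sketch: enable reasoning for DeepSeek-V3.2 via the request body.
payload = {
    "model": "deepseek/deepseek-v3.2",
    "messages": [{"role": "user", "content": "Prove that 17 is prime."}],
    "reasoning": {"enabled": True},  # set False to suppress reasoning tokens
}
```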

Text

Context

131.1K

Group

DeepSeek

Pricing preview

Input Price: $0.252 /M tokens

Output Price: $0.378 /M tokens

Slug

deepseek/deepseek-v3.2

Text · Reasoning

Upstage

Upstage: Solar Pro 3

Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active parameters per forward pass, it delivers exceptional performance while maintaining computational efficiency. Optimized for Korean with English and Japanese support.

Text

Context

128K

Group

Other

Pricing preview

Input Price: $0.15 /M tokens

Output Price: $0.6 /M tokens

Slug

upstage/solar-pro-3

Text

Together

LiquidAI: LFM2-24B-A2B

LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B parameter Mixture-of-Experts model with only 2B active parameters per token, it delivers high-quality generation while maintaining low inference costs. The model fits within 32 GB of RAM, making it practical to run on consumer laptops and desktops without sacrificing capability.

Text

Context

32.8K

Group

Other

Pricing preview

Input Price: $0.03 /M tokens

Output Price: $0.12 /M tokens

Slug

liquid/lfm-2-24b-a2b

Text · Reasoning

NVIDIA

NVIDIA: Nemotron Nano 12B 2 VL (free)

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes.

Text · Image · Video

Context

128K

Group

Other

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

nvidia/nemotron-nano-12b-v2-vl

Text · Reasoning

DeepInfra

NVIDIA: Nemotron Nano 12B 2 VL

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes.

Text · Image · Video

Context

131.1K

Group

Other

Pricing preview

Input Price: $0.2 /M tokens

Output Price: $0.6 /M tokens

Slug

nvidia/nemotron-nano-12b-v2-vl

Text · Reasoning

NVIDIA

NVIDIA: Nemotron 3 Nano 30B A3B (free)

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully open with open-weights, datasets and recipes so developers can easily customize, optimize, and deploy the model on their infrastructure for maximum privacy and security.

Text

Context

256K

Group

Other

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

nvidia/nemotron-3-nano-30b-a3b

Text · Reasoning

DeepInfra

NVIDIA: Nemotron 3 Nano 30B A3B

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully open with open-weights, datasets and recipes so developers can easily customize, optimize, and deploy the model on their infrastructure for maximum privacy and security.

Text

Context

262.1K

Group

Other

Pricing preview

Input Price: $0.05 /M tokens

Output Price: $0.2 /M tokens

Slug

nvidia/nemotron-3-nano-30b-a3b

Embeddings

NVIDIA

NVIDIA: Llama Nemotron Embed VL 1B V2 (free)

The Llama Nemotron Embed VL 1B V2 embedding model is optimized for multimodal question-answering retrieval. The model can embed 'documents' in the form of image, text, or image and text combined. Documents can be retrieved given a user query in text form. The model supports images containing text, tables, charts, and infographics.

Embeddings · Text · Image

Context

131.1K

Group

Other

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

nvidia/llama-nemotron-embed-vl-1b-v2

Text · Reasoning

Seed

ByteDance Seed: Seed-2.0-Mini

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding, and is optimized for lightweight tasks where cost and speed take priority.

Text · Image · Video

Context

262.1K

Group

Other

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.4 /M tokens

Slug

bytedance-seed/seed-2.0-mini

Text · Reasoning

Alibaba Cloud Int.

Qwen: Qwen3.5-Flash

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.

Text · Image · Video

Context

1M

Group

Qwen3

Pricing preview

Input Price: $0.065 /M tokens

Output Price: $0.26 /M tokens

Slug

qwen/qwen3.5-flash-02-23

Text · Reasoning

Google AI Studio

Google: Gemini 3.1 Pro Preview Custom Tools

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party or user-defined functions are available. This specialized preview endpoint significantly increases function calling reliability and ensures the model selects the most appropriate tool in coding agents and complex, multi-tool workflows. It retains the core strengths of Gemini 3.1 Pro, including multimodal reasoning across text, image, video, audio, and code, a 1M-token context window, and strong software engineering performance.

Text · Audio · Image · Video · File

Context

1M

Group

Gemini

Pricing preview

Input Price: $2 /M tokens

Output Price: $12 /M tokens

Slug

google/gemini-3.1-pro-preview-customtools

Text · Reasoning

AionLabs

AionLabs: Aion-2.0

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging. It also handles mature and darker themes with more nuance and depth.

Text

Context

131.1K

Group

Other

Pricing preview

Input Price: $0.8 /M tokens

Output Price: $1.6 /M tokens

Slug

aion-labs/aion-2.0

Text · Reasoning

Unknown provider

Aurora Alpha

This is a cloaked model provided to the community to gather feedback. A reasoning model designed for speed. It is built for coding assistants, real-time conversational applications, and agentic workflows. Default reasoning effort is set to medium for fast responses. For agentic coding use cases, we recommend changing effort to high. Note: All prompts and completions for this model are logged by the provider and may be used to improve the model.

Text

Context

128K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

openrouter/aurora-alpha

Text · Reasoning

Google AI Studio

Google: Gemini 3.1 Pro Preview

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation of the Gemini 3 series, it combines high-precision reasoning across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning. The 3.1 update introduces measurable gains in SWE benchmarks and real-world coding environments, along with stronger autonomous task execution in structured domains such as finance and spreadsheet-based workflows. Designed for advanced development and agentic systems, Gemini 3.1 Pro Preview improves long-horizon stability and tool orchestration while increasing token efficiency. It introduces a new medium thinking level to better balance cost, speed, and performance. The model excels in agentic coding, structured planning, multimodal analysis, and workflow automation, making it well-suited for autonomous agents, financial modeling, spreadsheet automation, and high-context enterprise tasks.
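The "preserve reasoning details" requirement above means the assistant turn returned by the API should be echoed back unmodified in later requests. A hedged sketch, with the `reasoning_details` field name taken from the linked docs and all message contents hypothetical:

```python
# Hypothetical prior assistant turn, as returned by the API (tool call + reasoning).
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "get_price", "arguments": "{}"}}],
    "reasoning_details": [{"type": "reasoning.text", "text": "(opaque reasoning)"}],
}

# Follow-up request: pass the assistant turn back as-is, reasoning_details
# included, together with the tool result.
messages = [
    {"role": "user", "content": "What does AAPL trade at?"},
    assistant_turn,
    {"role": "tool", "tool_call_id": "call_1", "content": "198.40"},
]
```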

Text · Audio · File · Image · Video

Context

1M

Group

Gemini

Pricing preview

Input Price: $2 /M tokens

Output Price: $12 /M tokens

Slug

google/gemini-3.1-pro-preview

Text · Reasoning

Alibaba Cloud Int.

Qwen: Qwen3.5 Plus 2026-02-15

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of task evaluations, the 3.5 series consistently demonstrates performance on par with state-of-the-art leading models. Compared to the 3 series, these models show a leap forward in both pure-text and multimodal capabilities.

Text · Image · Video

Context

1M

Group

Qwen3

Pricing preview

Input Price: $0.26 /M tokens

Output Price: $1.56 /M tokens

Slug

qwen/qwen3.5-plus-02-15

Text

MiniMax

MiniMax: MiniMax M2-her

MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed to stay consistent in tone and personality, it supports rich message roles (user_system, group, sample_message_user, sample_message_ai) and can learn from example dialogue to better match the style and pacing of your scenario, making it a strong choice for storytelling, companions, and conversational experiences where natural flow and vivid interaction matter most.

Text

Context

65.5K

Group

Other

Pricing preview

Input Price: $0.3 /M tokens

Output Price: $1.2 /M tokens

Slug

minimax/minimax-m2-her

Text · Reasoning

Ambient

Z.ai: GLM 5

GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading closed-source models. With advanced agentic planning, deep backend reasoning, and iterative self-correction, GLM-5 moves beyond code generation to full-system construction and autonomous execution.

Text

Context

80K

Group

Other

Pricing preview

Input Price: $0.72 /M tokens

Output Price: $2.3 /M tokens

Slug

z-ai/glm-5

Text

Unknown provider

Bert-Nebulon Alpha

This model was an early testing version of Mistral Large 3. Try the official launch of Mistral Large 3 [here](/mistralai/mistral-large-2512). This is a cloaked model provided to the community to gather feedback. A general-purpose multimodal model (text/image in, text out) designed for reliability, long-context comprehension, and adaptive logic. It is engineered for production-grade assistants, retrieval-augmented systems, science workloads, and complex agentic workflows. **Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.

Text · Image

Context

256K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

openrouter/bert-nebulon-alpha

Text · Reasoning

Unknown provider

Pony Alpha

Pony is a cutting-edge foundation model with strong performance in coding, agentic workflows, reasoning, and roleplay, making it well suited for hands-on coding and real-world use. **Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.

Text

Context

200K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

openrouter/pony-alpha

Text · Reasoning

SiliconFlow

Z.ai: GLM 4.6V

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

Text · Image · Video

Context

131.1K

Group

Other

Pricing preview

Input Price: $0.3 /M tokens

Output Price: $0.9 /M tokens

Slug

z-ai/glm-4.6v

Text · Reasoning

DeepInfra

Z.ai: GLM 4.7 Flash

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning, and tool collaboration, and has achieved leading performance among open-source models of the same size on several current public benchmark leaderboards.

Text

Context

202.8K

Group

Other

Pricing preview

Input Price: $0.06 /M tokens

Output Price: $0.4 /M tokens

Slug

z-ai/glm-4.7-flash

Text · Reasoning

DekaLLM

Z.ai: GLM 4.7

GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while delivering more natural conversational experiences and superior front-end aesthetics.

Text

Context

202.8K

Group

Other

Pricing preview

Input Price: $0.38 /M tokens

Output Price: $1.74 /M tokens

Slug

z-ai/glm-4.7

Text · Reasoning

Alibaba Cloud Int.

Qwen: Qwen3 Max Thinking

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it delivers major gains in factual accuracy, complex reasoning, instruction following, alignment with human preferences, and agentic behavior.

Text

Context

262.1K

Group

Qwen

Pricing preview

Input Price: $0.78 /M tokens

Output Price: $3.9 /M tokens

Slug

qwen/qwen3-max-thinking

Image

Black Forest Labs

Black Forest Labs: FLUX.2 Pro

A high-end image generation and editing model focused on frontier-level visual quality and reliability. It delivers strong prompt adherence, stable lighting, sharp textures, and consistent character/style reproduction across multi-reference inputs. Designed for production workloads, it balances speed and quality while supporting text-to-image and image editing up to 4 MP resolution. Pricing, [per the docs](https://bfl.ai/pricing?category=flux.2): input is billed at $0.015 per megapixel (i.e. reference images for editing); output is billed at $0.03 for the first megapixel and $0.015 for each subsequent megapixel.
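The per-megapixel rates above compose into a simple cost estimate. A minimal sketch, assuming sub-megapixel output is billed proportionally (the docs do not spell this out):

```python
def flux2_pro_cost(input_mp: float, output_mp: float) -> float:
    """Estimate FLUX.2 Pro cost (USD) from the per-megapixel rates above."""
    input_cost = 0.015 * input_mp                     # $0.015 per input (reference) MP
    first_mp = min(output_mp, 1.0)                    # first output MP at $0.03
    extra_mp = max(output_mp - 1.0, 0.0)              # subsequent output MP at $0.015
    output_cost = 0.03 * first_mp + 0.015 * extra_mp
    return round(input_cost + output_cost, 6)

# One 1 MP reference image in, one 2 MP image out:
print(flux2_pro_cost(1.0, 2.0))  # → 0.06
```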

Image · Text

Context

46.9K

Group

Other

Pricing preview

Output Image: $0.03 per megapixel

Slug

black-forest-labs/flux.2-pro

Image

Black Forest Labs

Black Forest Labs: FLUX.2 Flex

FLUX.2 [flex] excels at rendering complex text, typography, and fine details, and supports multi-reference editing in the same unified architecture. Pricing is $0.06 per megapixel on both the input and output sides, [per the docs](https://bfl.ai/pricing?category=flux.2).

Image · Text

Context

67.3K

Group

Other

Pricing preview

Input Image: $0.06 per megapixel

Output Image: $0.06 per megapixel

Slug

black-forest-labs/flux.2-flex


Need a model request?

Use the market snapshot for discovery, then ask ImaRouter for rollout.

If a model matters for your product, send the slug, expected traffic, target region, and latency expectations. The team can confirm support status, onboarding priority, or a migration path to an equivalent route on ImaRouter.

Contact

support@imarouter.com

Best for model availability questions, onboarding priority, routing strategy, and enterprise rollout planning.
