
Models

Explore the active model market, served from a local OpenRouter snapshot.

This page reads from a local JSON snapshot synced from OpenRouter, so the catalog stays fast, indexable, and stable. Use it to browse current model coverage by provider, modality, reasoning support, context window, and pricing metadata.
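The snapshot-backed catalog described above amounts to a plain JSON filter over local data. The sketch below illustrates the idea only; the field names (`group`, `context_length`, `slug`) and the in-memory records are assumptions for illustration, not the actual snapshot schema.

```python
import json

def load_snapshot(path):
    """Load the local model snapshot (assumed: a JSON array of model records)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def filter_models(models, group=None, min_context=0):
    """Return models matching an optional group and a minimum context window."""
    return [
        m for m in models
        if (group is None or m.get("group") == group)
        and m.get("context_length", 0) >= min_context
    ]

# Inline records stand in for the synced snapshot file:
models = [
    {"slug": "deepseek/deepseek-chat-v3-0324", "group": "DeepSeek", "context_length": 163_800},
    {"slug": "qwen/qwq-32b", "group": "Qwen", "context_length": 131_100},
]
matches = filter_models(models, group="Qwen", min_context=100_000)
print([m["slug"] for m in matches])  # → ['qwen/qwq-32b']
```

Because the data is a static local file rather than a live API call, every page render is deterministic, which is what keeps the catalog fast and crawlable.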


Results

Showing 48 of 683 matching models

Snapshot source: OpenRouter. Synced April 21, 2026 at 8:00 AM. Page 6 of 15.

This route is built from local JSON so the catalog stays stable for browsing and SEO. If you need a specific model on ImaRouter, treat this page as a discovery reference and then contact the team for availability.

Text

Unknown provider

OpenHands LM 32B V0.1

OpenHands LM v0.1 is a 32B open-source coding model fine-tuned from Qwen2.5-Coder-32B-Instruct using reinforcement learning techniques outlined in SWE-Gym. It is optimized for autonomous software development agents and achieves strong performance on SWE-Bench Verified, with a 37.2% resolve rate. The model supports a 128K token context window, making it well-suited for long-horizon code reasoning and large codebase tasks. OpenHands LM is designed for local deployment and runs on consumer-grade GPUs such as a single 3090. It enables fully offline agent workflows without dependency on proprietary APIs. This release is intended as a research preview, and future updates aim to improve generalizability, reduce repetition, and offer smaller variants.

Text

Context

131.1K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

all-hands/openhands-lm-32b-v0.1

Text

Unknown provider

DeepSeek: DeepSeek V3 Base

Note that this is a base model mostly meant for testing; you need to provide detailed prompts for the model to return useful responses. DeepSeek-V3 Base is a 671B-parameter open Mixture-of-Experts (MoE) language model with 37B active parameters per forward pass and a context length of 128K tokens. Trained on 14.8T tokens using FP8 mixed precision, it achieves high training efficiency and stability, with strong performance across language, reasoning, math, and coding tasks. DeepSeek-V3 Base is the pre-trained model behind [DeepSeek V3](/deepseek/deepseek-chat-v3).

Text

Context

131.1K

Group

DeepSeek

Pricing preview

No display pricing published in the current snapshot.

Slug

deepseek/deepseek-v3-base

Text

Unknown provider

Typhoon2 8B Instruct

Llama3.1-Typhoon2-8B-Instruct is a Thai-English instruction-tuned model with 8 billion parameters, built on Llama 3.1. It significantly improves over its base model in Thai reasoning, instruction-following, and function-calling tasks, while maintaining competitive English performance. The model is optimized for bilingual interaction and performs well on Thai-English code-switching, MT-Bench, IFEval, and tool-use benchmarks. Despite its smaller size, it demonstrates strong generalization across math, coding, and multilingual benchmarks, outperforming comparable 8B models across most Thai-specific tasks. Full benchmark results and methodology are available in the [technical report](https://arxiv.org/abs/2412.13702).

Text

Context

8.2K

Group

Llama3

Pricing preview

No display pricing published in the current snapshot.

Slug

scb10x/llama3.1-typhoon2-8b-instruct

Text

Unknown provider

Typhoon2 70B Instruct

Llama3.1-Typhoon2-70B-Instruct is a Thai-English instruction-tuned language model with 70 billion parameters, built on Llama 3.1. It demonstrates strong performance across general instruction-following, math, coding, and tool-use tasks, with state-of-the-art results in Thai-specific benchmarks such as IFEval, MT-Bench, and Thai-English code-switching. The model excels in bilingual reasoning and function-calling scenarios, offering high accuracy across diverse domains. Comparative evaluations show consistent improvements over prior Thai LLMs and other Llama-based baselines. Full results and methodology are available in the [technical report](https://arxiv.org/abs/2412.13702).

Text

Context

8.2K

Group

Llama3

Pricing preview

No display pricing published in the current snapshot.

Slug

scb10x/llama3.1-typhoon2-70b-instruct

Text

Unknown provider

Bytedance: UI-TARS 72B

UI-TARS 72B is an open-source multimodal AI model designed specifically for automating browser and desktop tasks through visual interaction and control. The model is built with a specialized vision architecture enabling accurate interpretation and manipulation of on-screen visual data. It supports automation tasks within web browsers as well as desktop applications, including Microsoft Office and VS Code. Core capabilities include intelligent screen detection, predictive action modeling, and efficient handling of repetitive interactions. UI-TARS employs supervised fine-tuning (SFT) tailored explicitly for computer control scenarios. It can be deployed locally or accessed via Hugging Face for demonstration purposes. Intended use cases encompass workflow automation, task scripting, and interactive desktop control applications.

Text, Image

Context

32.8K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

bytedance-research/ui-tars-72b

Text

Unknown provider

Qwen: Qwen2.5 VL 3B Instruct

Qwen2.5 VL 3B is a multimodal LLM from the Qwen Team with the following key enhancements:

- SoTA understanding of images of various resolution & ratio: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2.5-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
- Multilingual Support: to serve global users, besides English and Chinese, Qwen2.5-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Text, Image

Context

64K

Group

Qwen

Pricing preview

No display pricing published in the current snapshot.

Slug

qwen/qwen2.5-vl-3b-instruct

Text

Unknown provider

Google: Gemini 2.5 Pro Experimental

This model has been deprecated by Google in favor of the [paid Preview model](google/gemini-2.5-pro-preview). Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.

Text, Image, File

Context

1M

Group

Gemini

Pricing preview

No display pricing published in the current snapshot.

Slug

google/gemini-2.5-pro-exp-03-25

Text

Unknown provider

Qwen: Qwen2.5 VL 32B Instruct

Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation.

Text, Image

Context

32.8K

Group

Qwen

Pricing preview

No display pricing published in the current snapshot.

Slug

qwen/qwen2.5-vl-32b-instruct

Text

DeepInfra

DeepSeek: DeepSeek V3 0324

DeepSeek V3, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs well across a variety of tasks.

Text

Context

163.8K

Group

DeepSeek

Pricing preview

Input Price: $0.2 /M tokens

Output Price: $0.77 /M tokens

Slug

deepseek/deepseek-chat-v3-0324
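The per-million-token rates in a pricing preview translate directly into a request cost estimate: each side of the request is pro-rated against its rate. A minimal sketch, using the DeepSeek V3 0324 rates listed above ($0.2/M input, $0.77/M output); rates come from the snapshot and can change, and any routing fees or caching discounts are not modeled.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate request cost in USD from per-million-token rates."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# DeepSeek V3 0324 rates from the snapshot above: $0.2/M input, $0.77/M output.
cost = estimate_cost(input_tokens=120_000, output_tokens=8_000,
                     input_price_per_m=0.20, output_price_per_m=0.77)
print(f"${cost:.4f}")  # → $0.0302
```

Note that input and output are billed separately, so a long-context prompt with a short completion is dominated by the input rate.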

Text

Unknown provider

Qrwkv 72B

Qrwkv-72B is a linear-attention RWKV variant of the Qwen 2.5 72B model, optimized to significantly reduce computational cost at scale. Leveraging linear attention, it achieves substantial inference speedups (>1000x) while retaining competitive accuracy on common benchmarks like ARC, HellaSwag, Lambada, and MMLU. It inherits knowledge and language support from Qwen 2.5, supporting approximately 30 languages, making it suitable for efficient inference in large-context applications.

Text

Context

32.8K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

featherless/qwerky-72b

Text

Cloudflare

Mistral: Mistral Small 3.1 24B

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments. The updated version is [Mistral Small 3.2](mistralai/mistral-small-3.2-24b-instruct).

Text, Image

Context

128K

Group

Mistral

Pricing preview

Input Price: $0.35 /M tokens

Output Price: $0.56 /M tokens

Slug

mistralai/mistral-small-3.1-24b-instruct

Text, Reasoning

Unknown provider

OlympicCoder 32B

OlympicCoder-32B is a high-performing open-source model fine-tuned using the CodeForces-CoTs dataset, containing approximately 100,000 chain-of-thought programming samples. It excels at complex competitive programming benchmarks, such as IOI 2024 and Codeforces-style challenges, frequently surpassing state-of-the-art closed-source models. OlympicCoder-32B provides advanced reasoning, coherent multi-step problem-solving, and robust code generation capabilities, demonstrating significant potential for olympiad-level competitive programming applications.

Text

Context

32.8K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

open-r1/olympiccoder-32b

Text, Reasoning

Unknown provider

SteelSkull: L3.3 Electra R1 70B

L3.3-Electra-R1-70B is the newest release of the Unnamed series. Built on a DeepSeek R1 Distill base, Electra-R1 integrates various models to provide an intelligent, coherent model capable of deep character insights. Through proper prompting, the model demonstrates advanced reasoning capabilities and unprompted exploration of character inner thoughts and motivations. Read more about the model and prompting [here](https://huggingface.co/Steelskull/L3.3-Electra-R1-70b).

Text

Context

128K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

steelskull/l3.3-electra-r1-70b

Text

Unknown provider

AllenAI: Olmo 2 32B Instruct

OLMo-2 32B Instruct is a supervised instruction-finetuned variant of the OLMo-2 32B March 2025 base model. It excels in complex reasoning and instruction-following tasks across diverse benchmarks such as GSM8K, MATH, IFEval, and general NLP evaluation. Developed by AI2, OLMo-2 32B is part of an open, research-oriented initiative, trained primarily on English-language datasets to advance the understanding and development of open-source language models.

Text

Context

128K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

allenai/olmo-2-0325-32b-instruct

Text

Unknown provider

Google: Gemma 3 1B

Gemma 3 1B is the smallest of the new Gemma 3 family. It handles context windows up to 32k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Note: Gemma 3 1B is not multimodal. For the smallest multimodal Gemma 3 model, please see [Gemma 3 4B](google/gemma-3-4b-it).

Text, Image

Context

32K

Group

Gemini

Pricing preview

No display pricing published in the current snapshot.

Slug

google/gemma-3-1b-it

Text

Google AI Studio

Google: Gemma 3 4B (free)

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling.

Text, Image

Context

32.8K

Group

Gemini

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

google/gemma-3-4b-it

Text

DeepInfra

Google: Gemma 3 4B

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling.

Text, Image

Context

131.1K

Group

Gemini

Pricing preview

Input Price: $0.04 /M tokens

Output Price: $0.08 /M tokens

Slug

google/gemma-3-4b-it

Text

Unknown provider

AI21: Jamba 1.6 Large

AI21 Jamba Large 1.6 is a high-performance hybrid foundation model combining State Space Models (Mamba) with Transformer attention mechanisms. Developed by AI21, it excels in extremely long-context handling (256K tokens), demonstrates superior inference efficiency (up to 2.5x faster than comparable models), and supports structured JSON output and tool-use capabilities. It has 94 billion active parameters (398 billion total), optimized quantization support (ExpertsInt8), and multilingual proficiency in languages such as English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew. Usage of this model is subject to the [Jamba Open Model License](https://www.ai21.com/licenses/jamba-open-model-license).

Text

Context

256K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

ai21/jamba-1.6-large

Text

Unknown provider

AI21: Jamba Mini 1.6

AI21 Jamba Mini 1.6 is a hybrid foundation model combining State Space Models (Mamba) with Transformer attention mechanisms. With 12 billion active parameters (52 billion total), this model excels in extremely long-context tasks (up to 256K tokens) and achieves superior inference efficiency, outperforming comparable open models on tasks such as retrieval-augmented generation (RAG) and grounded question answering. Jamba Mini 1.6 supports multilingual tasks across English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew, along with structured JSON output and tool-use capabilities. Usage of this model is subject to the [Jamba Open Model License](https://www.ai21.com/licenses/jamba-open-model-license).

Text

Context

256K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

ai21/jamba-1.6-mini

Text

Google AI Studio

Google: Gemma 3 12B (free)

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is the second largest in the family of Gemma 3 models after [Gemma 3 27B](google/gemma-3-27b-it).

Text, Image

Context

32.8K

Group

Gemini

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

google/gemma-3-12b-it

Text

DeepInfra

Google: Gemma 3 12B

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is the second largest in the family of Gemma 3 models after [Gemma 3 27B](google/gemma-3-27b-it).

Text, Image

Context

131.1K

Group

Gemini

Pricing preview

Input Price: $0.04 /M tokens

Output Price: $0.13 /M tokens

Slug

google/gemma-3-12b-it

Text

Cohere

Cohere: Command A

Command A is an open-weights 111B-parameter model with a 256k context window, focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary and open-weights models, Command A delivers maximum performance with minimum hardware costs, excelling on business-critical agentic and multilingual tasks.

Text

Context

256K

Group

Other

Pricing preview

Input Price: $2.5 /M tokens

Output Price: $10 /M tokens

Slug

cohere/command-a

Text

Google AI Studio

Google: Gemma 3 27B (free)

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model, successor to [Gemma 2](google/gemma-2-27b-it).

Text, Image

Context

131.1K

Group

Gemini

Pricing preview

Input Price: $0 /M tokens

Output Price: $0 /M tokens

Slug

google/gemma-3-27b-it

Text

DeepInfra

Google: Gemma 3 27B

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model, successor to [Gemma 2](google/gemma-2-27b-it).

Text, Image

Context

131.1K

Group

Gemini

Pricing preview

Input Price: $0.08 /M tokens

Output Price: $0.16 /M tokens

Slug

google/gemma-3-27b-it

Text

Unknown provider

LatitudeGames: Wayfarer Large 70B Llama 3.3

Wayfarer Large 70B is a roleplay and text-adventure model fine-tuned from Meta’s Llama-3.3-70B-Instruct. Specifically optimized for narrative-driven, challenging scenarios, it introduces realistic stakes, conflicts, and consequences often avoided by standard RLHF-aligned models. Trained using a curated blend of adventure, roleplay, and instructive fiction datasets, Wayfarer emphasizes tense storytelling, authentic player failure scenarios, and robust narrative immersion, making it uniquely suited for interactive fiction and gaming experiences.

Text

Context

128K

Group

Llama3

Pricing preview

No display pricing published in the current snapshot.

Slug

latitudegames/wayfarer-large-70b-llama-3.3

Text

Parasail

TheDrummer: Skyfall 36B V2

Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent storytelling.

Text

Context

32.8K

Group

Other

Pricing preview

Input Price: $0.55 /M tokens

Output Price: $0.8 /M tokens

Slug

thedrummer/skyfall-36b-v2

Text

Unknown provider

Microsoft: Phi 4 Multimodal Instruct

Phi-4 Multimodal Instruct is a versatile 5.6B parameter foundation model that combines advanced reasoning and instruction-following capabilities across both text and visual inputs, providing accurate text outputs. The unified architecture enables efficient, low-latency inference, suitable for edge and mobile deployments. Phi-4 Multimodal Instruct supports text inputs in multiple languages including Arabic, Chinese, English, French, German, Japanese, Spanish, and more, with visual input optimized primarily for English. It delivers impressive performance on multimodal tasks involving mathematical, scientific, and document reasoning, providing developers and enterprises a powerful yet compact model for sophisticated interactive applications. For more information, see the [Phi-4 Multimodal blog post](https://azure.microsoft.com/en-us/blog/empowering-innovation-the-next-generation-of-the-phi-family/).

Text, Image

Context

131.1K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

microsoft/phi-4-multimodal-instruct

Text, Reasoning

Unknown provider

DeepSeek: DeepSeek R1 Zero

DeepSeek-R1-Zero is a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. It's 671B parameters in size, with 37B active in an inference pass. It demonstrates remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. See [DeepSeek R1](/deepseek/deepseek-r1) for the SFT model.

Text

Context

163.8K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

deepseek/deepseek-r1-zero

Text, Reasoning

SiliconFlow

Qwen: QwQ 32B

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

Text

Context

131.1K

Group

Qwen

Pricing preview

Input Price: $0.15 /M tokens

Output Price: $0.58 /M tokens

Slug

qwen/qwq-32b

Text

Unknown provider

Qwen: Qwen2.5 32B Instruct

Qwen2.5 32B Instruct is the instruction-tuned variant of the latest Qwen large language model series. It provides enhanced instruction-following capabilities, improved proficiency in coding and mathematical reasoning, and robust handling of structured data and outputs such as JSON. It supports long-context processing up to 128K tokens and multilingual tasks across 29+ languages. The model has 32.5 billion parameters, 64 layers, and utilizes an advanced transformer architecture with RoPE, SwiGLU, RMSNorm, and Attention QKV bias. For more details, please refer to the [Qwen2.5 Blog](https://qwenlm.github.io/blog/qwen2.5/).

Text

Context

131.1K

Group

Qwen

Pricing preview

No display pricing published in the current snapshot.

Slug

qwen/qwen2.5-32b-instruct

Text

Unknown provider

MoonshotAI: Moonlight 16B A3B Instruct

Moonlight-16B-A3B-Instruct is a 16B-parameter Mixture-of-Experts (MoE) language model developed by Moonshot AI. It is optimized for instruction-following tasks with 3B activated parameters per inference. The model advances the Pareto frontier in performance per FLOP across English, coding, math, and Chinese benchmarks. It outperforms comparable models like Llama3-3B and Deepseek-v2-Lite while maintaining efficient deployment capabilities through Hugging Face integration and compatibility with popular inference engines like vLLM.

Text

Context

8.2K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

moonshotai/moonlight-16b-a3b-instruct

Text

Unknown provider

Nous: DeepHermes 3 Llama 3 8B Preview

DeepHermes 3 Preview is the latest version of Nous Research's flagship Hermes series of LLMs, and one of the first models to unify reasoning (long chains of thought that improve answer accuracy) and normal LLM response modes in a single model, toggled by a system prompt. It also improves LLM annotation, judgement, and function calling.

Text

Context

131.1K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

nousresearch/deephermes-3-llama-3-8b-preview

Text

Google Vertex

Google: Gemini 2.0 Flash Lite

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5), all at extremely economical token prices.

Text, Image, File, Audio, Video

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.075 /M tokens

Output Price: $0.3 /M tokens

Slug

google/gemini-2.0-flash-lite-001

Text

Mistral

Mistral: Saba

Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional datasets, it supports multiple Indian-origin languages—including Tamil and Malayalam—alongside Arabic. This makes it a versatile option for a range of regional and multilingual applications. Read more at the blog post [here](https://mistral.ai/en/news/mistral-saba).

Text

Context

32.8K

Group

Mistral

Pricing preview

Input Price: $0.2 /M tokens

Output Price: $0.6 /M tokens

Slug

mistralai/mistral-saba

Text, Reasoning

Unknown provider

Dolphin3.0 R1 Mistral 24B

Dolphin 3.0 R1 is the next generation of the Dolphin series of instruct-tuned models, designed to be the ultimate general-purpose local model for coding, math, agentic, function-calling, and general use cases. The R1 version has been trained for 3 epochs to reason using 800k reasoning traces from the Dolphin-R1 dataset. Dolphin aims to be a general-purpose reasoning instruct model, similar to the models behind ChatGPT, Claude, and Gemini. Part of the [Dolphin 3.0 Collection](https://huggingface.co/collections/QuixiAI/dolphin-30), curated and trained by [Eric Hartford](https://huggingface.co/ehartford), [Ben Gitter](https://huggingface.co/bigstorm), [BlouseJury](https://huggingface.co/BlouseJury), and [DphnAI](https://huggingface.co/dphn).

Text

Context

32.8K

Group

Mistral

Pricing preview

No display pricing published in the current snapshot.

Slug

cognitivecomputations/dolphin3.0-r1-mistral-24b

Text

Unknown provider

Dolphin3.0 Mistral 24B

Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models, designed to be the ultimate general-purpose local model for coding, math, agentic, function-calling, and general use cases. Dolphin aims to be a general-purpose instruct model, similar to the models behind ChatGPT, Claude, and Gemini. Part of the [Dolphin 3.0 Collection](https://huggingface.co/collections/QuixiAI/dolphin-30), curated and trained by [Eric Hartford](https://huggingface.co/ehartford), [Ben Gitter](https://huggingface.co/bigstorm), [BlouseJury](https://huggingface.co/BlouseJury), and [DphnAI](https://huggingface.co/dphn).

Text

Context

32.8K

Group

Mistral

Pricing preview

No display pricing published in the current snapshot.

Slug

cognitivecomputations/dolphin3.0-mistral-24b

Text

Cloudflare

Llama Guard 3 8B

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated. Llama Guard 3 was aligned to safeguard against the MLCommons standardized hazards taxonomy and designed to support Llama 3.1 capabilities. Specifically, it provides content moderation in 8 languages, and was optimized to support safety and security for search and code interpreter tool calls.

Text

Context

131.1K

Group

Llama3

Pricing preview

Input Price: $0.48 /M tokens

Output Price: $0.03 /M tokens

Slug

meta-llama/llama-guard-3-8b

Text

Unknown provider

Llama 3.1 Tulu 3 405B

Tülu 3 405B is the largest model in the Tülu 3 family, applying fully open post-training recipes at a 405B parameter scale. Built on the Llama 3.1 405B base, it leverages Reinforcement Learning with Verifiable Rewards (RLVR) to enhance instruction following, MATH, GSM8K, and IFEval performance. As part of Tülu 3’s fully open-source approach, it offers state-of-the-art capabilities while surpassing prior open-weight models like Llama 3.1 405B Instruct and Nous Hermes 3 405B on multiple benchmarks. To read more, [click here](https://allenai.org/blog/tulu-3-405B).

Text

Context

N/A

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

allenai/llama-3.1-tulu-3-405b

Text, Reasoning

Unknown provider

DeepSeek: R1 Distill Llama 8B

DeepSeek R1 Distill Llama 8B is a distilled large language model based on [Llama-3.1-8B-Instruct](/meta-llama/llama-3.1-8b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:

- AIME 2024 pass@1: 50.4
- MATH-500 pass@1: 89.1
- CodeForces Rating: 1205

The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models. Hugging Face:

- [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
- [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)

Text

Context

N/A

Group

Llama3

Pricing preview

No display pricing published in the current snapshot.

Slug

deepseek/deepseek-r1-distill-llama-8b

Text

Google Vertex

Google: Gemini 2.0 Flash

Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.

Text, Image, File, Audio, Video

Context

1M

Group

Gemini

Pricing preview

Input Price: $0.1 /M tokens

Output Price: $0.4 /M tokens

Slug

google/gemini-2.0-flash-001

Text

Alibaba Cloud Int.

Qwen: Qwen VL Plus

Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recognition and text recognition capabilities, supporting ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios for image input. It delivers strong performance across a broad range of visual tasks.

Text, Image

Context

131.1K

Group

Qwen

Pricing preview

Input Price: $0.1365 /M tokens

Output Price: $0.4095 /M tokens

Slug

qwen/qwen-vl-plus

Text

AionLabs

AionLabs: Aion-RP 1.0 (8B)

Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, where LLMs evaluate each other’s responses. It is a fine-tuned base model rather than an instruct model, designed to produce more natural and varied writing.

Text

Context

32.8K

Group

Other

Pricing preview

Input Price: $0.8 /M tokens

Output Price: $1.6 /M tokens

Slug

aion-labs/aion-rp-llama-3.1-8b

Text

Alibaba Cloud Int.

Qwen: Qwen VL Max

Qwen VL Max is a visual understanding model with a 7,500-token context length. It excels at delivering optimal performance for a broader spectrum of complex tasks.

Text, Image

Context

131.1K

Group

Qwen

Pricing preview

Input Price: $0.52 /M tokens

Output Price: $2.08 /M tokens

Slug

qwen/qwen-vl-max

Text

Alibaba Cloud Int.

Qwen: Qwen-Turbo

Qwen-Turbo, based on Qwen2.5, is a 1M-context model that is fast and low-cost, suitable for simple tasks.

Text

Context

131.1K

Group

Qwen

Pricing preview

Input Price: $0.0325 /M tokens

Output Price: $0.13 /M tokens

Slug

qwen/qwen-turbo

Text

Nebius Token Factory

Qwen: Qwen2.5 VL 72B Instruct

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.

Text, Image

Context

32K

Group

Qwen

Pricing preview

Input Price: $0.25 /M tokens

Output Price: $0.75 /M tokens

Slug

qwen/qwen2.5-vl-72b-instruct

Text

Alibaba Cloud Int.

Qwen: Qwen-Plus

Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced combination of performance, speed, and cost.

Text

Context

1M

Group

Qwen

Pricing preview

Input Price: $0.26 /M tokens

Output Price: $0.78 /M tokens

Slug

qwen/qwen-plus

Text

Alibaba Cloud Int.

Qwen: Qwen-Max

Qwen-Max, based on Qwen2.5, provides the best inference performance among [Qwen models](/qwen), especially for complex multi-step tasks. It's a large-scale MoE model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methodologies. The parameter count is unknown.

Text

Context

32.8K

Group

Qwen

Pricing preview

Input Price: $1.04 /M tokens

Output Price: $4.16 /M tokens

Slug

qwen/qwen-max

Text, Reasoning

Unknown provider

DeepSeek: R1 Distill Qwen 1.5B

DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on [Qwen 2.5 Math 1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It's a very small and efficient model which outperforms [GPT 4o 0513](/openai/gpt-4o-2024-05-13) on Math Benchmarks. Other benchmark results include:

- AIME 2024 pass@1: 28.9
- AIME 2024 cons@64: 52.7
- MATH-500 pass@1: 83.9

The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Text

Context

131.1K

Group

Other

Pricing preview

No display pricing published in the current snapshot.

Slug

deepseek/deepseek-r1-distill-qwen-1.5b


Need a model request?

Use the market snapshot for discovery, then ask ImaRouter for rollout.

If a model matters for your product, send the slug, expected traffic, target region, and latency expectations. The team can confirm support status, onboarding priority, or a migration path to an equivalent route on ImaRouter.

Contact

support@imarouter.com

Best for model availability questions, onboarding priority, routing strategy, and enterprise rollout planning.
