AI Model Directory

Best for: Coding, agents & computer use

Anthropic: Claude Opus 4.8

Anthropic's May 28, 2026 flagship and the new #1 on the Artificial Analysis Intelligence Index at 61.4 (+4.1 over Opus 4.7, +1.2 ahead of GPT-5.5). SWE-bench Verified 88.6%, SWE-bench Pro 69.2%, Terminal-Bench 2.1 74.6%, and a leading 49.8% on Humanity's Last Exam. Strongest computer-use/browser-agent model tested (Online-Mind2Web 84%). Same $5/$25 pricing (input/output per 1M tokens) as Opus 4.7; fast mode is ~2.5x faster and ~3x cheaper.

Quality

$15.00

Blended/1M

tok/s

Best for: Reasoning at any cost

OpenAI: GPT-5.5 Pro

High-compute variant of GPT-5.5 with extended thinking. Costs 6x as much as GPT-5.5 for a mid-teens uplift on the Artificial Analysis Intelligence Index (AAII) on hard reasoning workloads.

Quality

$105.00

Blended/1M

tok/s

OpenAI's most powerful reasoning model. State-of-the-art on MATH, coding, and science benchmarks.

Quality

$25.00

Blended/1M

tok/s

Best for: Long autonomous agentic runs

Qwen: Qwen3.7 Max

Alibaba's May 20, 2026 proprietary flagship, unveiled at the Alibaba Cloud Summit in Hangzhou. Highest-ranked Chinese model on the Artificial Analysis Intelligence Index at 56.6 (#5 overall, +4.8 over Qwen 3.6 Max Preview). 1M context; SWE-bench Pro 60.6, Terminal-Bench 2.0 69.7, GPQA Diamond 92.4, and a table-leading 97.1 on HMMT Feb 2026 competition math. Ran 35 hours autonomously across 1,158 tool calls and supports external harnesses like Claude Code.

Quality

$5.00

Blended/1M

tok/s

Best for: Frontier quality at low cost

Moonshot AI's April 20, 2026 frontier model. 256K context with text, image, and video input. Artificial Analysis Intelligence Index 54; SWE-bench Verified 80.2%, up sharply from K2.5.

Quality

$2.11

Blended/1M

tok/s

Best for: Multimodal + value

Google: Gemini 2.5 Pro

Google's thinking model with native tool use, 1M context window, and strong multimodal capabilities.

Quality

$5.63

Blended/1M

tok/s

Best for: Complex analysis

Anthropic: Claude Opus 4

Anthropic's most capable model. Excels at complex analysis, nuanced writing, and extended agentic tasks.

Quality

$45.00

Blended/1M

tok/s

TNG: DeepSeek R1T2 Chimera

DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671 B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The...

Quality

$0.70

Blended/1M

—

tok/s

Google: Gemini 2.5 Pro Preview 06-05

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

Quality

$5.63

Blended/1M

—

tok/s

DeepSeek: R1 0528

May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active...

Quality

$1.32

Blended/1M

—

tok/s

Best for: Open-weight agentic coding

Moonshot AI's June 2026 coding-specialized release. 1T-parameter Mixture-of-Experts with 32B active, 256K context, 400M-param MoonViT vision encoder, Modified MIT license. Roughly 30% fewer thinking tokens than K2.6 with improved long-horizon coding. Self-reported benchmarks edge close to GPT-5.5 and Opus 4.8 on coding and tool use; quality index provisional pending independent reproduction on released weights.

Quality

$2.11

Blended/1M

tok/s

New, OSS

Nex AGI: Nexus N2 Pro

Nex AGI's June 2026 frontier open model, trained from Qwen 3.5. 397B-parameter MoE with 17B active and adaptive reasoning (decides when to think harder). Reports beating DeepSeek V4 and GLM-5.1 across several agentic and coding benchmarks, with a standout deep-suite score. Self-reported; placement provisional.

Quality

$0.50

Blended/1M

tok/s

Best for: Open-weight reasoning & tool use

Best for: Open-source value leader

DeepSeek: DeepSeek V4 Pro

1.6T MoE / 49B active. Apache 2.0, 1M context. Best price-per-quality of any frontier-tier model by a wide margin — regular rate $1.74/$3.48, recently offered at a discounted launch-promo rate near $0.44/$0.87.

Quality

$2.61

Blended/1M

tok/s
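The Blended/1M figures in this directory appear consistent with a plain average of the input and output rates: $5/$25 gives $15.00 for Claude Opus 4.8, and $1.74/$3.48 gives $2.61 for DeepSeek V4 Pro. The sketch below assumes that 1:1 weighting, which is inferred from those two entries and not stated by the directory; `blended_price` is a hypothetical helper name.

```python
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Blended USD price per 1M tokens.

    Assumption: a simple 1:1 average of input and output rates,
    inferred from the listed figures rather than documented.
    """
    return round((input_per_m + output_per_m) / 2, 2)


# Rates quoted in the card descriptions (USD per 1M tokens, input/output).
opus_4_8 = blended_price(5.00, 25.00)      # listed as $15.00
deepseek_v4_pro = blended_price(1.74, 3.48)  # listed as $2.61

print(f"Claude Opus 4.8: ${opus_4_8:.2f}")
print(f"DeepSeek V4 Pro: ${deepseek_v4_pro:.2f}")
```

A real workload rarely splits tokens evenly between input and output, so a blended figure computed this way is best read as a rough comparison point rather than an expected bill.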

Best for: Coding & balance

Anthropic: Claude Sonnet 4.6

Anthropic's Feb 17, 2026 balanced model. Near-Opus performance at Sonnet pricing: SWE-bench Verified 79.6%, OSWorld 72.5%, 1M context. Best price-to-performance in the Claude lineup.

Quality

$9.00

Blended/1M

tok/s

OpenAI: GPT-5

GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy...

Quality

$5.63

Blended/1M

—

tok/s

xAI: Grok 3 Beta

Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in...

Quality

$9.00

Blended/1M

—

tok/s

Qwen: Qwen3.6 Max Preview

Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture with approximately 1 trillion total parameters. It is optimized for agentic coding, tool use, and...

Quality

$3.64

Blended/1M

—

tok/s

OpenAI: GPT-4.1

OpenAI's latest flagship with 1M token context, improved instruction following and coding.

Quality

$5.00

Blended/1M

120

tok/s

Best for: Long context

Moonshot AI

MoonshotAI: Kimi K2.5

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed...

Quality

$1.15

Blended/1M

—

tok/s

Best for: Open-weight agentic coding

MiniMax: MiniMax M3

MiniMax's June 1, 2026 open-weight flagship. MiniMax Sparse Attention (MSA) with a 1M-token context, native multimodality, and agentic coding. Reported SWE-bench Pro 59.0% (ahead of GPT-5.5 and Gemini 3.1 Pro, approaching Opus 4.7), Terminal-Bench 2.1 66.0%, OSWorld-Verified 70.06%. Benchmarks were self-reported and unverified at launch; weights and technical report due roughly 10 days after release. Launch-promo pricing was $0.30/$1.20 per 1M.

Quality

$1.50

Blended/1M

tok/s

New, OSS

Z.ai: GLM 5.2

Z.ai's (Zhipu) June 2026 release. Available now on GLM coding plans, with open weights expected the following week. No system card, technical report, or benchmark suite published yet — quality index is a provisional placeholder above GLM 5.1 pending official numbers. Not yet independently evaluable.

Quality

$2.03

Blended/1M

—

tok/s

Best for: Open-weight agentic coding (provisional)

Quality

$12.00

Blended/1M

—

tok/s

Best for: Multimodal

Best for: Coding & balance

Anthropic's balanced model with excellent coding and reasoning. Best price-to-performance ratio.

Quality

$9.00

Blended/1M

tok/s

Best for: Reasoning & math

OpenAI: o3 Mini

OpenAI's compact reasoning model with extended thinking capabilities for complex problem solving.

Quality

$2.75

Blended/1M

155

tok/s

xAI: Grok 3

xAI's flagship model with strong reasoning and real-time information access. Trained on the Colossus cluster.

Quality

$9.00

Blended/1M

tok/s

Best for: Real-time info

DeepSeek V3, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs really well...

Quality

$0.48

Blended/1M

—

tok/s

Anthropic: Claude 3.7 Sonnet

Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and...

Quality

$9.00

Blended/1M

—

tok/s

Anthropic: Claude 3.7 Sonnet (thinking)

Quality

$9.00

Blended/1M

—

tok/s

Best for: Best open-source value

DeepSeek: DeepSeek V3

671B MoE model with 37B active parameters. Outstanding price-performance ratio and coding ability.

Quality

$0.69

Blended/1M

tok/s

This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/)....

Quality

$4.00

Blended/1M

—

tok/s

Google: Gemini 3.5 Flash

Gemini 3.5 Flash is Google's high-efficiency multimodal model, bringing near-Pro level coding and reasoning at Flash-tier cost and speed. It is highly optimized for coding proficiency and parallel agentic execution...

Quality

$5.25

Blended/1M

—

tok/s

Best for: Accessible open-weight agentics

New, OSS

Nex AGI: Nexus N2 mini

Smaller sibling of Nexus N2. 35B-parameter MoE with 3B active — an accessible ~70 GB footprint runnable without an exotic cluster. Same action-oriented adaptive-reasoning design as the Pro variant. Self-reported; placement provisional.

Quality

$0.13

Blended/1M

110

tok/s
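The ~70 GB footprint quoted for Nexus N2 mini is consistent with storing all 35B parameters at 16 bits each; in an MoE model every expert has to be resident even though only 3B parameters are active per token. A rough sketch of that arithmetic follows. The 2 bytes per parameter is an assumption (the listing does not state the weight precision), KV cache and runtime overhead are ignored, and `weight_footprint_gb` is a hypothetical helper name.

```python
def weight_footprint_gb(total_params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision;
    2.0 bytes corresponds to 16-bit weights.
    """
    return total_params_billion * 1e9 * bytes_per_param / 1e9


# Nexus N2 mini: 35B total parameters (all experts must be loaded).
n2_mini_gb = weight_footprint_gb(35)

# Same model under a hypothetical 4-bit quantization (0.5 bytes per parameter).
n2_mini_4bit_gb = weight_footprint_gb(35, bytes_per_param=0.5)

print(f"16-bit weights: ~{n2_mini_gb:.0f} GB")
print(f"4-bit weights:  ~{n2_mini_4bit_gb:.1f} GB")
```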

Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and...

Quality

$1.14

Blended/1M

—

tok/s

OpenAI: o4 Mini High

OpenAI o4-mini-high is the same model as [o4-mini](/openai/o4-mini) with reasoning_effort set to high. OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining...

Quality

$2.75

Blended/1M

—

tok/s

OpenAI: o4 Mini

OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning...

Quality

$2.75

Blended/1M

—

tok/s

xAI: Grok 3 Mini Beta

Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It’s ideal for reasoning-heavy tasks that don’t demand...

Quality

$0.40

Blended/1M

—

tok/s

Best for: Open-source flagship

OpenAI: o3 Mini High

OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effort set to high. o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and...

Quality

$2.75

Blended/1M

—

tok/s

Alibaba's flagship open-source model. Competitive with GPT-4o class models on benchmarks at a fraction of the cost.

Quality

$0.60

Blended/1M

tok/s

Google: Gemini 3 Flash Preview

Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool...

Quality

$1.75

Blended/1M

—

tok/s

Best for: Image generation

Google: Nano Banana (Gemini 2.5 Flash Image)

Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state-of-the-art image generation model with contextual understanding. It is capable of image generation,...

Quality

$1.40

Blended/1M

—

tok/s

Google: Gemini 2.5 Flash Lite Preview 09-2025

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...

Quality

$0.25

Blended/1M

—

tok/s

Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window. Tiered pricing kicks in...

Quality

$0.66

Blended/1M

—

tok/s

Mistral Large 2411

Mistral's flagship 123B model with strong multilingual and coding performance. Supports 128K context.

Quality

$4.00

Blended/1M

tok/s

Best for: Multilingual

Best for: Budget reasoning

xAI: Grok 3 Mini

xAI's efficient reasoning model with thinking capabilities at a lower cost point.

Quality

$0.40

Blended/1M

165

tok/s

Mistral: Codestral 2508

Mistral's cutting-edge language model for coding, released at the end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction, and test generation. [Blog Post](https://mistral.ai/news/codestral-25-08)

Quality

$0.60

Blended/1M

—

tok/s

Perplexity: Sonar Pro

Perplexity's search-augmented model. Combines LLM reasoning with real-time web search and citations.

Quality

$9.00

Blended/1M

tok/s

Google: Gemma 4 26B A4B

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

Quality

$0.20

Blended/1M

—

tok/s

Google: Gemma 4 31B

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

Quality

$0.24

Blended/1M

—

tok/s

Anthropic: Claude Haiku 4.5

Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance...

Quality

$3.00

Blended/1M

—

tok/s

Anthropic: Claude 3.5 Haiku

Anthropic's fastest model. Ultra-low latency for real-time applications and high-volume tasks.

Quality

$2.40

Blended/1M

172

tok/s

Z.ai: GLM 5V Turbo

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding,...

Quality

$2.60

Blended/1M

—

tok/s

xAI: Grok 4.20 Multi-Agent

Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information...

Quality

$4.00

Blended/1M

—

tok/s

Z.ai: GLM 5 Turbo

GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments such as OpenClaw scenarios. It is deeply optimized for real-world agent workflows...

Quality

$2.60

Blended/1M

—

tok/s

GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....

Quality

$5.63

Blended/1M

—

tok/s

AI21: Jamba Large 1.7

Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context...

Quality

$5.00

Blended/1M

—

tok/s

OpenAI: GPT-5 Chat

GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.

Quality

$5.63

Blended/1M

—

tok/s

xAI: Grok 4

Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note that reasoning is not...

Quality

$9.00

Blended/1M

—

tok/s

Google: Gemma 3 12B

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

Quality

$0.09

Blended/1M

—

tok/s

Cohere

Cohere: Command A

Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary...

Quality

$6.25

Blended/1M

—

tok/s

Google: Gemma 3 27B

Quality

$0.12

Blended/1M

—

tok/s

Perplexity: Sonar Reasoning Pro

Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro). Sonar Reasoning Pro is a premier reasoning model powered by DeepSeek R1 with Chain of Thought (CoT). Designed for...

Quality

$5.00

Blended/1M

—

tok/s

Perplexity: Sonar Deep Research

Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers...

Quality

$5.00

Blended/1M

—

tok/s

Best for: Deep research

Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images. The model is...

Quality

$4.00

Blended/1M

—

tok/s

Magnum v4 72B

This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically [Sonnet](https://openrouter.ai/anthropic/claude-3.5-sonnet) and [Opus](https://openrouter.ai/anthropic/claude-3-opus). The model is fine-tuned on top of [Qwen2.5 72B](https://openrouter.ai/qwen/qwen-

Quality

$4.00

Blended/1M

—

tok/s

NVIDIA: Llama 3.1 Nemotron 70B Instruct

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels...

Quality

$1.20

Blended/1M

—

tok/s

Inflection: Inflection 3 Pi

Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay. Pi...

Quality

$6.25

Blended/1M

—

tok/s

Inflection: Inflection 3 Productivity

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It has access to recent news. For emotional...

Quality

$6.25

Blended/1M

—

tok/s

Nous: Hermes 3 70B Instruct

Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Quality

$0.30

Blended/1M

—

tok/s

Meta: Llama 3.1 70B Instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases. It has demonstrated strong...

Quality

$0.40

Blended/1M

—

tok/s

Sao10k: Llama 3 Euryale 70B v2.1

Euryale 70B v2.1 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). - Better prompt adherence. - Better anatomy / spatial awareness. - Adapts much better to unique and custom...

Quality

$1.48

Blended/1M

—

tok/s

OpenAI: GPT-3.5 Turbo Instruct

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Sep 2021.

Quality

$1.75

Blended/1M

—

tok/s

OpenAI: GPT-3.5 Turbo 16k

This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up...

Quality

$3.50

Blended/1M

—

tok/s

Best for: Fastest + cheapest

Google: Gemini 2.0 Flash

Google's fastest model. Optimized for speed and efficiency with strong coding and reasoning.

Quality

$0.25

Blended/1M

244

tok/s

Best for: Open-source coding

Qwen2.5 Coder 32B Instruct

Specialized coding model from Alibaba. Top open-source code model on HumanEval and SWE-Bench.

Quality

$0.30

Blended/1M

125

tok/s

Google: Gemini 2.0 Flash Lite

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5),...

Quality

$0.19

Blended/1M

—

tok/s

Mistral Small Creative is an experimental small model designed for creative writing, narrative generation, roleplay and character-driven dialogue, general-purpose instruction following, and conversational agents.

Quality

$0.20

Blended/1M

—

tok/s

OpenAI: GPT-5 Nano

GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger...

Quality

$0.23

Blended/1M

—

tok/s

Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,...

Quality

$4.00

Blended/1M

—

tok/s

Anthropic: Claude 3 Haiku

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku). #multimodal

Quality

$0.75

Blended/1M

—

tok/s

Mistral: Mixtral 8x7B Instruct

Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion...

Quality

$0.54

Blended/1M

—

tok/s

Best for: Longest context

Meta: Llama 4 Scout

Meta's efficient MoE model with 16 experts. 10M token context window and strong multilingual support.

Quality

$0.28

Blended/1M

198

tok/s

Cohere

Cohere: Command R+ (08-2024)

Cohere's flagship for enterprise RAG. Optimized for retrieval-augmented generation and tool use.

Quality

$6.25

Blended/1M

tok/s

Best for: Enterprise RAG

Xiaomi: MiMo-V2-Omni

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Quality

$1.20

Blended/1M

—

tok/s

Xiaomi: MiMo-V2-Pro

MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply optimized for agentic scenarios. It is highly adaptable to general agent frameworks like...

Quality

$2.00

Blended/1M

—

tok/s

Best for: Image generation

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines...

Quality

$1.75

Blended/1M

—

tok/s

GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while...

Quality

$1.07

Blended/1M

—

tok/s

Compared with GLM-4.5, this generation brings several key improvements: Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex...

Quality

$1.08

Blended/1M

—

tok/s

Relace: Relace Apply 3

Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straight into your source files. It can apply updates from GPT-4o, Claude, and others into your files at...

Quality

$1.05

Blended/1M

—

tok/s

Moonshot AI

MoonshotAI: Kimi K2 0905

Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32...

Quality

$1.55

Blended/1M

—

tok/s

Nous: Hermes 4 405B

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with...

Quality

$2.00

Blended/1M

—

tok/s

Mistral: Mistral Medium 3.1

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances...

Quality

$1.20

Blended/1M

—

tok/s

Z.ai: GLM 4.5V

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding,...

Quality

$1.20

Blended/1M

—

tok/s

Z.ai: GLM 4.5

GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly...

Quality

$1.40

Blended/1M

—

tok/s

Switchpoint Router

Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library. As the world of LLMs advances, our router gets smarter, ensuring you...

Quality

$2.13

Blended/1M

—

tok/s

Moonshot AI

MoonshotAI: Kimi K2 0711

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for...

Quality

$1.43

Blended/1M

—

tok/s

Mistral: Devstral Medium

Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Positioned as a step up from Devstral Small, it achieves...

Quality

$1.20

Blended/1M

—

tok/s

Morph: Morph V3 Large

Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy for precise code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code>...

Quality

$1.40

Blended/1M

—

tok/s

Morph: Morph V3 Fast

Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update>...

Quality

$1.00

Blended/1M

—

tok/s
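Both Morph apply models expect the prompt in the tagged layout quoted in their descriptions. The sketch below assembles that string from its three parts. The tag names and their order come from the Morph V3 Fast description; the single-space separators mirror how the listing prints the format and may differ from what the service expects, and the listing is truncated after the `<update>` element, so anything beyond it is unknown. `build_apply_prompt` is a hypothetical helper name, not part of any Morph SDK.

```python
def build_apply_prompt(instruction: str, initial_code: str, edit_snippet: str) -> str:
    """Assemble a prompt in the <instruction>/<code>/<update> layout.

    Assumption: elements are joined by single spaces, as printed in the
    directory listing; the real separator is not documented here.
    """
    return (
        f"<instruction>{instruction}</instruction> "
        f"<code>{initial_code}</code> "
        f"<update>{edit_snippet}</update>"
    )


# Illustrative inputs only.
prompt = build_apply_prompt(
    instruction="Rename the function to add_numbers",
    initial_code="def add(a, b):\n    return a + b",
    edit_snippet="def add_numbers(a, b):\n    return a + b",
)
print(prompt)
```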

Baidu: ERNIE 4.5 VL 424B A47B

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data...

Quality

$0.83

Blended/1M

—

tok/s

MiniMax: MiniMax M1

MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (MoE) architecture paired with a custom "lightning attention" mechanism, allowing it...

Quality

$1.30

Blended/1M

—

tok/s

Mistral: Mistral Medium 3

Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost...

Quality

$1.20

Blended/1M

—

tok/s

Arcee AI: Maestro Reasoning

Maestro Reasoning is Arcee's flagship analysis model: a 32 B‑parameter derivative of Qwen 2.5‑32 B tuned with DPO and chain‑of‑thought RL for step‑by‑step logic. Compared to the earlier 7 B...

Quality

$2.10

Blended/1M

—

tok/s

Arcee AI: Virtuoso Large

Virtuoso‑Large is Arcee's top‑tier general‑purpose LLM at 72 B parameters, tuned to tackle cross‑domain reasoning, creative writing and enterprise QA. Unlike many 70 B peers, it retains the 128 k...

Quality

$0.97

Blended/1M

—

tok/s

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural...

Quality

$1.20

Blended/1M

—

tok/s

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.

Quality

$0.50

Blended/1M

—

tok/s

Perplexity: Sonar

Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features...

Quality

$1.00

Blended/1M

—

tok/s

Sao10K: Llama 3.3 Euryale 70B

Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.2](/models/sao10k/l3-euryale-70b).

Quality

$0.70

Blended/1M

—

tok/s

Amazon

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Quality

$1.00

Blended/1M

—

tok/s

Meta: Llama 3 70B Instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong...

Quality

$0.63

Blended/1M

—

tok/s

OpenAI: GPT-3.5 Turbo (older v0613)

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.

Quality

$1.50

Blended/1M

—

tok/s

Mancer: Weaver (alpha)

An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrative situations.

Quality

$0.88

Blended/1M

—

tok/s

ReMM SLERP 13B

A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge

Quality

$0.55

Blended/1M

—

tok/s

MiMo-V2.5-Pro is Xiaomi’s flagship model, delivering strong performance in general agentic capabilities, complex software engineering, and long-horizon tasks, with top rankings on benchmarks such as ClawEval, GDPVal, and SWE-bench Pro....

Quality

$2.00

Blended/1M

—

tok/s

Google: Gemma 3 4B

Quality

$0.06

Blended/1M

—

tok/s

Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class. Designed for a wide variety of...

Quality

$0.06

Blended/1M

—

tok/s

NousResearch: Hermes 2 Pro - Llama-3 8B

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced...

Quality

$0.14

Blended/1M

—

tok/s

Meta: Llama 3 8B Instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong...

Quality

$0.04

Blended/1M

—

tok/s

Google: Gemini 3.1 Flash Lite Preview

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across...

Quality

$0.88

Blended/1M

—

tok/s

NVIDIA: Nemotron 3 Nano 30B A3B

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully...

Quality

$0.13

Blended/1M

—

tok/s

NVIDIA: Nemotron Nano 12B 2 VL

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s...

Quality

$0.40

Blended/1M

—

tok/s

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art open-source models. It is...

Quality

$0.62

Blended/1M

—

tok/s

Arcee AI: Trinity Large Thinking

Trinity Large Thinking is a powerful open source reasoning model from the team at Arcee AI. It shows strong performance in PinchBench, agentic workloads, and reasoning tasks. Launch video: https://youtu.be/Gc82AXLa0Rg?si=4RLn6WBz33qT--B7

Quality

$0.54

Blended/1M

—

tok/s

Kwaipilot: KAT-Coder-Pro V2

KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS integration. It builds on the agentic coding strengths of earlier versions,...

Quality

$0.75

Blended/1M

—

tok/s

Reka Edge

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding,...

Quality

$0.10

Blended/1M

—

tok/s

MiniMax: MiniMax M2.7

MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent...

Quality

$0.74

Blended/1M

—

tok/s

NVIDIA: Nemotron 3 Super

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

Quality

$0.27

Blended/1M

—

tok/s

ByteDance Seed: Seed-2.0-Lite

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across...

Quality

$1.13

Blended/1M

—

tok/s

Inception: Mercury 2

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving...

Quality

$0.50

Blended/1M

—

tok/s

ByteDance Seed: Seed-2.0-Mini

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal und

Quality

$0.25

Blended/1M

—

tok/s

MiniMax: MiniMax M2.5

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...

Quality

$0.65

Blended/1M

—

tok/s

StepFun: Step 3.5 Flash

Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token....

Quality

$0.20

Blended/1M

—

tok/s

Olmo 3.1 32B Instruct is a large-scale, 32-billion-parameter instruction-tuned language model engineered for high-performance conversational AI, multi-turn dialogue, and practical instruction following. As part of the Olmo 3.1 family, this...

Quality

$0.40

Blended/1M

—

tok/s

ByteDance Seed: Seed 1.6 Flash

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of...

Quality

$0.19

Blended/1M

—

tok/s

ByteDance Seed: Seed 1.6

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window.

Quality

$1.13

Blended/1M

—

tok/s

MiniMax: MiniMax M2.1

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world...

Quality

$0.62

Blended/1M

—

tok/s

Xiaomi: MiMo-V2-Flash

MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, adopting hybrid attention architecture. MiMo-V2-Flash supports a...

Quality

$0.20

Blended/1M

—

tok/s

Z.ai: GLM 4.6V

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts...

Quality

$0.60

Blended/1M

—

tok/s

GPT-5.1-Codex-Mini is a smaller and faster version of GPT-5.1-Codex.

Quality

$1.13

Blended/1M

—

tok/s

Mistral: Voxtral Small 24B 2507

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio...

Quality

$0.20

Blended/1M

—

tok/s

OpenAI: gpt-oss-safeguard-20b

gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts (MoE) model offers lower latency for safety tasks like content classification, LLM filtering, and trust...

Quality

$0.19

Blended/1M

—

tok/s

MiniMax: MiniMax M2

MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning,...

Quality

$0.63

Blended/1M

—

tok/s

Baidu: ERNIE 4.5 21B A3B Thinking

ERNIE-4.5-21B-A3B-Thinking is Baidu's upgraded lightweight MoE model, refined to boost reasoning depth and quality for top-tier performance in logical puzzles, math, science, coding, text generation, and expert-level academic benchmarks.

Quality

$0.18

Blended/1M

—

tok/s

TheDrummer: Cydonia 24B V4.1

Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence.

Quality

$0.40

Blended/1M

—

tok/s

xAI: Grok 4 Fast

Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning. Read more about the model...

Quality

$0.35

Blended/1M

—

tok/s

Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced performance, speed, and cost combination.

Quality

$0.52

Blended/1M

—

tok/s

xAI: Grok Code Fast 1

Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code for high-quality...

Quality

$0.85

Blended/1M

—

tok/s

Nous: Hermes 4 70B

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either...

Quality

$0.27

Blended/1M

—

tok/s

GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It...

Quality

$0.10

Blended/1M

—

tok/s

ByteDance: UI-TARS 7B

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement...

Quality

$0.15

Blended/1M

—

tok/s

Mistral: Devstral Small 1.1

Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and...

Quality

$0.20

Blended/1M

—

tok/s

Tencent: Hunyuan A13B Instruct

Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark...

Quality

$0.35

Blended/1M

—

tok/s

Baidu: ERNIE 4.5 300B A47B

ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters per token and supports text generation in...

Quality

$0.69

Blended/1M

—

tok/s

Inception: Mercury

Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like GPT-4.1 Nano and Claude...

Quality

$0.50

Blended/1M

—

tok/s

Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination.

Quality

$0.52

Blended/1M

—

tok/s

MiniMax: MiniMax-01

MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context...

Quality

$0.65

Blended/1M

—

tok/s

Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4...

Quality

$0.21

Blended/1M

—

tok/s

Perceptron: Perceptron Mk1

Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model for video and embodied reasoning. It accepts image and video inputs paired with natural language queries, and produces detailed visual understanding...

Quality

$0.82

Blended/1M

—

tok/s

Tencent: Hy3 preview

Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels across disabled, low, and high modes, allowing it to...

Quality

$0.16

Blended/1M

—

tok/s

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...

Quality

$0.09

Blended/1M

—

tok/s

Google: Gemma 3n 4B

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks...

Quality

$0.09

Blended/1M

—

tok/s

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Quality

$0.24

Blended/1M

—

tok/s

Sao10K: Llama 3 8B Lunaris

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge....

Quality

$0.04

Blended/1M

—

tok/s

IBM: Granite 4.1 8B

Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window and is designed for enterprise tasks...

Quality

$0.08

Blended/1M

—

tok/s

inclusionAI: Ling-2.6-flash

Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, designed for real-world agents that require fast responses, strong execution, and high token efficiency....

Quality

$0.02

Blended/1M

—

tok/s