AI Model Directory
Detailed specs, pricing, and benchmarks for every major AI model.
GPT-5.5 Pro
High-compute variant of GPT-5.5 with extended thinking. Costs 6x as much for a mid-teens Artificial Analysis Intelligence Index (AAII) uplift on hard reasoning workloads.
98
Quality
$105.00
Blended/1M
68
tok/s
GPT-5.5
OpenAI's "Spud", the first fully retrained base model since GPT-4.5. 1M context window; tops the Artificial Analysis Intelligence Index at 59-60 and the LMArena text leaderboard at ~1506 Elo. Shipped April 23 2026; available on AWS Bedrock from April 28.
97
Quality
$17.50
Blended/1M
70
tok/s
Claude Opus 4.7
Anthropic's April 16 2026 flagship. SWE-bench Verified 87.6%, SWE-bench Pro 64.3%, GPQA Diamond 94.2%. Sits in the LMArena top tier at ~1503 (1505 thinking). Same $5/$25 pricing as Opus 4.6.
96
Quality
$15.00
Blended/1M
68
tok/s
Gemini 3.1 Pro
Tied at the top of the LMArena text leaderboard in May 2026 at ~1505 Elo. GPQA Diamond 94.3%, MMLU-Pro 91.0%: the current scientific reasoning leader. 1M token context; fastest frontier model at ~131 tok/s.
96
Quality
$7.00
Blended/1M
131
tok/s
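As a rough illustration of what the tok/s figures mean in practice, the sketch below converts output speed into generation time for a fixed response length. It assumes steady-state decoding and ignores time to first token, network latency, and any hidden reasoning tokens; `generation_seconds` is a hypothetical helper, not part of any vendor API.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate seconds to stream `output_tokens` at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Speeds as listed in this directory (tok/s).
speeds = {"Gemini 3.1 Pro": 131, "GPT-5.5": 70, "Claude Opus 4.7": 68}

for model, tps in speeds.items():
    seconds = generation_seconds(2000, tps)
    print(f"{model}: {seconds:.1f} s for 2,000 output tokens")
```

By this estimate a 2,000-token response takes roughly 15 seconds at 131 tok/s and close to 30 seconds at 68 to 70 tok/s.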
o3
OpenAI's dedicated reasoning model. Strong results on MATH, coding, and science benchmarks.
94
Quality
$25.00
Blended/1M
68
tok/s
Grok 4.3
xAI's May 6 2026 flagship. 1M context, native video input, and a ~40% input price cut over Grok 4.20. Artificial Analysis Intelligence Index 53; outperforms Opus 4.7 ~1.26x on agentic Vending-Bench.
93
Quality
$1.88
Blended/1M
83
tok/s
Gemini 2.5 Pro
Google's thinking model with native tool use, 1M context window, and strong multimodal capabilities.
92
Quality
$5.63
Blended/1M
87
tok/s
Kimi K2.6
Moonshot AI's April 20 2026 frontier model. 256K context with text, image, and video input. Artificial Analysis Intelligence Index 54; SWE-bench Verified 80.2%, up sharply from K2.5.
92
Quality
$2.48
Blended/1M
48
tok/s
Claude Opus 4
Anthropic's earlier Opus flagship, since succeeded by Opus 4.7 in this directory. Excels at complex analysis, nuanced writing, and extended agentic tasks.
91
Quality
$45.00
Blended/1M
52
tok/s
DeepSeek R1
DeepSeek's reasoning model. Competitive with o3 on math and coding at a fraction of the cost.
91
Quality
$1.37
Blended/1M
35
tok/s
DeepSeek V4 Pro
1.6T MoE / 49B active. Apache 2.0, 1M context. Best price-per-quality of any frontier-tier model in May 2026 by a wide margin. The regular rate is $1.74 input / $3.48 output per 1M tokens, which is what the blended figure shown here reflects; a 75%-off launch promo ($0.435/$0.87) runs through May 31 2026.
90
Quality
$2.61
Blended/1M
33
tok/s
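The listed figures are consistent with a blended price that is the simple average of the input and output rates per 1M tokens: ($1.74 + $3.48) / 2 = $2.61 here, and ($5 + $25) / 2 = $15.00 for Claude Opus 4.7. The sketch below assumes that equal weighting (the directory does not state its formula) and applies the 75% launch discount; `blended_price` and `apply_discount` are hypothetical helper names.

```python
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Equal-weight average of input and output price per 1M tokens (assumed formula)."""
    return (input_per_m + output_per_m) / 2


def apply_discount(price: float, percent_off: float) -> float:
    """Return `price` after taking `percent_off` percent off."""
    return price * (1 - percent_off / 100)


# DeepSeek V4 Pro regular rates from the listing (USD per 1M tokens).
regular_in, regular_out = 1.74, 3.48
promo_in = apply_discount(regular_in, 75)
promo_out = apply_discount(regular_out, 75)

print(f"regular blended: ${blended_price(regular_in, regular_out):.2f}")
print(f"promo rates:     ${promo_in:.3f} / ${promo_out:.2f}")
print(f"promo blended:   ${blended_price(promo_in, promo_out):.4f}")
```

At the promo rates the blended figure works out to about $0.65 per 1M tokens, a quarter of the listed $2.61.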
Claude Sonnet 4.6
Anthropic's Feb 17 2026 balanced model. Near-Opus performance at Sonnet pricing — SWE-bench Verified 79.6%, OSWorld 72.5%, 1M context. Best price-to-performance in the Claude lineup.
90
Quality
$9.00
Blended/1M
73
tok/s
GPT-4.1
OpenAI's GPT-4-series flagship with a 1M token context and improved instruction following and coding.
89
Quality
$5.00
Blended/1M
120
tok/s
o3 Mini
OpenAI's compact reasoning model with extended thinking capabilities for complex problem solving.
88
Quality
$2.75
Blended/1M
155
tok/s
Claude Sonnet 4
Anthropic's balanced model with excellent coding and reasoning. Strong price-to-performance ratio.
88
Quality
$9.00
Blended/1M
95
tok/s
GLM-5.1
Z.ai's (Zhipu) 2026 flagship. 200K context, strong agentic and tool-use scores — 98% on τ²-Bench Telecom, on par with Grok 4.3. Artificial Analysis Intelligence Index 51.
88
Quality
$3.10
Blended/1M
48
tok/s
Grok 3
xAI's earlier flagship model with strong reasoning and real-time information access. Trained on the Colossus cluster.
87
Quality
$9.00
Blended/1M
82
tok/s
DeepSeek V3
671B MoE model with 37B active parameters. Outstanding price-performance ratio and coding ability.
86
Quality
$0.69
Blended/1M
62
tok/s
Qwen 3.6 Plus
Alibaba's April 2026 flagship. Long-context multilingual specialist with strong CN/EN/AR coverage and competitive coding scores.
86
Quality
$3.50
Blended/1M
124
tok/s
GPT-4o
OpenAI's earlier flagship multimodal model with vision, code generation, and function calling. Excellent all-round performance.
85
Quality
$6.25
Blended/1M
109
tok/s
Llama 4 Maverick
Meta's mixture-of-experts model with 17B active parameters and 128 experts. Strong multimodal and multilingual performance.
80
Quality
$0.40
Blended/1M
135
tok/s
Qwen 2.5 72B
Alibaba's flagship open-source model. Competitive with GPT-4o class models on benchmarks at a fraction of the cost.
80
Quality
$0.60
Blended/1M
85
tok/s
DeepSeek V4 Flash
284B MoE / 13B active. Apache 2.0, 1M context. Among the cheapest frontier-adjacent models — $0.14 input, $0.28 output per 1M tokens.
80
Quality
$0.21
Blended/1M
105
tok/s
Mistral Large 2
Mistral's flagship 123B model with strong multilingual and coding performance. Supports 128K context.
79
Quality
$4.00
Blended/1M
78
tok/s
Grok 3 Mini
xAI's efficient reasoning model with thinking capabilities at a lower cost point.
78
Quality
$0.40
Blended/1M
165
tok/s
Sonar Pro
Perplexity's search-augmented model. Combines LLM reasoning with real-time web search and citations.
78
Quality
$9.00
Blended/1M
65
tok/s
Codestral
Mistral's dedicated code model. Optimized for code generation, completion, and review across 80+ languages.
76
Quality
$0.60
Blended/1M
195
tok/s
Nemotron 3 Nano Omni
NVIDIA's 30B open multimodal model running vision + audio + text in a single stack. Tops 6 specialty leaderboards. April 2026.
76
Quality
$0.00
Blended/1M
158
tok/s
Claude 3.5 Haiku
Anthropic's fastest model. Ultra-low latency for real-time applications and high-volume tasks.
75
Quality
$2.40
Blended/1M
172
tok/s
Gemma 4 27B
Google's open-weight flagship under Apache 2.0 (April 2026). Designed for self-hosting, with strong instruction following and tool calling.
75
Quality
$0.00
Blended/1M
142
tok/s
Gemini 2.0 Flash
Google's fastest model. Optimized for speed and efficiency with strong coding and reasoning.
74
Quality
$0.25
Blended/1M
244
tok/s
Qwen 2.5 Coder 32B
Specialized coding model from Alibaba. Top open-source code model on HumanEval and SWE-bench.
74
Quality
$0.30
Blended/1M
125
tok/s
GPT-4o Mini
Fast and affordable small model for lightweight tasks and high-throughput use cases.
72
Quality
$0.38
Blended/1M
183
tok/s
Llama 4 Scout
Meta's efficient MoE model with 16 experts. 10M token context window and strong multilingual support.
71
Quality
$0.28
Blended/1M
198
tok/s
Amazon Nova Pro
Amazon's capable multimodal model. Strong balance of accuracy, speed, and cost for diverse tasks.
70
Quality
$2.00
Blended/1M
110
tok/s
Command R+
Cohere's flagship for enterprise RAG. Optimized for retrieval-augmented generation and tool use.
68
Quality
$6.25
Blended/1M
72
tok/s
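One way to compare price-per-quality claims across the directory is to divide each Quality score by its Blended/1M price. The sketch below does this for a few entries copied from the listings above. The ratio is an illustrative metric, not necessarily how the directory ranks models, and zero-priced entries ($0.00) are skipped because the ratio is undefined for them.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    quality: int     # directory Quality score
    blended: float   # blended USD per 1M tokens


# A handful of entries copied from the listings above.
models = [
    Model("GPT-5.5 Pro", 98, 105.00),
    Model("GPT-5.5", 97, 17.50),
    Model("Gemini 3.1 Pro", 96, 7.00),
    Model("Grok 4.3", 93, 1.88),
    Model("DeepSeek V4 Pro", 90, 2.61),
    Model("DeepSeek V4 Flash", 80, 0.21),
    Model("Gemma 4 27B", 75, 0.00),
]


def quality_per_dollar(m: Model) -> float:
    """Quality points per blended dollar (hypothetical comparison metric)."""
    return m.quality / m.blended


# Skip zero-priced entries: the ratio is undefined when blended == 0.
priced = [m for m in models if m.blended > 0]
ranked = sorted(priced, key=quality_per_dollar, reverse=True)

for m in ranked:
    print(f"{m.name:<18} {quality_per_dollar(m):7.1f} quality points per $")
```

Because the ratio uses the listed blended price, temporary promo rates are not reflected unless the `blended` value is updated by hand.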