AI Model Directory

Detailed specs, pricing, and benchmarks for every major AI model.

OpenAI
New

GPT-5.5 Pro

High-compute variant of GPT-5.5 with extended thinking. 6x the price for mid-teens AAII uplift on hard reasoning workloads.

98

Quality

$105.00

Blended/1M

68

tok/s

Best for: Reasoning at any cost
OpenAI
New

GPT-5.5

OpenAI's "Spud" — the first fully retrained base model since GPT-4.5. 1M context window; tops the Artificial Analysis Intelligence Index at 59-60 and the LMArena text leaderboard at ~1506 Elo. Shipped April 23 2026, on AWS Bedrock April 28.

97

Quality

$17.50

Blended/1M

70

tok/s

Best for: Frontier general purpose
Anthropic

Claude Opus 4.7

Anthropic's April 16 2026 flagship. SWE-bench Verified 87.6%, SWE-bench Pro 64.3%, GPQA Diamond 94.2%. Sits in the LMArena top tier at ~1503 (1505 thinking). Same $5/$25 pricing as Opus 4.6.

96

Quality

$15.00

Blended/1M

68

tok/s

Best for: Coding & agentic workflows
Google

Gemini 3.1 Pro

Tied at the top of the LMArena text leaderboard May 2026 @ ~1505 Elo. GPQA Diamond 94.3%, MMLU-Pro 91.0% — current scientific reasoning leader. 1M token context; fastest frontier model at ~131 tok/s.

96

Quality

$7.00

Blended/1M

131

tok/s

Best for: Science & long-context
OpenAI

o3

OpenAI's most powerful reasoning model. State-of-the-art on MATH, coding, and science benchmarks.

94

Quality

$25.00

Blended/1M

68

tok/s

Best for: Hard reasoning
xAI
New

Grok 4.3

xAI's May 6 2026 flagship. 1M context, native video input, and a ~40% input price cut over Grok 4.20. Artificial Analysis Intelligence Index 53; outperforms Opus 4.7 ~1.26x on agentic Vending-Bench.

93

Quality

$1.88

Blended/1M

83

tok/s

Best for: Agentic tasks & real-time info
Google

Gemini 2.5 Pro

Google's thinking model with native tool use, 1M context window, and strong multimodal capabilities.

92

Quality

$5.63

Blended/1M

87

tok/s

Best for: Multimodal + value
Moonshot AI

Kimi K2.6

Moonshot AI's April 20 2026 frontier model. 256K context with text, image, and video input. Artificial Analysis Intelligence Index 54; SWE-bench Verified 80.2%, up sharply from K2.5.

92

Quality

$2.48

Blended/1M

48

tok/s

Best for: Frontier quality at low cost
Anthropic

Claude Opus 4

Anthropic's most capable model. Excels at complex analysis, nuanced writing, and extended agentic tasks.

91

Quality

$45.00

Blended/1M

52

tok/s

Best for: Complex analysis
DeepSeek
OSS

DeepSeek R1

DeepSeek's reasoning model. Competitive with o3 on math and coding at a fraction of the cost.

91

Quality

$1.37

Blended/1M

35

tok/s

Best for: Cheap reasoning
DeepSeek
NewOSS

DeepSeek V4 Pro

1.6T MoE / 49B active. Apache 2.0, 1M context. Best price-per-quality of any frontier-tier model in May 2026 by a wide margin — regular rate $1.74/$3.48, with a 75%-off launch promo ($0.435/$0.87) running through May 31 2026.

90

Quality

$2.61

Blended/1M

33

tok/s

Best for: Open-source value leader
Anthropic

Claude Sonnet 4.6

Anthropic's Feb 17 2026 balanced model. Near-Opus performance at Sonnet pricing — SWE-bench Verified 79.6%, OSWorld 72.5%, 1M context. Best price-to-performance in the Claude lineup.

90

Quality

$9.00

Blended/1M

73

tok/s

Best for: Coding & balance
OpenAI

GPT-4.1

OpenAI's latest flagship with 1M token context, improved instruction following and coding.

89

Quality

$5.00

Blended/1M

120

tok/s

Best for: Long context
OpenAI

o3 Mini

OpenAI's compact reasoning model with extended thinking capabilities for complex problem solving.

88

Quality

$2.75

Blended/1M

155

tok/s

Best for: Reasoning & math
Anthropic

Claude Sonnet 4

Anthropic's balanced model with excellent coding and reasoning. Best price-to-performance ratio.

88

Quality

$9.00

Blended/1M

95

tok/s

Best for: Coding & balance
Z.ai (Zhipu AI)
NewOSS

GLM-5.1

Z.ai's (Zhipu) 2026 flagship. 200K context, strong agentic and tool-use scores — 98% on τ²-Bench Telecom, on par with Grok 4.3. Artificial Analysis Intelligence Index 51.

88

Quality

$3.10

Blended/1M

48

tok/s

Best for: Open-weight agentic & tool use
xAI

Grok 3

xAI's flagship model with strong reasoning and real-time information access. Trained on the Colossus cluster.

87

Quality

$9.00

Blended/1M

82

tok/s

Best for: Real-time info
DeepSeek
OSS

DeepSeek V3

671B MoE model with 37B active parameters. Outstanding price-performance ratio and coding ability.

86

Quality

$0.69

Blended/1M

62

tok/s

Best for: Best open-source value
Alibaba Cloud

Qwen 3.6 Plus

Alibaba's April 2026 flagship. Long-context multilingual specialist with strong CN/EN/AR coverage and competitive coding scores.

86

Quality

$3.50

Blended/1M

124

tok/s

Best for: Multilingual & APAC
OpenAI

GPT-4o

OpenAI's flagship multimodal model with vision, code generation, and function calling. Excellent all-round performance.

85

Quality

$6.25

Blended/1M

109

tok/s

Best for: General purpose
Meta
OSS

Llama 4 Maverick

Meta's mixture-of-experts model with 17B active parameters and 128 experts. Strong multimodal and multilingual performance.

80

Quality

$0.40

Blended/1M

135

tok/s

Best for: Open-source value
Alibaba Cloud
OSS

Qwen 2.5 72B

Alibaba's flagship open-source model. Competitive with GPT-4o class models on benchmarks at a fraction of the cost.

80

Quality

$0.60

Blended/1M

85

tok/s

Best for: Open-source flagship
DeepSeek
NewOSS

DeepSeek V4 Flash

284B MoE / 13B active. Apache 2.0, 1M context. Among the cheapest frontier-adjacent models — $0.14 input, $0.28 output per 1M tokens.

80

Quality

$0.21

Blended/1M

105

tok/s

Best for: Cheap-and-fast cascade tier
Mistral AI

Mistral Large 2

Mistral's flagship 123B model with strong multilingual and coding performance. Supports 128K context.

79

Quality

$4.00

Blended/1M

78

tok/s

Best for: Multilingual
xAI

Grok 3 Mini

xAI's efficient reasoning model with thinking capabilities at a lower cost point.

78

Quality

$0.40

Blended/1M

165

tok/s

Best for: Budget reasoning
Perplexity

Sonar Pro

Perplexity's search-augmented model. Combines LLM reasoning with real-time web search and citations.

78

Quality

$9.00

Blended/1M

65

tok/s

Best for: Search + citations
Mistral AI

Codestral

Mistral's dedicated code model. Optimized for code generation, completion, and review across 80+ languages.

76

Quality

$0.60

Blended/1M

195

tok/s

Best for: Code generation
Mistral AI
OSS

Nemotron 3 Nano Omni

NVIDIA's 30B open multimodal model running vision + audio + text in a single stack. Tops 6 specialty leaderboards. April 2026.

76

Quality

$0.00

Blended/1M

158

tok/s

Best for: Open multimodal
Anthropic

Claude 3.5 Haiku

Anthropic's fastest model. Ultra-low latency for real-time applications and high-volume tasks.

75

Quality

$2.40

Blended/1M

172

tok/s

Best for: Speed & cost
Google
OSS

Gemma 4 27B

Google's open-weight flagship under Apache 2.0 — April 2026. Designed for self-host with strong instruction-following and tool calling.

75

Quality

$0.00

Blended/1M

142

tok/s

Best for: Self-hosted general purpose
Google

Gemini 2.0 Flash

Google's fastest model. Optimized for speed and efficiency with strong coding and reasoning.

74

Quality

$0.25

Blended/1M

244

tok/s

Best for: Fastest + cheapest
Alibaba Cloud
OSS

Qwen 2.5 Coder 32B

Specialized coding model from Alibaba. Top open-source code model on HumanEval and SWE-Bench.

74

Quality

$0.30

Blended/1M

125

tok/s

Best for: Open-source coding
OpenAI

GPT-4o Mini

Fast and affordable small model for lightweight tasks and high-throughput use cases.

72

Quality

$0.38

Blended/1M

183

tok/s

Best for: High throughput
Meta
OSS

Llama 4 Scout

Meta's efficient MoE model with 16 experts. 10M token context window and strong multilingual support.

71

Quality

$0.28

Blended/1M

198

tok/s

Best for: Longest context
Amazon

Amazon Nova Pro

Amazon's capable multimodal model. Strong balance of accuracy, speed, and cost for diverse tasks.

70

Quality

$2.00

Blended/1M

110

tok/s

Best for: AWS ecosystem
Cohere

Command R+

Cohere's flagship for enterprise RAG. Optimized for retrieval-augmented generation and tool use.

68

Quality

$6.25

Blended/1M

72

tok/s

Best for: Enterprise RAG