Updated Apr 6, 2026

AI Model Leaderboard

Every major AI model ranked by quality, speed, pricing, and value. Filter by category, sort by any metric, and find the right model for your use case.

Gold: o3 (Quality Index 96)
Silver: Claude Opus 4 (Quality Index 95)
Bronze: Gemini 2.5 Pro (Quality Index 92)

23 models

| # | Model | Quality | Arena ELO | Speed | Price (in / out, per 1M tokens) | Context | Value | Released |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | o3 · OpenAI · Hard reasoning | 96 | 1370 | 68 t/s | $10 / $40 | 200K | 3.8 | Apr 2025 |
| 2 | Claude Opus 4 · Anthropic · Complex analysis | 95 | 1360 | 52 t/s | $15 / $75 | 200K | 2.1 | May 2025 |
| 3 | Gemini 2.5 Pro · Google · Multimodal + value | 92 | 1345 | 87 t/s | $1.25 / $10 | 1M | 16.4 | Mar 2025 |
| 4 | DeepSeek R1 · DeepSeek · Cheap reasoning | 91 | 1350 | 35 t/s | $0.55 / $2.19 | 128K | 66.4 | Jan 2025 |
| 5 | OpenAI · Long context | 89 | 1310 | 120 t/s | $2 / $8 | 1M | 17.8 | Apr 2025 |
| 6 | OpenAI · Reasoning & math | 88 | 1305 | 155 t/s | $1.1 / $4.4 | 200K | 32.0 | Jan 2025 |
| 7 | Anthropic · Coding & balance | 88 | 1320 | 95 t/s | $3 / $15 | 200K | 9.8 | May 2025 |
| 8 | xAI · Real-time info | 87 | 1330 | 82 t/s | $3 / $15 | 131K | 9.7 | Feb 2025 |
| 9 | DeepSeek V3 · DeepSeek · Best open-source value | 86 | 1310 | 62 t/s | $0.27 / $1.1 | 128K | 125.5 | Mar 2025 |
| 10 | OpenAI · General purpose | 85 | 1285 | 109 t/s | $2.5 / $10 | 128K | 13.6 | May 2024 |
| 11 | Meta · Open-source value | 80 | 1260 | 135 t/s | $0.2 / $0.6 | 1M | 200.0 | Apr 2025 |
| 12 | Alibaba Cloud · Open-source flagship | 80 | 1255 | 85 t/s | $0.3 / $0.9 | 131K | 133.3 | Sep 2024 |
| 13 | Mistral AI · Multilingual | 79 | 1250 | 78 t/s | $2 / $6 | 128K | 19.8 | Nov 2024 |
| 14 | xAI · Budget reasoning | 78 | 1275 | 165 t/s | $0.3 / $0.5 | 131K | 195.0 | Feb 2025 |
| 15 | Perplexity · Search + citations | 78 | — | 65 t/s | $3 / $15 | 200K | 8.7 | Feb 2025 |
| 16 | Mistral AI · Code generation | 76 | — | 195 t/s | $0.3 / $0.9 | 256K | 126.7 | Jan 2025 |
| 17 | Anthropic · Speed & cost | 75 | 1230 | 172 t/s | $0.8 / $4 | 200K | 31.3 | Oct 2024 |
| 18 | Google · Fastest + cheapest | 74 | 1240 | 244 t/s | $0.1 / $0.4 | 1M | 296.0 | Feb 2025 |
| 19 | Alibaba Cloud · Open-source coding | 74 | — | 125 t/s | $0.15 / $0.45 | 131K | 246.7 | Nov 2024 |
| 20 | OpenAI · High throughput | 72 | 1216 | 183 t/s | $0.15 / $0.6 | 128K | 192.0 | Jul 2024 |
| 21 | Llama 4 Scout · Meta · Longest context | 71 | 1195 | 198 t/s | $0.15 / $0.4 | 10M | 258.2 | Apr 2025 |
| 22 | Amazon · AWS ecosystem | 70 | — | 110 t/s | $0.8 / $3.2 | 300K | 35.0 | Dec 2024 |
| 23 | Cohere · Enterprise RAG | 68 | 1170 | 72 t/s | $2.5 / $10 | 128K | 10.9 | Aug 2024 |
Quality = composite benchmark score (MMLU, HumanEval, MATH)
Arena ELO = LMSYS Chatbot Arena rating
Price = input / output cost per 1M tokens
Value = quality per dollar: the Quality Index divided by the average of the input and output prices

Try Any Model Instantly

Compare models side-by-side in our AI Playground. Send the same prompt to two models and see the difference in quality and speed.
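If you'd rather script the comparison, here is a minimal sketch of the same idea against any OpenAI-compatible API. The model IDs are placeholders (swap in whatever pair you want to compare), and it measures wall-clock latency rather than tokens per second:

```python
# Send one prompt to two models and compare output and latency.
# Assumes the `openai` Python package (v1+) and an OPENAI_API_KEY
# in the environment; model IDs below are placeholder choices.
import time
from openai import OpenAI

client = OpenAI()

def race(prompt: str, models: list[str]) -> None:
    for model in models:
        start = time.perf_counter()
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        elapsed = time.perf_counter() - start
        text = resp.choices[0].message.content
        print(f"--- {model} ({elapsed:.1f}s) ---\n{text}\n")

race("Explain vector clocks in two sentences.", ["gpt-4o", "gpt-4o-mini"])
```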

How We Rank AI Models

Our leaderboard uses a composite quality index that combines three key benchmarks: MMLU (measuring knowledge and reasoning across 57 subjects), HumanEval (measuring code generation ability), and MATH (measuring mathematical problem-solving). Scores are normalized to a 0-100 scale and cross-referenced against LMSYS Chatbot Arena ELO ratings for real-world validation.
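As an illustration, a composite index of this shape can be computed as in the sketch below. The equal weighting and min-max normalization are assumptions for the example; the page does not publish the exact recipe:

```python
# Illustrative composite quality index over the three benchmarks named
# above. Equal weights and min-max normalization are assumptions.
BENCH_RANGES = {              # (worst, best) raw scores for normalization
    "MMLU": (25.0, 100.0),    # 4-way multiple choice, so 25% = chance
    "HumanEval": (0.0, 100.0),
    "MATH": (0.0, 100.0),
}

def normalize(bench: str, raw: float) -> float:
    """Map a raw benchmark score onto a 0-100 scale."""
    lo, hi = BENCH_RANGES[bench]
    return max(0.0, min(100.0, 100.0 * (raw - lo) / (hi - lo)))

def quality_index(raw_scores: dict[str, float]) -> float:
    """Equal-weight mean of normalized benchmark scores (0-100)."""
    normed = [normalize(b, s) for b, s in raw_scores.items()]
    return sum(normed) / len(normed)

# Hypothetical raw scores, just to show the shape of the computation.
print(round(quality_index({"MMLU": 92.0, "HumanEval": 98.0, "MATH": 96.5})))
```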

We track speed (tokens per second), time-to-first-token (TTFT), pricing, and context window size to give you a complete picture. The Value Score divides quality by cost, showing you which models deliver the most capability per dollar.
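The value arithmetic is easy to check against the table: o3 scores 96 with a $10 / $40 price, and 96 ÷ ((10 + 40) / 2) = 3.84, which appears above as 3.8. A minimal sketch:

```python
# Value score as the table computes it: Quality Index divided by the
# average of input and output price per 1M tokens.
def value_score(quality: float, price_in: float, price_out: float) -> float:
    blended = (price_in + price_out) / 2  # $ per 1M tokens
    return quality / blended

print(round(value_score(96, 10, 40), 1))    # rank 1  -> 3.8
print(round(value_score(74, 0.1, 0.4), 1))  # rank 18 -> 296.0
```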

Key Trends in AI Model Performance

  • Open-source catching up: DeepSeek R1 and V3 now compete with top closed-source models on reasoning and coding benchmarks
  • Reasoning specialization: Models like o3 and R1 trade speed for dramatically better performance on complex tasks
  • Context windows expanding: 1M+ tokens is now standard for flagship models, with Llama 4 Scout supporting 10M
  • Speed improving: Flash-tier models now exceed 200 tokens/second while maintaining strong quality

Choosing the Right Model

There is no single "best" model; the right choice depends on your use case. For most applications, a model-routing approach works best: route simple queries to fast, cheap models and complex queries to frontier models. You get low cost on easy traffic and high quality where it matters, as in the sketch below.
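A minimal router sketch. The keyword heuristic and the two model choices are purely illustrative; production routers often use a cheap classifier model instead of hand-written rules:

```python
# Route each prompt to a cheap or frontier tier based on a simple
# complexity heuristic. Model IDs are placeholder choices.
CHEAP_MODEL = "gpt-4o-mini"   # fast, low-cost tier
FRONTIER_MODEL = "o3"         # expensive reasoning tier

HARD_HINTS = ("prove", "debug", "optimize", "step by step", "analyze")

def pick_model(prompt: str) -> str:
    """Send long or reasoning-heavy prompts to the frontier model."""
    text = prompt.lower()
    if len(prompt) > 2000 or any(hint in text for hint in HARD_HINTS):
        return FRONTIER_MODEL
    return CHEAP_MODEL

print(pick_model("What's the capital of France?"))      # gpt-4o-mini
print(pick_model("Prove that sqrt(2) is irrational."))  # o3
```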