AI Model Leaderboard
23 major AI models ranked by quality, speed, pricing, and value, to help you find the right model for your use case.
- Gold: o3 (Quality Index 96)
- Silver: Claude Opus 4 (Quality Index 95)
- Bronze: Gemini 2.5 Pro (Quality Index 92)
| # | Model | Quality | Arena Elo | Speed | Price (input / output per 1M tokens) | Context (tokens) | Value | Released |
|---|---|---|---|---|---|---|---|---|
| 1 | o3 · OpenAI · Hard reasoning | 96 | 1370 | 68 t/s | $10 / $40 | 200K | 3.8 | Apr 2025 |
| 2 | Claude Opus 4 · Anthropic · Complex analysis | 95 | 1360 | 52 t/s | $15 / $75 | 200K | 2.1 | May 2025 |
| 3 | Gemini 2.5 Pro · Google · Multimodal + value | 92 | 1345 | 87 t/s | $1.25 / $10 | 1M | 16.4 | Mar 2025 |
| 4 | DeepSeek R1 (OSS) · DeepSeek · Cheap reasoning | 91 | 1350 | 35 t/s | $0.55 / $2.19 | 128K | 66.4 | Jan 2025 |
| 5 | OpenAI · Long context | 89 | 1310 | 120 t/s | $2 / $8 | 1M | 17.8 | Apr 2025 |
| 6 | OpenAI · Reasoning & math | 88 | 1305 | 155 t/s | $1.1 / $4.4 | 200K | 32.0 | Jan 2025 |
| 7 | Anthropic · Coding & balance | 88 | 1320 | 95 t/s | $3 / $15 | 200K | 9.8 | May 2025 |
| 8 | xAI · Real-time info | 87 | 1330 | 82 t/s | $3 / $15 | 131K | 9.7 | Feb 2025 |
| 9 | DeepSeek V3 (OSS) · DeepSeek · Best open-source value | 86 | 1310 | 62 t/s | $0.27 / $1.1 | 128K | 125.5 | Mar 2025 |
| 10 | OpenAI · General purpose | 85 | 1285 | 109 t/s | $2.5 / $10 | 128K | 13.6 | May 2024 |
| 11 | Meta · Open-source value | 80 | 1260 | 135 t/s | $0.2 / $0.6 | 1M | 200.0 | Apr 2025 |
| 12 | Qwen 2.5 72B (OSS) · Alibaba Cloud · Open-source flagship | 80 | 1255 | 85 t/s | $0.3 / $0.9 | 131K | 133.3 | Sep 2024 |
| 13 | Mistral AI · Multilingual | 79 | 1250 | 78 t/s | $2 / $6 | 128K | 19.8 | Nov 2024 |
| 14 | xAI · Budget reasoning | 78 | 1275 | 165 t/s | $0.3 / $0.5 | 131K | 195.0 | Feb 2025 |
| 15 | Perplexity · Search + citations | 78 | — | 65 t/s | $3 / $15 | 200K | 8.7 | Feb 2025 |
| 16 | Mistral AI · Code generation | 76 | — | 195 t/s | $0.3 / $0.9 | 256K | 126.7 | Jan 2025 |
| 17 | Anthropic · Speed & cost | 75 | 1230 | 172 t/s | $0.8 / $4 | 200K | 31.3 | Oct 2024 |
| 18 | Google · Fastest + cheapest | 74 | 1240 | 244 t/s | $0.1 / $0.4 | 1M | 296.0 | Feb 2025 |
| 19 | Alibaba Cloud · Open-source coding | 74 | — | 125 t/s | $0.15 / $0.45 | 131K | 246.7 | Nov 2024 |
| 20 | OpenAI · High throughput | 72 | 1216 | 183 t/s | $0.15 / $0.6 | 128K | 192.0 | Jul 2024 |
| 21 | Llama 4 Scout · Meta · Longest context | 71 | 1195 | 198 t/s | $0.15 / $0.4 | 10M | 258.2 | Apr 2025 |
| 22 | Amazon · AWS ecosystem | 70 | — | 110 t/s | $0.8 / $3.2 | 300K | 35.0 | Dec 2024 |
| 23 | Cohere · Enterprise RAG | 68 | 1170 | 72 t/s | $2.5 / $10 | 128K | 10.9 | Aug 2024 |
How We Rank AI Models
Our leaderboard uses a composite quality index that combines three key benchmarks: MMLU-Pro (knowledge and reasoning across 14 subject areas), HumanEval (code generation), and MATH (mathematical problem-solving). Scores are normalized to a 0-100 scale and cross-referenced against LMSYS Chatbot Arena Elo ratings for real-world validation.
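The leaderboard does not publish how the three benchmarks are weighted, so the following is only a minimal sketch of one way a composite could be formed. It assumes each benchmark score is already on a 0-100 scale and that the three are weighted equally; both the equal weighting and the function name `composite_quality` are assumptions made for illustration, not the site's actual method.

```python
def composite_quality(mmlu_pro: float, humaneval: float, math_score: float) -> float:
    """Equal-weight average of three benchmark scores on a 0-100 scale.

    Equal weighting is an assumption; the leaderboard does not state its weights.
    """
    scores = (mmlu_pro, humaneval, math_score)
    for score in scores:
        if not 0 <= score <= 100:
            raise ValueError(f"score {score} is outside the 0-100 range")
    return round(sum(scores) / len(scores), 1)


# Illustrative inputs only, not real benchmark results.
print(composite_quality(80, 90, 70))  # → 80.0
```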
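We also track speed (tokens per second), time-to-first-token (TTFT), pricing, and context window size to give you a complete picture. The Value Score divides the quality index by cost, showing you which models deliver the most capability per dollar. The table values are consistent with cost being the average of the input and output prices: for o3, 96 / (($10 + $40) / 2) ≈ 3.8.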
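The Value column in the table is consistent with dividing the quality index by the simple average of the input and output prices. A minimal sketch under that assumption follows; the function name `value_score` is hypothetical, and the averaging rule is inferred from the table rather than stated by the leaderboard.

```python
def value_score(quality: float, input_price: float, output_price: float) -> float:
    """Quality index divided by the mean of input and output price.

    The averaging rule is inferred from the table, not documented by the site.
    """
    blended_price = (input_price + output_price) / 2
    if blended_price <= 0:
        raise ValueError("prices must average to a positive number")
    return round(quality / blended_price, 1)


# Rows taken from the table above.
print(value_score(96, 10, 40))      # o3 → 3.8
print(value_score(91, 0.55, 2.19))  # DeepSeek R1 → 66.4
```

Because the score divides by price, very cheap models dominate the Value column even when their quality index is 15 to 20 points lower, so it is best read alongside the Quality column rather than on its own.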
Key Trends in AI Model Performance
- Open-source catching up: DeepSeek R1 (quality 91, ranked 4th) and V3 (quality 86, ranked 9th) now compete with top closed-source models on reasoning and coding benchmarks
- Reasoning specialization: Models like o3 (68 t/s) and R1 (35 t/s) trade speed for stronger performance on complex tasks
- Context windows expanding: five models in the table offer 1M tokens or more, with Llama 4 Scout supporting 10M, though the top two models by quality remain at 200K
- Speed improving: Flash-tier models now exceed 200 tokens/second; the fastest entry in the table reaches 244 t/s with a quality index of 74
Choosing the Right Model
There is no single "best" model; the right choice depends on your use case. For most applications, a model routing approach works best: route simple queries to fast, cheap models and complex queries to frontier models. This keeps average cost low while preserving quality on the queries that need it.
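The routing idea can be sketched in a few lines. This is an illustrative heuristic, not a production router: the model identifiers, the keyword list, and the word-count threshold are all hypothetical choices, and a real system would more likely use a trained classifier or a cheap model to judge difficulty.

```python
from dataclasses import dataclass

# Hypothetical identifiers; substitute whatever your provider actually uses.
CHEAP_MODEL = "deepseek-v3"
FRONTIER_MODEL = "o3"

# Assumed signals that a query needs deeper reasoning.
COMPLEX_HINTS = ("prove", "derive", "debug", "step by step", "analyze")
MAX_SIMPLE_WORDS = 40


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


def route(query: str) -> Route:
    """Pick a model tier from simple surface features of the query."""
    text = query.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return Route(FRONTIER_MODEL, "complex keyword")
    if len(text.split()) > MAX_SIMPLE_WORDS:
        return Route(FRONTIER_MODEL, "long query")
    return Route(CHEAP_MODEL, "simple query")


print(route("What is the capital of France?").model)    # → deepseek-v3
print(route("Prove that sqrt(2) is irrational.").model)  # → o3
```

Keeping the decision in a single function makes it easy to swap the heuristic for a learned classifier later without touching the calling code.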