GenAI Leaderboard — May 2026
Large language models ranked by LMSys Arena Elo, MMLU Pro, HumanEval, MATH, pricing, and inference speed. Refreshed monthly with live data from official provider pricing pages, Artificial Analysis, and the Arena.
Which GenAI model leads in May 2026?
Generative AI models fall into three workload families: text generation (chat, summarization, drafting), code generation, and multimodal (vision, audio, video). No single model dominates all three. Our composite quality index ranks models across the combined workload set, but for production use the per-family leaders matter more. The May 2026 leaderboard below puts all three families side by side in a single ranked view.
| # | Model | Quality | Arena Elo | Speed | Price ($/1M tok, in/out) | Context (tokens) | Value (quality / avg price) | Released |
|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI · Hard reasoning | 96 | 1370 | 68 t/s | $10 / $40 | 200K | 3.8 | Apr 2025 |
| 2 | Anthropic · Complex analysis | 95 | 1360 | 52 t/s | $15 / $75 | 200K | 2.1 | May 2025 |
| 3 | GPT-5.5 Pro (New) · OpenAI · Reasoning at any cost | 95 | 1502 | 92 t/s | $30 / $180 | 1M | 0.9 | Apr 2026 |
| 4 | Claude Opus 4.7 (New) · Anthropic · Coding & agentic workflows | 93 | 1497 | 78 t/s | $5 / $25 | 1M | 6.2 | Apr 2026 |
| 5 | Google · Multimodal + value | 92 | 1345 | 87 t/s | $1.25 / $10 | 1M | 16.4 | Mar 2025 |
| 6 | GPT-5.5 (New) · OpenAI · Frontier general purpose | 92 | 1481 | 138 t/s | $5 / $30 | 1M | 5.3 | Apr 2026 |
| 7 | DeepSeek R1 (OSS) · DeepSeek · Cheap reasoning | 91 | 1350 | 35 t/s | $0.55 / $2.19 | 128K | 66.4 | Jan 2025 |
| 8 | Gemini 3.1 Pro (New) · Google · Science & long-context | 91 | 1500 | 165 t/s | $3.5 / $10.5 | 2M | 13.0 | Apr 2026 |
| 9 | OpenAI · Long context | 89 | 1310 | 120 t/s | $2 / $8 | 1M | 17.8 | Apr 2025 |
| 10 | OpenAI · Reasoning & math | 88 | 1305 | 155 t/s | $1.1 / $4.4 | 200K | 32.0 | Jan 2025 |
| 11 | Anthropic · Coding & balance | 88 | 1320 | 95 t/s | $3 / $15 | 200K | 9.8 | May 2025 |
| 12 | DeepSeek · Open-source value leader | 88 | 1462 | 112 t/s | $1.74 / $3.48 | 1M | 33.7 | Apr 2026 |
| 13 | xAI · Real-time info | 87 | 1330 | 82 t/s | $3 / $15 | 131K | 9.7 | Feb 2025 |
| 14 | DeepSeek V3 (OSS) · DeepSeek · Best open-source value | 86 | 1310 | 62 t/s | $0.27 / $1.1 | 128K | 125.5 | Mar 2025 |
| 15 | OpenAI · General purpose | 85 | 1285 | 109 t/s | $2.5 / $10 | 128K | 13.6 | May 2024 |
| 16 | Qwen 3.6 Plus (New) · Alibaba Cloud · Multilingual & APAC | 84 | 1423 | 124 t/s | $1.4 / $5.6 | 256K | 24.0 | Apr 2026 |
| 17 | Meta · Open-source value | 80 | 1260 | 135 t/s | $0.2 / $0.6 | 1M | 200.0 | Apr 2025 |
| 18 | Qwen 2.5 72B (OSS) · Alibaba Cloud · Open-source flagship | 80 | 1255 | 85 t/s | $0.3 / $0.9 | 131K | 133.3 | Sep 2024 |
| 19 | Mistral AI · Multilingual | 79 | 1250 | 78 t/s | $2 / $6 | 128K | 19.8 | Nov 2024 |
| 20 | xAI · Budget reasoning | 78 | 1275 | 165 t/s | $0.3 / $0.5 | 131K | 195.0 | Feb 2025 |
| 21 | Perplexity · Search + citations | 78 | — | 65 t/s | $3 / $15 | 200K | 8.7 | Feb 2025 |
| 22 | DeepSeek · Cheap-and-fast cascade tier | 78 | 1392 | 218 t/s | $0.14 / $0.28 | 1M | 371.4 | Apr 2026 |
| 23 | Mistral AI · Code generation | 76 | — | 195 t/s | $0.3 / $0.9 | 256K | 126.7 | Jan 2025 |
| 24 | Mistral AI · Open multimodal | 76 | 1361 | 158 t/s | Self-host | 256K | — | Apr 2026 |
| 25 | Anthropic · Speed & cost | 75 | 1230 | 172 t/s | $0.8 / $4 | 200K | 31.3 | Oct 2024 |
| 26 | Google · Self-hosted general purpose | 75 | 1351 | 142 t/s | Self-host | 128K | — | Apr 2026 |
| 27 | Google · Fastest + cheapest | 74 | 1240 | 244 t/s | $0.1 / $0.4 | 1M | 296.0 | Feb 2025 |
| 28 | Alibaba Cloud · Open-source coding | 74 | — | 125 t/s | $0.15 / $0.45 | 131K | 246.7 | Nov 2024 |
| 29 | OpenAI · High throughput | 72 | 1216 | 183 t/s | $0.15 / $0.6 | 128K | 192.0 | Jul 2024 |
| 30 | Meta · Longest context | 71 | 1195 | 198 t/s | $0.15 / $0.4 | 10M | 258.2 | Apr 2025 |
| 31 | Amazon · AWS ecosystem | 70 | — | 110 t/s | $0.8 / $3.2 | 300K | 35.0 | Dec 2024 |
| 32 | Cohere · Enterprise RAG | 68 | 1170 | 72 t/s | $2.5 / $10 | 128K | 10.9 | Aug 2024 |
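If you script against this table, the derived columns are easy to recompute. A minimal sketch, assuming prices are USD per million tokens and the Value column is the quality score divided by the midpoint of input and output price (an assumption, but it reproduces the published figures):

```python
# Reproduce the leaderboard's Value column from the raw numbers.
# Assumption: prices are USD per 1M tokens (input / output) and Value is
# the quality score divided by the average of the two prices.

def value_score(quality: float, price_in: float, price_out: float) -> float:
    """Quality points per dollar, using the midpoint of input/output price."""
    avg_price = (price_in + price_out) / 2
    return round(quality / avg_price, 1)

# Example: the DeepSeek V3 row (quality 86, $0.27 in / $1.10 out)
print(value_score(86, 0.27, 1.10))  # -> 125.5, matching the table
```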
How the LLM leaderboard works
We pull official provider pricing every 24 hours, Artificial Analysis benchmark snapshots weekly, and LMSys Arena Elo ratings as they are published. The composite quality index is a 0-100 normalization over MMLU Pro, HumanEval, and MATH, weighted by recency and cross-validated against Arena Elo. We do not accept vendor-supplied numbers without an independent reference.
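For readers who want to see what such a normalization looks like in code, here is a minimal sketch. The per-benchmark weights, the recency half-life, and the Snapshot structure are illustrative assumptions, not the published methodology.

```python
# Sketch of a recency-weighted composite quality index over benchmark snapshots.
# Assumptions: scores arrive as 0-100 percentages; WEIGHTS and HALF_LIFE_DAYS
# are illustrative placeholders, not the leaderboard's actual parameters.
from dataclasses import dataclass

WEIGHTS = {"mmlu_pro": 0.4, "humaneval": 0.3, "math": 0.3}  # assumed weights
HALF_LIFE_DAYS = 180                                        # assumed recency decay

@dataclass
class Snapshot:
    scores: dict[str, float]  # benchmark name -> 0-100 score
    age_days: float           # how old this benchmark snapshot is

def composite_quality(snapshots: list[Snapshot]) -> float:
    """Recency-weighted, benchmark-weighted average, clamped to 0-100."""
    num, den = 0.0, 0.0
    for snap in snapshots:
        recency = 0.5 ** (snap.age_days / HALF_LIFE_DAYS)
        for bench, score in snap.scores.items():
            w = WEIGHTS.get(bench, 0.0) * recency
            num += w * score
            den += w
    return max(0.0, min(100.0, num / den)) if den else 0.0

# Example with made-up scores: newer snapshots count more than older ones.
q = composite_quality([
    Snapshot({"mmlu_pro": 84.0, "humaneval": 92.0, "math": 88.0}, age_days=20),
    Snapshot({"mmlu_pro": 82.0, "humaneval": 90.0, "math": 85.0}, age_days=120),
])
print(round(q, 1))  # one 0-100 composite number
```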
Where the leaderboard is wrong
No leaderboard predicts your production accuracy. LMSys Arena rewards style and short-conversation polish; a top-Arena model can still underperform on your specific function-calling schema or long-context retrieval workload. Build an internal eval harness before you commit. See our LMArena Elo explained and LLM routing writeups for the deep dives.
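As a starting point, here is a minimal sketch of such a harness. The TASKS list, the check functions, and call_model() are placeholders; swap in your own production prompts, graders, and provider client.

```python
# Minimal internal eval harness: score candidate models on YOUR tasks before
# trusting any public leaderboard. call_model() and the example task are stubs.

TASKS = [  # your real production prompts, each with a programmatic check
    {"prompt": "Extract the invoice total as JSON: {...}",
     "check": lambda out: '"total"' in out},
]

def call_model(model: str, prompt: str) -> str:
    """Placeholder: replace with your provider SDK or gateway call."""
    raise NotImplementedError

def run_eval(models: list[str]) -> dict[str, float]:
    """Return the pass rate per model over the task set."""
    results = {}
    for model in models:
        passed = sum(1 for t in TASKS if t["check"](call_model(model, t["prompt"])))
        results[model] = passed / len(TASKS)
    return results

# print(run_eval(["model-a", "model-b"]))  # compare on your workload, not on Elo
```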
Related rankings
- AI Model Leaderboard — same data, broader entry point
- Models Leaderboard
- AI Vendor Lock-in Leaderboard