AI Model Leaderboard — May 2026
Every major AI model ranked by quality, speed, pricing, and value. Filter by category, sort by any metric, and find the right model for your use case. Live data refreshed monthly with LMSys Arena Elo, official provider pricing, and Artificial Analysis benchmarks.
- Gold: GPT-5.5 Pro (Quality Index 98)
- Silver: GPT-5.5 (Quality Index 97)
- Bronze: Claude Opus 4.7 (Quality Index 96)
May 2026: Top Models, Best Value, Fastest Inference
The May 2026 ranking covers 36 models across LMSys Arena Elo, MMLU Pro, HumanEval, MATH, pricing, and inference speed. Top of the table: GPT-5.5 Pro at 98/100 quality. The full table below is sortable by any metric. Live data is refreshed hourly from official provider pricing pages and the public Arena.
Top 5 by Quality Index
- GPT-5.5 Pro — 98/100
- GPT-5.5 — 97/100
- Claude Opus 4.7 — 96/100
- Gemini 3.1 Pro — 96/100
- o3 — 94/100
Best Price-to-Quality
- DeepSeek V4 Flash — $0.28/1M out
- Gemini 2.0 Flash — $0.4/1M out
- Llama 4 Scout — $0.4/1M out
- Qwen 2.5 Coder 32B — $0.45/1M out
- Grok 3 Mini — $0.5/1M out
See our LMSys Arena deep dive and the monthly release roundup.
| # | Model | Quality | Arena Elo | Speed | Price ($/1M tokens, in / out) | Context | Value | Released |
|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 Pro (New) · OpenAI · Reasoning at any cost | 98 | 1510 | 68 t/s | $30 / $180 | 1M | 0.9 | Apr 2026 |
| 2 | GPT-5.5 (New) · OpenAI · Frontier general purpose | 97 | 1506 | 70 t/s | $5 / $30 | 1M | 5.5 | Apr 2026 |
| 3 | Claude Opus 4.7 · Anthropic · Coding & agentic workflows | 96 | 1505 | 68 t/s | $5 / $25 | 1M | 6.4 | Apr 2026 |
| 4 | Gemini 3.1 Pro · Google · Science & long-context | 96 | 1505 | 131 t/s | $2 / $12 | 1M | 13.7 | Apr 2026 |
| 5 | o3 · OpenAI · Hard reasoning | 94 | 1370 | 68 t/s | $10 / $40 | 200K | 3.8 | Apr 2025 |
| 6 | Grok 4.3 (New) · xAI · Agentic tasks & real-time info | 93 | 1496 | 83 t/s | $1.25 / $2.5 | 1M | 49.6 | May 2026 |
| 7 | Google · Multimodal + value | 92 | 1345 | 87 t/s | $1.25 / $10 | 1M | 16.4 | Mar 2025 |
| 8 | Moonshot AI · Frontier quality at low cost | 92 | 1466 | 48 t/s | $0.95 / $4 | 256K | 37.2 | Apr 2026 |
| 9 | Anthropic · Complex analysis | 91 | 1360 | 52 t/s | $15 / $75 | 200K | 2.0 | May 2025 |
| 10 | DeepSeek R1 (OSS) · DeepSeek · Cheap reasoning | 91 | 1350 | 35 t/s | $0.55 / $2.19 | 128K | 66.4 | Jan 2025 |
| 11 | DeepSeek · Open-source value leader | 90 | 1467 | 33 t/s | $1.74 / $3.48 | 1M | 34.5 | Apr 2026 |
| 12 | Anthropic · Coding & balance | 90 | 1467 | 73 t/s | $3 / $15 | 1M | 10.0 | Feb 2026 |
| 13 | OpenAI · Long context | 89 | 1310 | 120 t/s | $2 / $8 | 1M | 17.8 | Apr 2025 |
| 14 | OpenAI · Reasoning & math | 88 | 1305 | 155 t/s | $1.1 / $4.4 | 200K | 32.0 | Jan 2025 |
| 15 | Anthropic · Coding & balance | 88 | 1320 | 95 t/s | $3 / $15 | 200K | 9.8 | May 2025 |
| 16 | Z.ai (Zhipu AI) · Open-weight agentic & tool use | 88 | 1467 | 48 t/s | $1.55 / $4.65 | 200K | 28.4 | Apr 2026 |
| 17 | xAI · Real-time info | 87 | 1330 | 82 t/s | $3 / $15 | 131K | 9.7 | Feb 2025 |
| 18 | DeepSeek V3 (OSS) · DeepSeek · Best open-source value | 86 | 1310 | 62 t/s | $0.27 / $1.1 | 128K | 125.5 | Mar 2025 |
| 19 | Alibaba Cloud · Multilingual & APAC | 86 | 1448 | 124 t/s | $1.4 / $5.6 | 256K | 24.6 | Apr 2026 |
| 20 | OpenAI · General purpose | 85 | 1285 | 109 t/s | $2.5 / $10 | 128K | 13.6 | May 2024 |
| 21 | Meta · Open-source value | 80 | 1260 | 135 t/s | $0.2 / $0.6 | 1M | 200.0 | Apr 2025 |
| 22 | Qwen 2.5 72B (OSS) · Alibaba Cloud · Open-source flagship | 80 | 1255 | 85 t/s | $0.3 / $0.9 | 131K | 133.3 | Sep 2024 |
| 23 | DeepSeek V4 Flash · DeepSeek · Cheap-and-fast cascade tier | 80 | 1410 | 105 t/s | $0.14 / $0.28 | 1M | 381.0 | Apr 2026 |
| 24 | Mistral AI · Multilingual | 79 | 1250 | 78 t/s | $2 / $6 | 128K | 19.8 | Nov 2024 |
| 25 | Grok 3 Mini · xAI · Budget reasoning | 78 | 1275 | 165 t/s | $0.3 / $0.5 | 131K | 195.0 | Feb 2025 |
| 26 | Perplexity · Search + citations | 78 | — | 65 t/s | $3 / $15 | 200K | 8.7 | Feb 2025 |
| 27 | Mistral AI · Code generation | 76 | — | 195 t/s | $0.3 / $0.9 | 256K | 126.7 | Jan 2025 |
| 28 | Mistral AI · Open multimodal | 76 | 1361 | 158 t/s | Self-host | 256K | — | Apr 2026 |
| 29 | Anthropic · Speed & cost | 75 | 1230 | 172 t/s | $0.8 / $4 | 200K | 31.3 | Oct 2024 |
| 30 | Gemma 4 27B (OSS) · Google · Self-hosted general purpose | 75 | 1351 | 142 t/s | Self-host | 128K | — | Apr 2026 |
| 31 | Gemini 2.0 Flash · Google · Fastest + cheapest | 74 | 1240 | 244 t/s | $0.1 / $0.4 | 1M | 296.0 | Feb 2025 |
| 32 | Qwen 2.5 Coder 32B · Alibaba Cloud · Open-source coding | 74 | — | 125 t/s | $0.15 / $0.45 | 131K | 246.7 | Nov 2024 |
| 33 | OpenAI · High throughput | 72 | 1216 | 183 t/s | $0.15 / $0.6 | 128K | 192.0 | Jul 2024 |
| 34 | Llama 4 Scout · Meta · Longest context | 71 | 1195 | 198 t/s | $0.15 / $0.4 | 10M | 258.2 | Apr 2025 |
| 35 | Amazon · AWS ecosystem | 70 | — | 110 t/s | $0.8 / $3.2 | 300K | 35.0 | Dec 2024 |
| 36 | Cohere · Enterprise RAG | 68 | 1170 | 72 t/s | $2.5 / $10 | 128K | 10.9 | Aug 2024 |
LLM Leaderboard May 2026
Large language models ranked by LMSys Arena Elo, MMLU, HumanEval, MATH, pricing, and tokens-per-second. Text-only view.
LM Leaderboard May 2026
Language model rankings: LMArena Elo, price-to-Elo ratio, and open-weight vs closed-source comparison.
LMSys Arena Leaderboard May 2026
LMArena (formerly LMSys Chatbot Arena) tracker — pairwise human preference Elo scores, refreshed as the public arena publishes.
Image Model Leaderboard 2026
Generative AI image and video models — Imagen 4, Flux 2, DALL-E 4, Stable Diffusion 4 Ultra, Sora 2 ranked by quality and cost.
Coding Model Leaderboard 2026
AI coding assistants ranked: Claude Opus, GPT-5.5, Gemini 3.1 Pro, DeepSeek V4, plus HumanEval and SWE-Bench scores.
Vendor Lock-in Leaderboard 2026
AI vendors ranked by portability — license, weight availability, fine-tuning openness, and exit cost score.
How We Rank AI Models
Our leaderboard uses a composite quality index that combines three benchmarks: MMLU Pro (knowledge and reasoning across 14 subject areas), HumanEval (code generation), and MATH (mathematical problem-solving). Scores are normalized to a 0-100 scale and cross-referenced against LMArena (formerly LMSYS Chatbot Arena) Elo ratings as a check against real-world human preference.
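As a rough illustration of how a composite index like this could be computed, here is a minimal Python sketch. The exact weights and normalization ranges behind the published index are not stated on this page, so the equal weighting, the min-max scaling, the function names, and the sample scores below are all assumptions for illustration only.

```python
def normalize(score, lo, hi):
    """Min-max scale a raw benchmark score onto 0-100 (assumed method)."""
    if hi == lo:
        raise ValueError("hi and lo must differ")
    clipped = min(max(score, lo), hi)  # keep out-of-range scores inside [lo, hi]
    return 100.0 * (clipped - lo) / (hi - lo)


def quality_index(scores, ranges, weights=None):
    """Weighted mean of normalized benchmark scores, rounded to an integer.

    scores:  benchmark name -> raw score
    ranges:  benchmark name -> (lo, hi) used for normalization
    weights: benchmark name -> weight; equal weights if omitted (assumption)
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    weighted = sum(
        weights[name] * normalize(scores[name], *ranges[name]) for name in scores
    )
    return round(weighted / total)


# Hypothetical raw scores for an unnamed model, not taken from the table.
example_scores = {"mmlu_pro": 80.0, "humaneval": 90.0, "math": 70.0}
example_ranges = {name: (0.0, 100.0) for name in example_scores}
example_index = quality_index(example_scores, example_ranges)
```

Changing `weights` shows how sensitive a ranking is to the choice of weighting, which is one reason to cross-check a composite against Arena Elo.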
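We also track speed (tokens per second), time-to-first-token (TTFT), pricing, and context window size to give you a complete picture. The Value Score divides quality by cost, showing which models deliver the most capability per dollar. In the table above, it matches the Quality Index divided by the average of the input and output prices per 1M tokens: for GPT-5.5, 97 / ((5 + 30) / 2) ≈ 5.5. Self-hosted models have no listed price and therefore no Value Score.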
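The Value column in the table above can be reproduced with a few lines of Python. This sketch assumes the score is the Quality Index divided by the simple average of input and output price per 1M tokens. That formula is inferred from the published rows rather than documented on this page, and the `ModelRow` type and `value_score` function are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRow:
    name: str
    quality: int
    input_price: Optional[float]   # USD per 1M input tokens; None if self-hosted
    output_price: Optional[float]  # USD per 1M output tokens; None if self-hosted


def value_score(row: ModelRow) -> Optional[float]:
    """Quality per dollar, using the average of input and output price (assumed)."""
    if row.input_price is None or row.output_price is None:
        return None  # no listed price, so no score
    blended = (row.input_price + row.output_price) / 2
    return round(row.quality / blended, 1)


# Rows copied from the table above.
rows = [
    ModelRow("GPT-5.5 Pro", 98, 30, 180),
    ModelRow("GPT-5.5", 97, 5, 30),
    ModelRow("DeepSeek R1", 91, 0.55, 2.19),
    ModelRow("Gemma 4 27B", 75, None, None),
]
scores = {row.name: value_score(row) for row in rows}
```

A workload that is heavy on output tokens may be better served by weighting the output price more than the input price instead of averaging them equally.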
Key Trends in AI Model Performance
- Open-source catching up: DeepSeek R1 and V3 now compete with top closed-source models on reasoning and coding benchmarks
- Reasoning specialization: Models like o3 and R1 trade speed for dramatically better performance on complex tasks
- Context windows expanding: 1M+ tokens is now standard for flagship models, with Llama 4 Scout supporting 10M
- Speed improving: Flash-tier models now exceed 200 tokens/second while maintaining strong quality
Choosing the Right Model
There is no single "best" model; the right choice depends on your use case. For most applications, a model routing approach works best: route simple queries to fast, cheap models and complex queries to frontier models. You pay frontier prices only for the requests that need frontier quality.
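A routing layer can start out very simple. The Python sketch below sends prompts to a cheap or a frontier model based on a keyword and length heuristic. The heuristic, the threshold, the keyword list, and the `route` function are assumptions for illustration, not a description of any provider's or Swfte's router; the two model names are taken from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


CHEAP_MODEL = "Gemini 2.0 Flash"  # low-cost, high-speed tier in the table
FRONTIER_MODEL = "GPT-5.5"        # frontier general-purpose tier in the table

# Hypothetical hints that a prompt needs deeper reasoning.
COMPLEX_HINTS = ("prove", "refactor", "debug", "step by step", "analyze")


def route(prompt: str, max_simple_words: int = 60) -> Route:
    """Pick a model tier from a crude estimate of prompt complexity."""
    text = prompt.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return Route(FRONTIER_MODEL, "complexity keyword")
    if len(text.split()) > max_simple_words:
        return Route(FRONTIER_MODEL, "long prompt")
    return Route(CHEAP_MODEL, "short, simple prompt")


simple = route("What is the capital of France?")
hard = route("Debug this stack trace and explain the root cause.")
```

In practice the heuristic is often replaced by a small classifier, or by a cascade that tries the cheap model first and escalates only when its answer fails a quality check.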