LMArena Leaderboard — May 2026
What the LMArena actually is, how to read an Arena Elo score, and the current top 10 for May 2026. The original human-preference benchmark that started as LMSys Chatbot Arena and now anchors most enterprise model selection conversations.
What is the LMArena, in one paragraph?
The LMArena is a public, blind side-by-side voting site for AI chat models. A user submits a prompt, two anonymous models reply, the user picks a winner, and the project aggregates millions of such votes into Elo ratings. It started in 2023 as the LMSys Chatbot Arena out of UC Berkeley and rebranded to LMArena.ai in 2024-25 as it spun out into an independent project. The current May 2026 table is below — three models now cluster around the historical 1500-Elo barrier on text, with the open-weights tier within striking distance of the closed-source frontier.
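Under the hood, the rating system is ordinary pairwise-comparison math. Here is a minimal sketch of an online Elo update over a vote stream, assuming the standard 400-point logistic scale; the K-factor and model names are illustrative, and the live leaderboard actually fits a Bradley-Terry-style model over the full vote set in batch, though the intuition is the same:

```python
# Minimal online Elo over a stream of blind pairwise votes.
# Illustrative only: K and the model names are made up, and the real
# leaderboard fits ratings in batch (Bradley-Terry style) rather than online.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 4.0) -> tuple[float, float]:
    """Shift both ratings toward the observed outcome."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b - k * (s_a - e_a)

ratings = {"model_a": 1000.0, "model_b": 1000.0}    # everyone starts equal
for winner, loser in [("model_a", "model_b")] * 3:  # three straight votes for A
    ratings[winner], ratings[loser] = update(ratings[winner], ratings[loser], a_won=True)
print(ratings)  # model_a drifts up, model_b drifts down
```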
How to read an Arena Elo score
Reference table for Arena Elo bands (May 2026):

| Elo band | Tier | Representative models |
|---|---|---|
| 1500+ | Frontier | Gemini 3.1 Pro, Claude 4.7, GPT-5.5 Pro |
| ~1450 | Frontier-adjacent | DeepSeek V4 Pro, Qwen 3.6 Plus |
| ~1400 | Strong | GPT-4.1, Claude Sonnet 4, Gemini 2.5 Pro |
| ~1300 | Capable | Llama 4 Maverick, Mistral Large 3 |
| ~1200 | Solid daily driver | Gemma 4, Phi-4, Mistral Small 3 |
| ~1100 | Light tasks | DeepSeek V4 Flash, GPT-4o Mini |
| <1100 | Legacy | Older 2023-24 model generations |

A 100-Elo gap means the higher-rated model wins ~64% of head-to-heads; a 200-Elo gap means ~76%. Rating shifts under 25 points are noise.
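Those percentages fall straight out of the Elo logistic curve. A quick check, assuming the standard 400-point scale:

```python
# Expected head-to-head win rate implied by an Elo gap
# (standard logistic Elo with the 400-point scale).
def win_rate(gap: float) -> float:
    return 1.0 / (1.0 + 10 ** (-gap / 400))

for gap in (25, 50, 100, 200):
    print(f"{gap:>3}-Elo gap -> {win_rate(gap):.1%}")
# 25 -> 53.6%, 50 -> 57.1%, 100 -> 64.0%, 200 -> 76.0%
```

The same curve is why the playbook further down treats sub-50-Elo gaps as statistical ties.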
Live Leaderboard
| # | Model | Quality | Arena Elo | Speed | Price ($/1M, in / out) | Context | Value | Released |
|---|---|---|---|---|---|---|---|---|
| 1 | o3 · OpenAI · Hard reasoning | 96 | 1370 | 68 t/s | $10 / $40 | 200K | 3.8 | Apr 2025 |
| 2 | Claude Opus 4 · Anthropic · Complex analysis | 95 | 1360 | 52 t/s | $15 / $75 | 200K | 2.1 | May 2025 |
| 3 | GPT-5.5 Pro (new) · OpenAI · Reasoning at any cost | 95 | 1502 | 92 t/s | $30 / $180 | 1M | 0.9 | Apr 2026 |
| 4 | Claude Opus 4.7 (new) · Anthropic · Coding & agentic workflows | 93 | 1497 | 78 t/s | $5 / $25 | 1M | 6.2 | Apr 2026 |
| 5 | Gemini 2.5 Pro · Google · Multimodal + value | 92 | 1345 | 87 t/s | $1.25 / $10 | 1M | 16.4 | Mar 2025 |
| 6 | GPT-5.5 (new) · OpenAI · Frontier general purpose | 92 | 1481 | 138 t/s | $5 / $30 | 1M | 5.3 | Apr 2026 |
| 7 | DeepSeek R1 (OSS) · DeepSeek · Cheap reasoning | 91 | 1350 | 35 t/s | $0.55 / $2.19 | 128K | 66.4 | Jan 2025 |
| 8 | Gemini 3.1 Pro (new) · Google · Science & long-context | 91 | 1500 | 165 t/s | $3.5 / $10.5 | 2M | 13.0 | Apr 2026 |
| 9 | GPT-4.1 · OpenAI · Long context | 89 | 1310 | 120 t/s | $2 / $8 | 1M | 17.8 | Apr 2025 |
| 10 | o3-mini · OpenAI · Reasoning & math | 88 | 1305 | 155 t/s | $1.1 / $4.4 | 200K | 32.0 | Jan 2025 |
| 11 | Claude Sonnet 4 · Anthropic · Coding & balance | 88 | 1320 | 95 t/s | $3 / $15 | 200K | 9.8 | May 2025 |
| 12 | DeepSeek V4 Pro · DeepSeek · Open-source value leader | 88 | 1462 | 112 t/s | $1.74 / $3.48 | 1M | 33.7 | Apr 2026 |
| 13 | Grok 3 · xAI · Real-time info | 87 | 1330 | 82 t/s | $3 / $15 | 131K | 9.7 | Feb 2025 |
| 14 | DeepSeek V3 (OSS) · DeepSeek · Best open-source value | 86 | 1310 | 62 t/s | $0.27 / $1.1 | 128K | 125.5 | Mar 2025 |
| 15 | GPT-4o · OpenAI · General purpose | 85 | 1285 | 109 t/s | $2.5 / $10 | 128K | 13.6 | May 2024 |
| 16 | Qwen 3.6 Plus (new) · Alibaba Cloud · Multilingual & APAC | 84 | 1423 | 124 t/s | $1.4 / $5.6 | 256K | 24.0 | Apr 2026 |
| 17 | Llama 4 Maverick · Meta · Open-source value | 80 | 1260 | 135 t/s | $0.2 / $0.6 | 1M | 200.0 | Apr 2025 |
| 18 | Qwen 2.5 72B (OSS) · Alibaba Cloud · Open-source flagship | 80 | 1255 | 85 t/s | $0.3 / $0.9 | 131K | 133.3 | Sep 2024 |
| 19 | Mistral Large 2 · Mistral AI · Multilingual | 79 | 1250 | 78 t/s | $2 / $6 | 128K | 19.8 | Nov 2024 |
| 20 | Grok 3 Mini · xAI · Budget reasoning | 78 | 1275 | 165 t/s | $0.3 / $0.5 | 131K | 195.0 | Feb 2025 |
| 21 | Sonar Pro · Perplexity · Search + citations | 78 | — | 65 t/s | $3 / $15 | 200K | 8.7 | Feb 2025 |
| 22 | DeepSeek V4 Flash · DeepSeek · Cheap-and-fast cascade tier | 78 | 1392 | 218 t/s | $0.14 / $0.28 | 1M | 371.4 | Apr 2026 |
| 23 | Codestral · Mistral AI · Code generation | 76 | — | 195 t/s | $0.3 / $0.9 | 256K | 126.7 | Jan 2025 |
| 24 | Mistral Large 3 · Mistral AI · Open multimodal | 76 | 1361 | 158 t/s | Self-host | 256K | — | Apr 2026 |
| 25 | Claude 3.5 Haiku · Anthropic · Speed & cost | 75 | 1230 | 172 t/s | $0.8 / $4 | 200K | 31.3 | Oct 2024 |
| 26 | Gemma 4 · Google · Self-hosted general purpose | 75 | 1351 | 142 t/s | Self-host | 128K | — | Apr 2026 |
| 27 | Gemini 2.0 Flash · Google · Fastest + cheapest | 74 | 1240 | 244 t/s | $0.1 / $0.4 | 1M | 296.0 | Feb 2025 |
| 28 | Qwen 2.5 Coder 32B · Alibaba Cloud · Open-source coding | 74 | — | 125 t/s | $0.15 / $0.45 | 131K | 246.7 | Nov 2024 |
| 29 | GPT-4o Mini · OpenAI · High throughput | 72 | 1216 | 183 t/s | $0.15 / $0.6 | 128K | 192.0 | Jul 2024 |
| 30 | Llama 4 Scout · Meta · Longest context | 71 | 1195 | 198 t/s | $0.15 / $0.4 | 10M | 258.2 | Apr 2025 |
| 31 | Nova Pro · Amazon · AWS ecosystem | 70 | — | 110 t/s | $0.8 / $3.2 | 300K | 35.0 | Dec 2024 |
| 32 | Command R+ · Cohere · Enterprise RAG | 68 | 1170 | 72 t/s | $2.5 / $10 | 128K | 10.9 | Aug 2024 |
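Two reading notes on this table. Rows are ordered by the blended Quality score, not Arena Elo, which is why the 1500-Elo frontier trio sits at ranks 3, 4, and 8. And the Value column works out to Quality divided by the simple mean of input and output price per 1M tokens; a sketch that reproduces the published figures, assuming that inferred formula:

```python
# "Value" column: quality score per blended dollar, where the blend
# is the simple mean of input and output price per 1M tokens.
# (Formula inferred from the published rows; reproduces them to one decimal.)
def value(quality: float, price_in: float, price_out: float) -> float:
    return quality / ((price_in + price_out) / 2)

print(round(value(96, 10.00, 40.00), 1))  # 3.8   (rank 1)
print(round(value(92, 1.25, 10.00), 1))   # 16.4  (rank 5)
print(round(value(78, 0.14, 0.28), 1))    # 371.4 (rank 22)
```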
What to do this quarter
- Treat Arena Elo as a triage filter, not a decision. Use it to drop the bottom half of your candidate list, then run a real eval on the remainder (a code sketch of this flow follows the list).
- Pick the right Arena board. Coding teams should read the coding Arena (Claude Opus 4.7 leads at 1567 Elo). Long-context teams should read the hard-prompts Arena. The aggregate text leaderboard is the wrong signal for many enterprise workloads.
- Discount short-conversation polish. The Arena rewards style. Models tuned for chat win at the margin against models tuned for accuracy. Build internal evals that reward what your business actually pays for.
- Watch the gap, not the ranking. Sub-25 Elo shifts are within statistical noise, and even a 50-Elo gap implies only a ~57% head-to-head win rate, so anything under 50 Elo between two candidates is a coin flip on most workloads.
- Plan for the four-way race. Gemini 3.1 Pro, Claude Opus 4.7, GPT-5.5 Pro, and DeepSeek V4 Pro are approximately interchangeable on quality at the top. Optimise your stack for switching cost, not for capability.
- Capture vote-rate momentum. The fastest-rising models week-over-week are usually the next month's leaders. Subscribe to weekly Arena reports.
- Pair Arena Elo with cost. A 50-Elo lead at 10x the price is rarely a good trade. See our model leaderboard for combined quality-cost rankings.
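The first, fourth, and last items above collapse into a few lines of code. A sketch of that triage flow, using Elo and blended mean-of-input-and-output prices from the table; the candidate set and the 50-Elo tie threshold are illustrative, not a recommendation:

```python
# Triage sketch: drop the bottom half by Arena Elo, treat sub-50-Elo
# gaps as statistical ties, break ties on blended price.
# Figures copied from the table above; thresholds are illustrative.

candidates = {                      # name: (arena_elo, blended $/1M)
    "GPT-5.5 Pro":     (1502, 105.00),
    "Gemini 3.1 Pro":  (1500, 7.00),
    "Claude Opus 4.7": (1497, 15.00),
    "DeepSeek V4 Pro": (1462, 2.61),
    "GPT-4.1":         (1310, 5.00),
    "GPT-4o Mini":     (1216, 0.375),
}

ranked = sorted(candidates.items(), key=lambda kv: kv[1][0], reverse=True)
shortlist = ranked[: len(ranked) // 2]       # triage: keep the top half
top_elo = shortlist[0][1][0]
finalists = [(n, e, p) for n, (e, p) in shortlist if top_elo - e < 50]  # ties
finalists.sort(key=lambda t: t[2])           # ties broken by blended price
print([n for n, _, _ in finalists])          # run your real eval on these
```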
Related reading
- AI Model Leaderboard — full quality, speed, and pricing comparison
- LLM Leaderboard — same data, LLM-focused entry point
- LMSys Arena Leaderboard May 2026 — full deep-dive
- LMArena Elo Explained for Enterprise Buyers
For teams running multiple top-of-Arena models in production, Swfte Connect provides a single OpenAI-compatible endpoint that routes across providers and normalises Arena-tier quality without re-architecting your stack.
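If you evaluate that kind of gateway, the integration surface is the standard OpenAI chat-completions shape with a different base URL. A sketch using the official openai Python SDK; the gateway URL, environment variable, and model alias below are hypothetical placeholders, not documented Swfte Connect values:

```python
# Calling an OpenAI-compatible gateway: only base_url and the model
# alias change per router. Both values below are hypothetical
# placeholders, not documented Swfte Connect endpoints.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://connect.example.com/v1",  # hypothetical gateway URL
    api_key=os.environ["GATEWAY_API_KEY"],      # hypothetical env var
)

resp = client.chat.completions.create(
    model="frontier-auto",                      # hypothetical routing alias
    messages=[{"role": "user", "content": "Summarise our Q2 eval results."}],
)
print(resp.choices[0].message.content)
```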