LMArena.ai — Top Models May 2026
LMArena.ai is the rebranded LMSys Chatbot Arena. Same blind pairwise-voting methodology, same Elo math, new home. Here is who leads each board this month and what the rebrand actually changed for buyers.
The LMSys to LMArena.ai story
The Chatbot Arena began in 2023 as a research project under LMSys, an academic group out of UC Berkeley. It quickly became the most-cited LLM benchmark because it measured something the capability-only benchmarks could not: actual human preference under blind side-by-side comparison. By 2024 the project had processed millions of votes, become a procurement input for Fortune 500 buyers, and outgrown its original academic scaffold. The 2024-25 transition to the lmarena.ai domain consolidated the project as an independent organisation while keeping the same Elo methodology and open vote pool.
For users the rebrand changed almost nothing: same prompts, same blind voting, same Elo math. For procurement teams the rebrand codified Arena Elo as a vendor-neutral signal independent of any single university. That is what made it sticky as a reference point in enterprise contracts.
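For anyone who wants the mechanics rather than the branding, the scoring reduces to the familiar pairwise Elo update. The sketch below is illustrative Python, not the Arena's production pipeline (the Arena has described fitting a Bradley-Terry model over the full vote pool, which this online update approximates); the ratings and K-factor are placeholders.

```python
def expected(r_a: float, r_b: float) -> float:
    """P(model A wins) under the Elo / Bradley-Terry assumption."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Apply one blind vote. score_a: 1.0 = A preferred, 0.0 = B, 0.5 = tie."""
    e_a = expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1.0 - score_a) - (1.0 - e_a))

# Illustrative: two closely rated models trade a single vote.
a, b = 1500.0, 1495.0
a, b = update(a, b, score_a=0.0)   # the voter preferred B
print(f"{a:.1f} {b:.1f}")          # A drops ~16 points, B gains ~16
```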
The four-way race at the top
The May 2026 snapshot below shows three models at or within a few points of the historical 1500 Elo line on text. The top of LMArena.ai is now genuinely contested.
```
Top of LMArena.ai text leaderboard (May 2026)

Gemini 3.1 Pro Preview    1500  ████████████████████  text leader
Claude Opus 4.7 Thinking  1495  ███████████████████   coding #1
GPT-5.5 Pro               1488  ██████████████████    reasoning
DeepSeek V4 Pro           1462  █████████████████     Apache 2.0
Qwen 3.6 Plus             1423  ███████████████       open weights
Claude Sonnet 4           1402  ██████████████        workhorse tier
GPT-4.1                   1395  █████████████         legacy frontier
Gemini 2.5 Pro            1388  █████████████         legacy frontier
Llama 4 Maverick          1352  ███████████           open weights
Mistral Large 3           1341  ███████████           open weights
```
Full leaderboard
| # | Model | Quality | Arena Elo | Speed | Price ($/1M in / out) | Context | Value* | Released |
|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI · Hard reasoning | 96 | 1370 | 68 t/s | $10 / $40 | 200K | 3.8 | Apr 2025 |
| 2 | Anthropic · Complex analysis | 95 | 1360 | 52 t/s | $15 / $75 | 200K | 2.1 | May 2025 |
| 3 | GPT-5.5 Pro (new) · OpenAI · Reasoning at any cost | 95 | 1502 | 92 t/s | $30 / $180 | 1M | 0.9 | Apr 2026 |
| 4 | Claude Opus 4.7 (new) · Anthropic · Coding & agentic workflows | 93 | 1497 | 78 t/s | $5 / $25 | 1M | 6.2 | Apr 2026 |
| 5 | Google · Multimodal + value | 92 | 1345 | 87 t/s | $1.25 / $10 | 1M | 16.4 | Mar 2025 |
| 6 | GPT-5.5 (new) · OpenAI · Frontier general purpose | 92 | 1481 | 138 t/s | $5 / $30 | 1M | 5.3 | Apr 2026 |
| 7 | DeepSeek R1 (OSS) · DeepSeek · Cheap reasoning | 91 | 1350 | 35 t/s | $0.55 / $2.19 | 128K | 66.4 | Jan 2025 |
| 8 | Gemini 3.1 Pro (new) · Google · Science & long-context | 91 | 1500 | 165 t/s | $3.5 / $10.5 | 2M | 13.0 | Apr 2026 |
| 9 | OpenAI · Long context | 89 | 1310 | 120 t/s | $2 / $8 | 1M | 17.8 | Apr 2025 |
| 10 | OpenAI · Reasoning & math | 88 | 1305 | 155 t/s | $1.1 / $4.4 | 200K | 32.0 | Jan 2025 |
| 11 | Anthropic · Coding & balance | 88 | 1320 | 95 t/s | $3 / $15 | 200K | 9.8 | May 2025 |
| 12 | DeepSeek V4 Pro (new, OSS) · DeepSeek · Open-source value leader | 88 | 1462 | 112 t/s | $1.74 / $3.48 | 1M | 33.7 | Apr 2026 |
| 13 | xAI · Real-time info | 87 | 1330 | 82 t/s | $3 / $15 | 131K | 9.7 | Feb 2025 |
| 14 | DeepSeek V3 (OSS) · DeepSeek · Best open-source value | 86 | 1310 | 62 t/s | $0.27 / $1.1 | 128K | 125.5 | Mar 2025 |
| 15 | OpenAI · General purpose | 85 | 1285 | 109 t/s | $2.5 / $10 | 128K | 13.6 | May 2024 |
| 16 | Qwen 3.6 Plus (new) · Alibaba Cloud · Multilingual & APAC | 84 | 1423 | 124 t/s | $1.4 / $5.6 | 256K | 24.0 | Apr 2026 |
| 17 | Meta · Open-source value | 80 | 1260 | 135 t/s | $0.2 / $0.6 | 1M | 200.0 | Apr 2025 |
| 18 | Qwen 2.5 72B (OSS) · Alibaba Cloud · Open-source flagship | 80 | 1255 | 85 t/s | $0.3 / $0.9 | 131K | 133.3 | Sep 2024 |
| 19 | Mistral AI · Multilingual | 79 | 1250 | 78 t/s | $2 / $6 | 128K | 19.8 | Nov 2024 |
| 20 | xAI · Budget reasoning | 78 | 1275 | 165 t/s | $0.3 / $0.5 | 131K | 195.0 | Feb 2025 |
| 21 | Perplexity · Search + citations | 78 | — | 65 t/s | $3 / $15 | 200K | 8.7 | Feb 2025 |
| 22 | DeepSeek · Cheap-and-fast cascade tier | 78 | 1392 | 218 t/s | $0.14 / $0.28 | 1M | 371.4 | Apr 2026 |
| 23 | Mistral AI · Code generation | 76 | — | 195 t/s | $0.3 / $0.9 | 256K | 126.7 | Jan 2025 |
| 24 | Mistral AI · Open multimodal | 76 | 1361 | 158 t/s | Self-host | 256K | — | Apr 2026 |
| 25 | Anthropic · Speed & cost | 75 | 1230 | 172 t/s | $0.8 / $4 | 200K | 31.3 | Oct 2024 |
| 26 | Google · Self-hosted general purpose | 75 | 1351 | 142 t/s | Self-host | 128K | — | Apr 2026 |
| 27 | Google · Fastest + cheapest | 74 | 1240 | 244 t/s | $0.1 / $0.4 | 1M | 296.0 | Feb 2025 |
| 28 | Alibaba Cloud · Open-source coding | 74 | — | 125 t/s | $0.15 / $0.45 | 131K | 246.7 | Nov 2024 |
| 29 | OpenAI · High throughput | 72 | 1216 | 183 t/s | $0.15 / $0.6 | 128K | 192.0 | Jul 2024 |
| 30 | Meta · Longest context | 71 | 1195 | 198 t/s | $0.15 / $0.4 | 10M | 258.2 | Apr 2025 |
| 31 | Amazon · AWS ecosystem | 70 | — | 110 t/s | $0.8 / $3.2 | 300K | 35.0 | Dec 2024 |
| 32 | Cohere · Enterprise RAG | 68 | 1170 | 72 t/s | $2.5 / $10 | 128K | 10.9 | Aug 2024 |

\*Value = Quality ÷ average of input and output price per 1M tokens; higher is better. Rows showing "—" either have no Arena listing or, for self-hosted models, no list price.
What to do this quarter
- Update bookmarks and citations. Internal eval-spec docs and procurement RFPs that reference "lmsys.org" should be updated to lmarena.ai. The data continues at the new domain.
- Pull from the right board. Coding teams should cite the coding Arena Elo (Claude Opus 4.7 leads at 1567). Generic chat teams should cite the text leaderboard (Gemini 3.1 Pro Preview leads at ~1500).
- Build dual-vendor capability. The top four models are within 40 Elo of each other. Treat them as interchangeable on capability and optimise for switching cost; a minimal failover sketch follows this list.
- Pair Arena scores with workload-specific evals. Arena rewards short-conversation polish. Long-context, tool-use, and domain-specific tasks need their own measurement.
- Track the open-weight gap. DeepSeek V4 Pro under Apache 2.0 sits at 1462 Elo, within 38 points of the text leader. The gap is the smallest it has ever been.
- Watch GPT-5.5 Pro pricing. At $30/$180 per 1M tokens, paying for the top of LMArena.ai now costs over 200x more per input token, and over 600x more per output token, than the cheapest cascade tier. The cost curve is steepening; the cost sketch below works through the arithmetic.
- Re-baseline at every model launch. Tokenizer changes (Claude Opus 4.7 ships ~35% more tokens per input than 4.6) shift effective cost without shifting list price, as the same sketch shows.
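On the dual-vendor point: a minimal failover sketch, assuming nothing about your stack. `call_model` is a placeholder for whatever SDK or gateway you actually use, and the provider order simply mirrors the top of the table above.

```python
import time

# Illustrative provider order: the top-four Arena models, treated as
# capability-equivalent per the "within 40 Elo" observation above.
FALLBACK_ORDER = [
    ("openai", "gpt-5.5-pro"),
    ("google", "gemini-3.1-pro"),
    ("anthropic", "claude-opus-4.7"),
    ("deepseek", "deepseek-v4-pro"),
]

def call_model(provider: str, model: str, prompt: str) -> str:
    """Stand-in for your real SDK or gateway call."""
    raise NotImplementedError

def complete_with_failover(prompt: str, retries_per_model: int = 2) -> str:
    """Walk the fallback order; switching vendors is a list edit, not a rewrite."""
    last_error: Exception | None = None
    for provider, model in FALLBACK_ORDER:
        for attempt in range(retries_per_model):
            try:
                return call_model(provider, model, prompt)
            except Exception as err:  # in production: catch the SDK's error types
                last_error = err
                time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("all providers failed") from last_error
```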
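And a back-of-envelope for the two pricing bullets. List prices come from the table above; the 1.3 tokens-per-word figure is a rough English-text rule of thumb, and the 1.35 multiplier is the ~35% tokenizer shift quoted for Opus 4.7. The request shape is arbitrary.

```python
def cost_per_request(words_in: int, tokens_out: int,
                     price_in: float, price_out: float,
                     tokens_per_word: float = 1.3) -> float:
    """Effective $ per request at list price ($ per 1M tokens)."""
    tokens_in = words_in * tokens_per_word
    return (tokens_in * price_in + tokens_out * price_out) / 1e6

# Same 1,000-word prompt, 500 output tokens, prices from the table above.
top   = cost_per_request(1000, 500, 30.0, 180.0)   # GPT-5.5 Pro
cheap = cost_per_request(1000, 500, 0.14, 0.28)    # cheapest cascade tier
print(f"top ${top:.4f}  cheap ${cheap:.5f}  ratio {top / cheap:.0f}x")

# Tokenizer shift: ~35% more tokens per input at an unchanged list price.
before = cost_per_request(1000, 500, 5.0, 25.0, tokens_per_word=1.3)
after  = cost_per_request(1000, 500, 5.0, 25.0, tokens_per_word=1.3 * 1.35)
print(f"effective cost increase at same list price: {after / before - 1:.0%}")
```

On this request shape the top-versus-cheapest ratio lands around 400x, and the tokenizer shift alone raises effective cost by roughly 12% with no price change on paper.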
Related reading
- LMArena Explained — what LMArena is, how to read Arena Elo
- AI Model Leaderboard — full quality, speed, pricing comparison
- LLM Leaderboard
- LMSys Arena Leaderboard May 2026
- LMArena Elo Explained for Enterprise Buyers
Teams running side-by-side evals against multiple LMArena.ai leaders typically expose them through Swfte Connect as a single endpoint, then run their own internal Elo on production prompts. That is the only way to verify whether public Arena rank translates to your workload.
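A minimal sketch of that internal-Elo step, assuming you already collect blind pairwise preferences on production prompts. The vote log below is fabricated for shape only, and the K-factor is arbitrary.

```python
from collections import defaultdict

# Hypothetical vote log from blind side-by-side review of production prompts:
# (model_a, model_b, score_a) where score_a is 1.0, 0.0, or 0.5 for a tie.
votes = [
    ("gpt-5.5-pro", "gemini-3.1-pro", 1.0),
    ("claude-opus-4.7", "gpt-5.5-pro", 0.5),
    ("gemini-3.1-pro", "claude-opus-4.7", 0.0),
]

def internal_elo(votes, k: float = 16.0, start: float = 1500.0) -> dict:
    """Fold a stream of pairwise votes into per-model ratings."""
    ratings = defaultdict(lambda: start)
    for a, b, score_a in votes:
        expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400))
        ratings[a] += k * (score_a - expected_a)
        ratings[b] += k * ((1.0 - score_a) - (1.0 - expected_a))
    return dict(ratings)

print(internal_elo(votes))  # compare this ordering to the public Arena ranks
```

If the ordering that falls out disagrees with the public board, that disagreement is the finding, not an error.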