Updated May 8, 2026

LMArena Leaderboard — May 2026

What the LMArena actually is, how to read an Arena Elo score, and the current rankings for May 2026. The original human-preference benchmark started as the LMSYS Chatbot Arena and now anchors most enterprise model-selection conversations.

What is the LMArena, in one paragraph?

The LMArena is a public, blind side-by-side voting site for AI chat models. A user submits a prompt, two anonymous models reply, the user picks a winner, and the project aggregates millions of such votes into Elo ratings. It started in 2023 as the LMSYS Chatbot Arena out of UC Berkeley and rebranded to LMArena.ai in 2024-25 as it spun out into an independent project. The current May 2026 rankings are below: three models now sit at or around the historical 1500-Elo barrier on text, and the open-weights tier is within striking distance of the closed-source frontier.
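For intuition on how a single vote moves a rating, here is a minimal sketch of the classic per-vote Elo update. It is illustrative only, not LMArena's actual rating code, and the K-factor is an arbitrary choice for the example; but the direction of each nudge is the same idea the leaderboard is built on.

```python
# Illustrative per-vote Elo update (not LMArena's actual rating code).
K = 4  # small K-factor: a single vote should barely move a rating

def expected_score(r_a: float, r_b: float) -> float:
    """Win probability for model A implied by the two current ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool) -> tuple[float, float]:
    """Nudge both ratings after one blind head-to-head vote."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + K * (s_a - e_a), r_b + K * ((1.0 - s_a) - (1.0 - e_a))

# Example: a 1350-rated model beats a 1500-rated one in a single vote.
print(update(1500.0, 1350.0, a_won=False))  # -> (~1497.2, ~1352.8)
```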

How to read an Arena Elo score

Reference table for Arena Elo bands (May 2026)

  1500+   Frontier         Gemini 3.1 Pro, Claude Opus 4.7, GPT-5.5 Pro
  1450    Frontier-adj.    DeepSeek V4 Pro, Qwen 3.6 Plus
  1400    Strong tier      GPT-4.1, Claude Sonnet 4, Gemini 2.5 Pro
  1300    Capable tier     Llama 4 Maverick, Mistral Large 3
  1200    Solid daily      Gemma 4, Phi-4, Mistral Small 3
  1100    Light tasks      DeepSeek V4 Flash, GPT-4o Mini
   <1100  Legacy tier      Older 2023-24 model generations

A 100-Elo gap means the higher-rated model wins ~64% of head-to-heads.
A 200-Elo gap means it wins ~76%. Rating shifts under 25 points are noise.
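Both figures follow from the standard Elo expected-score formula on the 400-point scale, which is easy to sanity-check:

```python
# Expected win rate implied by an Elo gap on the standard 400-point scale.
def win_probability(elo_gap: float) -> float:
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400))

print(round(win_probability(100), 2))  # 0.64
print(round(win_probability(200), 2))  # 0.76
print(round(win_probability(25), 2))   # 0.54, inside the noise band
```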

Live Leaderboard

32 models
   #  Model                                     Quality  Arena Elo    Speed  Price (in/out)  Context  Value  Released
   1  OpenAI · Hard reasoning                      96       1370   68 t/s  $10 / $40       200K       3.8  Apr 2025
   2  Anthropic · Complex analysis                 95       1360   52 t/s  $15 / $75       200K       2.1  May 2025
   3  OpenAI · Reasoning at any cost               95       1502   92 t/s  $30 / $180      1M         0.9  Apr 2026
   4  Anthropic · Coding & agentic workflows       93       1497   78 t/s  $5 / $25        1M         6.2  Apr 2026
   5  Google · Multimodal + value                  92       1345   87 t/s  $1.25 / $10     1M        16.4  Mar 2025
   6  OpenAI · Frontier general purpose            92       1481  138 t/s  $5 / $30        1M         5.3  Apr 2026
   7  DeepSeek · Cheap reasoning                   91       1350   35 t/s  $0.55 / $2.19   128K      66.4  Jan 2025
   8  Google · Science & long-context              91       1500  165 t/s  $3.5 / $10.5    2M        13.0  Apr 2026
   9  OpenAI · Long context                        89       1310  120 t/s  $2 / $8         1M        17.8  Apr 2025
  10  OpenAI · Reasoning & math                    88       1305  155 t/s  $1.1 / $4.4     200K      32.0  Jan 2025
  11  Anthropic · Coding & balance                 88       1320   95 t/s  $3 / $15        200K       9.8  May 2025
  12  DeepSeek · Open-source value leader          88       1462  112 t/s  $1.74 / $3.48   1M        33.7  Apr 2026
  13  xAI · Real-time info                         87       1330   82 t/s  $3 / $15        131K       9.7  Feb 2025
  14  DeepSeek · Best open-source value            86       1310   62 t/s  $0.27 / $1.1    128K     125.5  Mar 2025
  15  OpenAI · General purpose                     85       1285  109 t/s  $2.5 / $10      128K      13.6  May 2024
  16  Alibaba Cloud · Multilingual & APAC          84       1423  124 t/s  $1.4 / $5.6     256K      24.0  Apr 2026
  17  Meta · Open-source value                     80       1260  135 t/s  $0.2 / $0.6     1M       200.0  Apr 2025
  18  Alibaba Cloud · Open-source flagship         80       1255   85 t/s  $0.3 / $0.9     131K     133.3  Sep 2024
  19  Mistral AI · Multilingual                    79       1250   78 t/s  $2 / $6         128K      19.8  Nov 2024
  20  xAI · Budget reasoning                       78       1275  165 t/s  $0.3 / $0.5     131K     195.0  Feb 2025
  21  Perplexity · Search + citations              78          —   65 t/s  $3 / $15        200K       8.7  Feb 2025
  22  DeepSeek · Cheap-and-fast cascade tier       78       1392  218 t/s  $0.14 / $0.28   1M       371.4  Apr 2026
  23  Mistral AI · Code generation                 76          —  195 t/s  $0.3 / $0.9     256K     126.7  Jan 2025
  24  Mistral AI · Open multimodal                 76       1361  158 t/s  Self-host       256K         —  Apr 2026
  25  Anthropic · Speed & cost                     75       1230  172 t/s  $0.8 / $4       200K      31.3  Oct 2024
  26  Google · Self-hosted general purpose         75       1351  142 t/s  Self-host       128K         —  Apr 2026
  27  Google · Fastest + cheapest                  74       1240  244 t/s  $0.1 / $0.4     1M       296.0  Feb 2025
  28  Alibaba Cloud · Open-source coding           74          —  125 t/s  $0.15 / $0.45   131K     246.7  Nov 2024
  29  OpenAI · High throughput                     72       1216  183 t/s  $0.15 / $0.6    128K     192.0  Jul 2024
  30  Meta · Longest context                       71       1195  198 t/s  $0.15 / $0.4    10M      258.2  Apr 2025
  31  Amazon · AWS ecosystem                       70          —  110 t/s  $0.8 / $3.2     300K      35.0  Dec 2024
  32  Cohere · Enterprise RAG                      68       1170   72 t/s  $2.5 / $10      128K      10.9  Aug 2024
Quality = composite benchmark (MMLU, HumanEval, MATH). Arena Elo = LMSYS Chatbot Arena rating. Value = quality per dollar. Price = input / output per 1M tokens.
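The Value column is consistent with quality divided by a blended per-1M-token price, using a simple average of the input and output rates. That averaging is an inference from the published numbers rather than a stated formula, but it reproduces the table:

```python
# Assumed definition: Value = Quality / ((input_price + output_price) / 2).
def value(quality: float, price_in: float, price_out: float) -> float:
    return quality / ((price_in + price_out) / 2)

print(round(value(96, 10, 40), 1))      # 3.8   (rank 1)
print(round(value(92, 1.25, 10), 1))    # 16.4  (rank 5)
print(round(value(78, 0.14, 0.28), 1))  # 371.4 (rank 22)
```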

What to do this quarter

  1. Treat Arena Elo as a triage filter, not a decision. Use it to drop the bottom half of your candidate list, then run a real eval on the remainder (a minimal filtering sketch follows this list).
  2. Pick the right Arena board. Coding teams should read the coding Arena (Claude Opus 4.7 leads at 1567 Elo). Long-context teams should read the hard-prompts Arena. The aggregate text leaderboard is the wrong signal for many enterprise workloads.
  3. Discount short-conversation polish. The Arena rewards style. Models tuned for chat win at the margin against models tuned for accuracy. Build internal evals that reward what your business actually pays for.
  4. Watch the gap, not the ranking. Sub-25 Elo shifts are within statistical noise. Anything under 50 Elo between two candidates is a coin flip on most workloads.
  5. Plan for the four-way race. Gemini 3.1 Pro, Claude Opus 4.7, GPT-5.5 Pro, and DeepSeek V4 Pro are approximately interchangeable on quality at the top. Optimise your stack for switching cost, not for capability.
  6. Capture vote-rate momentum. The fastest-rising models week-over-week are usually the next month's leaders. Subscribe to weekly Arena reports.
  7. Pair Arena Elo with cost. A 50-Elo lead at 10x the price is rarely a good trade. See our model leaderboard for combined quality-cost rankings.
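Items 1, 4, and 7 combine into a single triage pass. The sketch below is a minimal illustration with hypothetical candidate names and numbers loosely drawn from the table above: keep everything within the 50-Elo coin-flip band of the leader, then rank the survivors by blended price.

```python
# Minimal triage sketch (hypothetical candidates, not a recommendation):
# drop anything more than 50 Elo below the best-rated candidate, then
# sort the survivors cheapest-first on blended price.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    arena_elo: int
    price_in: float   # $ per 1M input tokens
    price_out: float  # $ per 1M output tokens

    @property
    def blended_price(self) -> float:
        return (self.price_in + self.price_out) / 2

def triage(candidates: list[Candidate], noise_band: int = 50) -> list[Candidate]:
    """Keep everything within the coin-flip band of the leader, cheapest first."""
    leader = max(c.arena_elo for c in candidates)
    survivors = [c for c in candidates if leader - c.arena_elo <= noise_band]
    return sorted(survivors, key=lambda c: c.blended_price)

shortlist = triage([
    Candidate("frontier-reasoning", 1502, 30.00, 180.00),
    Candidate("frontier-coding",    1497,  5.00,  25.00),
    Candidate("open-weights-value", 1462,  1.74,   3.48),
    Candidate("capable-tier",       1330,  3.00,  15.00),  # dropped: 172 Elo back
])
for c in shortlist:
    print(c.name, c.arena_elo, f"${c.blended_price:.2f}/M blended")
```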

Related reading

For teams running multiple top-of-Arena models in production, Swfte Connect provides a single OpenAI-compatible endpoint that routes across providers and normalises Arena-tier quality without re-architecting your stack.