Updated May 8, 2026

LMArena.ai — Top Models May 2026

LMArena.ai is the rebranded LMSys Chatbot Arena. Same blind pairwise-voting methodology, same Elo math, new home. Here is who leads each board this month and what the rebrand actually changed for buyers.

The LMSys to LMArena.ai story

The Chatbot Arena began in 2023 as a research project under LMSys, an academic group out of UC Berkeley. It quickly became the most-cited LLM benchmark because it measured something the capability-only benchmarks could not: actual human preference under blind side-by-side comparison. By 2024 the project had processed millions of votes, become a procurement input for Fortune 500 buyers, and outgrown its original academic scaffold. The 2024-25 transition to the lmarena.ai domain consolidated the project as an independent organisation while keeping the same Elo methodology and open vote pool.

For users the rebrand changed almost nothing: same prompts, same blind voting, same Elo math. For procurement teams the rebrand codified Arena Elo as a vendor-neutral signal independent of any single university. That is what made it sticky as a reference point in enterprise contracts.
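For reference, the Elo math in question is the standard logistic rating update applied per blind vote. A minimal sketch in Python; the K-factor of 32 is the classic chess default, not necessarily LMArena.ai's setting, and the live site fits ratings over its whole vote pool rather than updating one vote at a time:

  # Per-vote Elo update (logistic expected score, symmetric K-factor).
  # K=32 is the chess default, not LMArena.ai's documented setting.
  def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
      expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
      score_a = 1.0 if a_won else 0.0
      delta = k * (score_a - expected_a)
      return r_a + delta, r_b - delta

  # A 1500-rated model beating a 1495-rated one gains ~15.8 points.
  print(elo_update(1500.0, 1495.0, a_won=True))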

The four-way race at the top

The May 2026 snapshot below shows three models at or within 15 points of the historical 1500 Elo barrier on text. The top of LMArena.ai is now genuinely contested.

Top of LMArena.ai text leaderboard (May 2026)

  Gemini 3.1 Pro Preview   1500   ████████████████████   text leader
  Claude Opus 4.7 Thinking 1495   ███████████████████    coding #1
  GPT-5.5 Pro              1488   ██████████████████     reasoning
  DeepSeek V4 Pro          1462   █████████████████      Apache 2.0
  Qwen 3.6 Plus            1423   ███████████████        open weights
  Claude Sonnet 4          1402   ██████████████         workhorse tier
  GPT-4.1                  1395   █████████████          legacy frontier
  Gemini 2.5 Pro           1388   █████████████          legacy frontier
  Llama 4 Maverick         1352   ███████████            open weights
  Mistral Large 3          1341   ███████████            open weights

Full Leaderboard

32 models, sorted by Quality score

  #   Model                                     Quality  Arena Elo  Speed    Price           Context  Value  Released
  1   OpenAI · Hard reasoning                   96       1370       68 t/s   $10 / $40       200K     3.8    Apr 2025
  2   Anthropic · Complex analysis              95       1360       52 t/s   $15 / $75       200K     2.1    May 2025
  3   OpenAI · Reasoning at any cost            95       1502       92 t/s   $30 / $180      1M       0.9    Apr 2026
  4   Anthropic · Coding & agentic workflows    93       1497       78 t/s   $5 / $25        1M       6.2    Apr 2026
  5   Google · Multimodal + value               92       1345       87 t/s   $1.25 / $10     1M       16.4   Mar 2025
  6   OpenAI · Frontier general purpose         92       1481       138 t/s  $5 / $30        1M       5.3    Apr 2026
  7   DeepSeek · Cheap reasoning                91       1350       35 t/s   $0.55 / $2.19   128K     66.4   Jan 2025
  8   Google · Science & long-context           91       1500       165 t/s  $3.5 / $10.5    2M       13.0   Apr 2026
  9   OpenAI · Long context                     89       1310       120 t/s  $2 / $8         1M       17.8   Apr 2025
  10  OpenAI · Reasoning & math                 88       1305       155 t/s  $1.1 / $4.4     200K     32.0   Jan 2025
  11  Anthropic · Coding & balance              88       1320       95 t/s   $3 / $15        200K     9.8    May 2025
  12  DeepSeek · Open-source value leader       88       1462       112 t/s  $1.74 / $3.48   1M       33.7   Apr 2026
  13  xAI · Real-time info                      87       1330       82 t/s   $3 / $15        131K     9.7    Feb 2025
  14  DeepSeek · Best open-source value         86       1310       62 t/s   $0.27 / $1.1    128K     125.5  Mar 2025
  15  OpenAI · General purpose                  85       1285       109 t/s  $2.5 / $10      128K     13.6   May 2024
  16  Alibaba Cloud · Multilingual & APAC       84       1423       124 t/s  $1.4 / $5.6     256K     24.0   Apr 2026
  17  Meta · Open-source value                  80       1260       135 t/s  $0.2 / $0.6     1M       200.0  Apr 2025
  18  Alibaba Cloud · Open-source flagship      80       1255       85 t/s   $0.3 / $0.9     131K     133.3  Sep 2024
  19  Mistral AI · Multilingual                 79       1250       78 t/s   $2 / $6         128K     19.8   Nov 2024
  20  xAI · Budget reasoning                    78       1275       165 t/s  $0.3 / $0.5     131K     195.0  Feb 2025
  21  Perplexity · Search + citations           78       n/a        65 t/s   $3 / $15        200K     8.7    Feb 2025
  22  DeepSeek · Cheap-and-fast cascade tier    78       1392       218 t/s  $0.14 / $0.28   1M       371.4  Apr 2026
  23  Mistral AI · Code generation              76       n/a        195 t/s  $0.3 / $0.9     256K     126.7  Jan 2025
  24  Mistral AI · Open multimodal              76       1361       158 t/s  Self-host       256K     n/a    Apr 2026
  25  Anthropic · Speed & cost                  75       1230       172 t/s  $0.8 / $4       200K     31.3   Oct 2024
  26  Google · Self-hosted general purpose      75       1351       142 t/s  Self-host       128K     n/a    Apr 2026
  27  Google · Fastest + cheapest               74       1240       244 t/s  $0.1 / $0.4     1M       296.0  Feb 2025
  28  Alibaba Cloud · Open-source coding        74       n/a        125 t/s  $0.15 / $0.45   131K     246.7  Nov 2024
  29  OpenAI · High throughput                  72       1216       183 t/s  $0.15 / $0.6    128K     192.0  Jul 2024
  30  Meta · Longest context                    71       1195       198 t/s  $0.15 / $0.4    10M      258.2  Apr 2025
  31  Amazon · AWS ecosystem                    70       n/a        110 t/s  $0.8 / $3.2     300K     35.0   Dec 2024
  32  Cohere · Enterprise RAG                   68       1170       72 t/s   $2.5 / $10      128K     10.9   Aug 2024

  Quality = composite benchmark (MMLU, HumanEval, MATH)
  Arena Elo = LMSys Chatbot Arena rating
  Value = quality per dollar
  Price = input / output per 1M tokens
  n/a = not listed
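One note on reading the Value column: the published numbers are consistent with quality divided by a blended price, taking the simple average of input and output $ per 1M tokens. That formula is inferred from the rows above rather than documented, but it reproduces the table exactly:

  # Value = Quality / blended price, where blended price is the simple
  # average of input and output $ per 1M tokens. Inferred from the
  # table's own rows, not an official LMArena.ai definition.
  def value_score(quality: float, price_in: float, price_out: float) -> float:
      return quality / ((price_in + price_out) / 2.0)

  print(round(value_score(78, 0.14, 0.28), 1))  # 371.4 -- row 22
  print(round(value_score(70, 0.80, 3.20), 1))  # 35.0  -- row 31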

What to do this quarter

  1. Update bookmarks and citations. Internal eval-spec docs and procurement RFPs that reference "lmsys.org" should be updated to lmarena.ai. The data continues at the new domain.
  2. Pull from the right board. Coding teams should cite the coding Arena Elo (Claude Opus 4.7 leads at 1567). Generic chat teams should cite the text leaderboard (Gemini 3.1 Pro Preview leads at ~1500).
  3. Build dual-vendor capability. The top four models are within 40 Elo of each other. Treat them as interchangeable on capability and optimise for switching cost.
  4. Pair Arena scores with workload-specific evals. Arena rewards short-conversation polish. Long-context, tool-use, and domain-specific tasks need their own measurement.
  5. Track the open-weight gap. DeepSeek V4 Pro under Apache 2.0 sits at 1462 Elo, within 38 points of the text leader. The gap is the smallest it has ever been.
  6. Watch GPT-5.5 Pro pricing. At $30/$180 per 1M tokens, paying for the top of LMArena.ai now costs roughly 200x more per input token than DeepSeek's cheap-and-fast cascade tier ($0.14/$0.28). The cost curve is steepening.
  7. Re-baseline at every model launch. Tokenizer changes (Claude Opus 4.7 ships ~35% more tokens per input than 4.6) shift effective cost without shifting list price; see the sketch after this list.
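A minimal sketch of that re-baselining. List prices come from the table above; the price-to-name mapping follows the text chart and is partly an assumption, and every token-inflation factor except the ~1.35 for Opus 4.7 (point 7) is a placeholder you would measure on your own prompt corpus:

  # Effective input cost = list price x tokens actually billed.
  # Inflation factors must be measured on your own corpus; only the
  # 1.35 for Opus 4.7 comes from the text, the rest are placeholders.
  PRICES = {  # $ per 1M input tokens, from the leaderboard above
      "gpt-5.5-pro": 30.00,
      "claude-opus-4.7": 5.00,
      "gemini-3.1-pro-preview": 3.50,
      "deepseek-v4-pro": 1.74,
  }
  TOKEN_INFLATION = {  # tokens billed per reference token
      "gpt-5.5-pro": 1.00,
      "claude-opus-4.7": 1.35,
      "gemini-3.1-pro-preview": 1.00,
      "deepseek-v4-pro": 1.00,
  }

  def effective_input_cost(model: str, reference_tokens: int) -> float:
      """Dollars to send reference_tokens of prompt to model."""
      billed = reference_tokens * TOKEN_INFLATION[model]
      return billed * PRICES[model] / 1_000_000

  for m in PRICES:
      print(f"{m:>22}: ${effective_input_cost(m, 10_000):.4f} per 10K-token prompt")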

Related reading

Teams running side-by-side evals against multiple LMArena.ai leaders typically expose them through Swfte Connect as a single endpoint, then run their own internal Elo on production prompts. That is the only way to verify whether public Arena rank translates to your workload.
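A minimal sketch of that internal Elo loop, assuming an OpenAI-compatible chat endpoint behind whatever gateway you use. The URL, model IDs, and prompts are placeholders, and judge_prefers() is a stub for your blind human vote or LLM judge:

  # Internal Elo over production prompts. BASE_URL, MODELS, and PROMPTS
  # are placeholders; judge_prefers() stands in for blind human votes
  # or an LLM judge of your choosing.
  import random
  import requests

  BASE_URL = "https://gateway.example.com/v1/chat/completions"  # placeholder
  MODELS = ["model-a", "model-b", "model-c"]                    # placeholder IDs
  PROMPTS = ["..."]  # replace with a sample of real production prompts

  def complete(model: str, prompt: str) -> str:
      resp = requests.post(BASE_URL, json={
          "model": model,
          "messages": [{"role": "user", "content": prompt}],
      }, timeout=60)
      resp.raise_for_status()
      return resp.json()["choices"][0]["message"]["content"]

  def judge_prefers(prompt: str, answer_a: str, answer_b: str) -> bool:
      """True if answer_a wins; plug in your blind judge here."""
      raise NotImplementedError

  ratings = {m: 1000.0 for m in MODELS}
  for prompt in PROMPTS:
      m_a, m_b = random.sample(MODELS, 2)  # blind random pairing
      won = judge_prefers(prompt, complete(m_a, prompt), complete(m_b, prompt))
      expected = 1 / (1 + 10 ** ((ratings[m_b] - ratings[m_a]) / 400))
      delta = 32 * ((1.0 if won else 0.0) - expected)
      ratings[m_a] += delta
      ratings[m_b] -= delta

  print(sorted(ratings.items(), key=lambda kv: -kv[1]))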