Updated May 8, 2026

LMArena Leaderboard — May 2026

What the LMArena actually is, how to read an Arena Elo score, and the current rankings for May 2026. The original human-preference benchmark started as the LMSYS Chatbot Arena and now anchors most enterprise model-selection conversations.

What is the LMArena, in one paragraph?

The LMArena is a public, blind side-by-side voting site for AI chat models. A user submits a prompt, two anonymous models reply, the user picks a winner, and the project aggregates millions of such votes into Elo ratings. It started in 2023 as the LMSYS Chatbot Arena out of UC Berkeley and rebranded to LMArena.ai in 2024-25 as it spun out into an independent project. The current May 2026 rankings are below: three models now sit at or around the historical 1500-Elo barrier on text, and the open-weights tier is within striking distance of the closed-source frontier.
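For intuition on how a single vote moves a rating, here is a minimal sketch of the classic per-vote Elo update. It is illustrative only, not LMArena's actual rating code, and the K-factor is an arbitrary choice for the example; but the direction of each nudge is the same idea the leaderboard is built on.

```python
# Illustrative per-vote Elo update (not LMArena's actual rating code).
K = 4  # small K-factor: a single vote should barely move a rating

def expected_score(r_a: float, r_b: float) -> float:
    """Win probability for model A implied by the two current ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool) -> tuple[float, float]:
    """Nudge both ratings after one blind head-to-head vote."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + K * (s_a - e_a), r_b + K * ((1.0 - s_a) - (1.0 - e_a))

# Example: a 1350-rated model beats a 1500-rated one in a single vote.
print(update(1500.0, 1350.0, a_won=False))  # -> (~1497.2, ~1352.8)
```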

How to read an Arena Elo score

Reference table for Arena Elo bands (May 2026)

  1500+   Frontier         Gemini 3.1 Pro, Claude Opus 4.7, GPT-5.5 Pro
  1450    Frontier-adj.    DeepSeek V4 Pro, Qwen 3.6 Plus
  1400    Strong tier      GPT-4.1, Claude Sonnet 4, Gemini 2.5 Pro
  1300    Capable tier     Llama 4 Maverick, Mistral Large 3
  1200    Solid daily      Gemma 4, Phi-4, Mistral Small 3
  1100    Light tasks      DeepSeek V4 Flash, GPT-4o Mini
   <1100  Legacy tier      Older 2023-24 model generations

A 100-Elo gap means the higher-rated model wins ~64% of head-to-heads.
A 200-Elo gap means it wins ~76%. Rating shifts under 25 points are noise.
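Both figures follow from the standard Elo expected-score formula on the 400-point scale, which is easy to sanity-check:

```python
# Expected win rate implied by an Elo gap on the standard 400-point scale.
def win_probability(elo_gap: float) -> float:
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400))

print(round(win_probability(100), 2))  # 0.64
print(round(win_probability(200), 2))  # 0.76
print(round(win_probability(25), 2))   # 0.54, inside the noise band
```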

Live Leaderboard

32 models
   #  Model                                     Quality  Arena Elo    Speed  Price (in/out)  Context  Value  Released
   1  OpenAI · Hard reasoning                      96       1370   68 t/s  $10 / $40       200K       3.8  Apr 2025
   2  Anthropic · Complex analysis                 95       1360   52 t/s  $15 / $75       200K       2.1  May 2025
   3  OpenAI · Reasoning at any cost               95       1502   92 t/s  $30 / $180      1M         0.9  Apr 2026
   4  Anthropic · Coding & agentic workflows       93       1497   78 t/s  $5 / $25        1M         6.2  Apr 2026
   5  Google · Multimodal + value                  92       1345   87 t/s  $1.25 / $10     1M        16.4  Mar 2025
   6  OpenAI · Frontier general purpose            92       1481  138 t/s  $5 / $30        1M         5.3  Apr 2026
   7  DeepSeek · Cheap reasoning                   91       1350   35 t/s  $0.55 / $2.19   128K      66.4  Jan 2025
   8  Google · Science & long-context              91       1500  165 t/s  $3.5 / $10.5    2M        13.0  Apr 2026
   9  OpenAI · Long context                        89       1310  120 t/s  $2 / $8         1M        17.8  Apr 2025
  10  OpenAI · Reasoning & math                    88       1305  155 t/s  $1.1 / $4.4     200K      32.0  Jan 2025
  11  Anthropic · Coding & balance                 88       1320   95 t/s  $3 / $15        200K       9.8  May 2025
  12  DeepSeek · Open-source value leader          88       1462  112 t/s  $1.74 / $3.48   1M        33.7  Apr 2026
  13  xAI · Real-time info                         87       1330   82 t/s  $3 / $15        131K       9.7  Feb 2025
  14  DeepSeek · Best open-source value            86       1310   62 t/s  $0.27 / $1.1    128K     125.5  Mar 2025
  15  OpenAI · General purpose                     85       1285  109 t/s  $2.5 / $10      128K      13.6  May 2024
  16  Alibaba Cloud · Multilingual & APAC          84       1423  124 t/s  $1.4 / $5.6     256K      24.0  Apr 2026
  17  Meta · Open-source value                     80       1260  135 t/s  $0.2 / $0.6     1M       200.0  Apr 2025
  18  Alibaba Cloud · Open-source flagship         80       1255   85 t/s  $0.3 / $0.9     131K     133.3  Sep 2024
  19  Mistral AI · Multilingual                    79       1250   78 t/s  $2 / $6         128K      19.8  Nov 2024
  20  xAI · Budget reasoning                       78       1275  165 t/s  $0.3 / $0.5     131K     195.0  Feb 2025
  21  Perplexity · Search + citations              78          —   65 t/s  $3 / $15        200K       8.7  Feb 2025
  22  DeepSeek · Cheap-and-fast cascade tier       78       1392  218 t/s  $0.14 / $0.28   1M       371.4  Apr 2026
  23  Mistral AI · Code generation                 76          —  195 t/s  $0.3 / $0.9     256K     126.7  Jan 2025
  24  Mistral AI · Open multimodal                 76       1361  158 t/s  Self-host       256K         —  Apr 2026
  25  Anthropic · Speed & cost                     75       1230  172 t/s  $0.8 / $4       200K      31.3  Oct 2024
  26  Google · Self-hosted general purpose         75       1351  142 t/s  Self-host       128K         —  Apr 2026
  27  Google · Fastest + cheapest                  74       1240  244 t/s  $0.1 / $0.4     1M       296.0  Feb 2025
  28  Alibaba Cloud · Open-source coding           74          —  125 t/s  $0.15 / $0.45   131K     246.7  Nov 2024
  29  OpenAI · High throughput                     72       1216  183 t/s  $0.15 / $0.6    128K     192.0  Jul 2024
  30  Meta · Longest context                       71       1195  198 t/s  $0.15 / $0.4    10M      258.2  Apr 2025
  31  Amazon · AWS ecosystem                       70          —  110 t/s  $0.8 / $3.2     300K      35.0  Dec 2024
  32  Cohere · Enterprise RAG                      68       1170   72 t/s  $2.5 / $10      128K      10.9  Aug 2024
Quality = composite benchmark (MMLU, HumanEval, MATH). Arena Elo = LMSYS Chatbot Arena rating. Value = quality per dollar. Price = input / output per 1M tokens.
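The Value column is consistent with quality divided by a blended per-1M-token price, using a simple average of the input and output rates. That averaging is an inference from the published numbers rather than a stated formula, but it reproduces the table:

```python
# Assumed definition: Value = Quality / ((input_price + output_price) / 2).
def value(quality: float, price_in: float, price_out: float) -> float:
    return quality / ((price_in + price_out) / 2)

print(round(value(96, 10, 40), 1))      # 3.8   (rank 1)
print(round(value(92, 1.25, 10), 1))    # 16.4  (rank 5)
print(round(value(78, 0.14, 0.28), 1))  # 371.4 (rank 22)
```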

What to do this quarter

  1. Treat Arena Elo as a triage filter, not a decision. Use it to drop the bottom half of your candidate list, then run a real eval on the remainder (a minimal filtering sketch follows this list).
  2. Pick the right Arena board. Coding teams should read the coding Arena (Claude Opus 4.7 leads at 1567 Elo). Long-context teams should read the hard-prompts Arena. The aggregate text leaderboard is the wrong signal for many enterprise workloads.
  3. Discount short-conversation polish. The Arena rewards style. Models tuned for chat win at the margin against models tuned for accuracy. Build internal evals that reward what your business actually pays for.
  4. Watch the gap, not the ranking. Sub-25 Elo shifts are within statistical noise. Anything under 50 Elo between two candidates is a coin flip on most workloads.
  5. Plan for the four-way race. Gemini 3.1 Pro, Claude Opus 4.7, GPT-5.5 Pro, and DeepSeek V4 Pro are approximately interchangeable on quality at the top. Optimise your stack for switching cost, not for capability.
  6. Capture vote-rate momentum. The fastest-rising models week-over-week are usually the next month's leaders. Subscribe to weekly Arena reports.
  7. Pair Arena Elo with cost. A 50-Elo lead at 10x the price is rarely a good trade. See our model leaderboard for combined quality-cost rankings.
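Items 1, 4, and 7 combine into a single triage pass. The sketch below is a minimal illustration with hypothetical candidate names and numbers loosely drawn from the table above: keep everything within the 50-Elo coin-flip band of the leader, then rank the survivors by blended price.

```python
# Minimal triage sketch (hypothetical candidates, not a recommendation):
# drop anything more than 50 Elo below the best-rated candidate, then
# sort the survivors cheapest-first on blended price.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    arena_elo: int
    price_in: float   # $ per 1M input tokens
    price_out: float  # $ per 1M output tokens

    @property
    def blended_price(self) -> float:
        return (self.price_in + self.price_out) / 2

def triage(candidates: list[Candidate], noise_band: int = 50) -> list[Candidate]:
    """Keep everything within the coin-flip band of the leader, cheapest first."""
    leader = max(c.arena_elo for c in candidates)
    survivors = [c for c in candidates if leader - c.arena_elo <= noise_band]
    return sorted(survivors, key=lambda c: c.blended_price)

shortlist = triage([
    Candidate("frontier-reasoning", 1502, 30.00, 180.00),
    Candidate("frontier-coding",    1497,  5.00,  25.00),
    Candidate("open-weights-value", 1462,  1.74,   3.48),
    Candidate("capable-tier",       1330,  3.00,  15.00),  # dropped: 172 Elo back
])
for c in shortlist:
    print(c.name, c.arena_elo, f"${c.blended_price:.2f}/M blended")
```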

Related reading

For teams running multiple top-of-Arena models in production, Swfte Connect provides a single OpenAI-compatible endpoint that routes across providers and normalises Arena-tier quality without re-architecting your stack.