Updated May 8, 2026

LM Leaderboard — May 2026

Large language models ranked by LMSYS Arena Elo, MMLU, HumanEval, MATH, pricing, and inference speed. Refreshed monthly with live data from official provider pricing pages, Artificial Analysis, and the Arena.

What is the top LM on the Arena right now?

LMArena (formerly LMSYS Chatbot Arena) tracks pairwise human votes across hundreds of thousands of conversations. Our May 2026 snapshot below ranks 32 language models on Arena Elo plus the standard MMLU / HumanEval / MATH benchmark suite. The Arena re-ranks roughly weekly as votes accumulate; what you see is the most recent snapshot verified against the public Arena and Artificial Analysis.
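For intuition on how pairwise votes become a ranking, here is the classic per-vote Elo update that arena-style leaderboards approximate. LMArena itself now fits Bradley-Terry coefficients over the full vote history, so treat this sketch as illustrative rather than the Arena's exact computation.

```python
# Minimal sketch of the Elo update behind pairwise arena rankings.
# The real Arena fits Bradley-Terry scores over all votes at once;
# this online update is the classic approximation, shown for intuition.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return updated ratings after one human vote (A vs. B)."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b - k * (s_a - e_a)

# Example: a 1350-rated model upsets a 1500-rated one.
print(elo_update(1350, 1500, a_won=True))  # winner gains ~22 points
```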

32 models
| # | Model | Quality | Arena Elo | Speed | Price | Context | Value | Released |
|---|-------|---------|-----------|-------|-------|---------|-------|----------|
| 1 | OpenAI · Hard reasoning | 96 | 1370 | 68 t/s | $10 / $40 | 200K | 3.8 | Apr 2025 |
| 2 | Anthropic · Complex analysis | 95 | 1360 | 52 t/s | $15 / $75 | 200K | 2.1 | May 2025 |
| 3 | OpenAI · Reasoning at any cost | 95 | 1502 | 92 t/s | $30 / $180 | 1M | 0.9 | Apr 2026 |
| 4 | Anthropic · Coding & agentic workflows | 93 | 1497 | 78 t/s | $5 / $25 | 1M | 6.2 | Apr 2026 |
| 5 | Google · Multimodal + value | 92 | 1345 | 87 t/s | $1.25 / $10 | 1M | 16.4 | Mar 2025 |
| 6 | OpenAI · Frontier general purpose | 92 | 1481 | 138 t/s | $5 / $30 | 1M | 5.3 | Apr 2026 |
| 7 | DeepSeek · Cheap reasoning | 91 | 1350 | 35 t/s | $0.55 / $2.19 | 128K | 66.4 | Jan 2025 |
| 8 | Google · Science & long-context | 91 | 1500 | 165 t/s | $3.5 / $10.5 | 2M | 13.0 | Apr 2026 |
| 9 | OpenAI · Long context | 89 | 1310 | 120 t/s | $2 / $8 | 1M | 17.8 | Apr 2025 |
| 10 | OpenAI · Reasoning & math | 88 | 1305 | 155 t/s | $1.1 / $4.4 | 200K | 32.0 | Jan 2025 |
| 11 | Anthropic · Coding & balance | 88 | 1320 | 95 t/s | $3 / $15 | 200K | 9.8 | May 2025 |
| 12 | DeepSeek · Open-source value leader | 88 | 1462 | 112 t/s | $1.74 / $3.48 | 1M | 33.7 | Apr 2026 |
| 13 | xAI · Real-time info | 87 | 1330 | 82 t/s | $3 / $15 | 131K | 9.7 | Feb 2025 |
| 14 | DeepSeek · Best open-source value | 86 | 1310 | 62 t/s | $0.27 / $1.1 | 128K | 125.5 | Mar 2025 |
| 15 | OpenAI · General purpose | 85 | 1285 | 109 t/s | $2.5 / $10 | 128K | 13.6 | May 2024 |
| 16 | Alibaba Cloud · Multilingual & APAC | 84 | 1423 | 124 t/s | $1.4 / $5.6 | 256K | 24.0 | Apr 2026 |
| 17 | Meta · Open-source value | 80 | 1260 | 135 t/s | $0.2 / $0.6 | 1M | 200.0 | Apr 2025 |
| 18 | Alibaba Cloud · Open-source flagship | 80 | 1255 | 85 t/s | $0.3 / $0.9 | 131K | 133.3 | Sep 2024 |
| 19 | Mistral AI · Multilingual | 79 | 1250 | 78 t/s | $2 / $6 | 128K | 19.8 | Nov 2024 |
| 20 | xAI · Budget reasoning | 78 | 1275 | 165 t/s | $0.3 / $0.5 | 131K | 195.0 | Feb 2025 |
| 21 | Perplexity · Search + citations | 78 | n/a | 65 t/s | $3 / $15 | 200K | 8.7 | Feb 2025 |
| 22 | DeepSeek · Cheap-and-fast cascade tier | 78 | 1392 | 218 t/s | $0.14 / $0.28 | 1M | 371.4 | Apr 2026 |
| 23 | Mistral AI · Code generation | 76 | n/a | 195 t/s | $0.3 / $0.9 | 256K | 126.7 | Jan 2025 |
| 24 | Mistral AI · Open multimodal | 76 | 1361 | 158 t/s | Self-host | 256K | n/a | Apr 2026 |
| 25 | Anthropic · Speed & cost | 75 | 1230 | 172 t/s | $0.8 / $4 | 200K | 31.3 | Oct 2024 |
| 26 | Google · Self-hosted general purpose | 75 | 1351 | 142 t/s | Self-host | 128K | n/a | Apr 2026 |
| 27 | Google · Fastest + cheapest | 74 | 1240 | 244 t/s | $0.1 / $0.4 | 1M | 296.0 | Feb 2025 |
| 28 | Alibaba Cloud · Open-source coding | 74 | n/a | 125 t/s | $0.15 / $0.45 | 131K | 246.7 | Nov 2024 |
| 29 | OpenAI · High throughput | 72 | 1216 | 183 t/s | $0.15 / $0.6 | 128K | 192.0 | Jul 2024 |
| 30 | Meta · Longest context | 71 | 1195 | 198 t/s | $0.15 / $0.4 | 10M | 258.2 | Apr 2025 |
| 31 | Amazon · AWS ecosystem | 70 | n/a | 110 t/s | $0.8 / $3.2 | 300K | 35.0 | Dec 2024 |
| 32 | Cohere · Enterprise RAG | 68 | 1170 | 72 t/s | $2.5 / $10 | 128K | 10.9 | Aug 2024 |
Quality = composite benchmark score (MMLU, HumanEval, MATH) · Arena Elo = LMSYS Chatbot Arena rating · Value = quality per dollar (quality ÷ average of input and output price) · Price = input / output per 1M tokens
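The Value column can be reproduced from the table itself. A minimal sketch, assuming value = quality divided by the simple average of input and output price per 1M tokens, which matches the published figures (e.g. 96 ÷ ((10 + 40) / 2) ≈ 3.8):

```python
# Sketch: recompute the "Value" column from Quality and Price.
# Assumption (inferred from the table, not an official formula):
# value = quality / mean(input_price, output_price) per 1M tokens.

rows = [
    # (provider · niche, quality, input $, output $)
    ("OpenAI · Hard reasoning", 96, 10.00, 40.00),
    ("DeepSeek · Cheap-and-fast cascade tier", 78, 0.14, 0.28),
    ("Meta · Open-source value", 80, 0.20, 0.60),
]

for name, quality, p_in, p_out in rows:
    value = quality / ((p_in + p_out) / 2)
    print(f"{name}: value = {value:.1f}")
# -> 3.8, 371.4, 200.0, matching the leaderboard column.
```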

How the LLM leaderboard works

We pull official provider pricing every 24 hours, Artificial Analysis benchmark snapshots weekly, and LMSYS Arena Elo as new ratings are published. The composite quality index is a 0-100 normalization over MMLU Pro, HumanEval, and MATH, weighted by recency and cross-validated against Arena Elo. We do not accept vendor-supplied numbers without an independent reference.
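As a rough illustration of that normalization step, here is a sketch using min-max scaling and equal benchmark weights. The leaderboard's actual recency weighting and cross-validation are not reproduced here, and every score below is made up.

```python
# Sketch of a 0-100 composite quality index over three benchmarks.
# Equal weights and min-max scaling are illustrative assumptions, not
# the leaderboard's exact formula.

BENCHMARKS = ("mmlu_pro", "humaneval", "math")

# Hypothetical raw scores (fraction correct) for a few models.
raw = {
    "model_a": {"mmlu_pro": 0.84, "humaneval": 0.92, "math": 0.88},
    "model_b": {"mmlu_pro": 0.71, "humaneval": 0.85, "math": 0.62},
    "model_c": {"mmlu_pro": 0.78, "humaneval": 0.74, "math": 0.70},
}

def composite(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Min-max normalize each benchmark across models, then average to 0-100."""
    out = {m: 0.0 for m in scores}
    for b in BENCHMARKS:
        col = {m: s[b] for m, s in scores.items()}
        lo, hi = min(col.values()), max(col.values())
        for m in scores:
            out[m] += 100 * (col[m] - lo) / (hi - lo) / len(BENCHMARKS)
    return out

print(composite(raw))  # model_a tops all three benchmarks, so it scores 100.0
```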

Where the leaderboard is wrong

No leaderboard predicts your production accuracy. The LMSYS Arena rewards style and short-conversation polish; a top-Arena model can still underperform on your specific function-calling schema or long-context retrieval workload. Build an internal eval harness before you commit, as sketched below. See our LMArena Elo explained and LLM routing writeups for the deep dives.
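A minimal starting point for such a harness, assuming a generic `complete(prompt) -> str` client; the cases and the `fake_model` stub below are hypothetical placeholders for your own stack and task distribution:

```python
# Minimal internal-eval harness sketch. `complete` is a placeholder for
# whatever client your stack uses (OpenAI, Anthropic, a local server, ...).

from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    prompt: str
    check: Callable[[str], bool]  # task-specific pass/fail, not a generic metric

def run_eval(complete: Callable[[str], str], cases: list[Case]) -> float:
    """Return the pass rate of a model over your own task distribution."""
    passed = sum(1 for c in cases if c.check(complete(c.prompt)))
    return passed / len(cases)

# Example cases: exercise the exact behaviors you ship, not benchmark trivia.
cases = [
    Case("Return the JSON {\"ok\": true} and nothing else.",
         check=lambda out: out.strip() == '{"ok": true}'),
    Case("What is 17 * 23? Answer with the number only.",
         check=lambda out: out.strip() == "391"),
]

def fake_model(prompt: str) -> str:  # stand-in so the sketch runs end to end
    return '{"ok": true}' if "JSON" in prompt else "391"

print(f"pass rate: {run_eval(fake_model, cases):.0%}")  # -> 100%
```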

Related rankings