Updated Jun 14, 2026

LM Leaderboard — June 2026

Large language models ranked by LMArena Elo, MMLU, HumanEval, MATH, pricing, and inference speed. Pricing is refreshed every 24 hours from official provider pricing pages, benchmark snapshots weekly from Artificial Analysis, and Elo whenever the Arena publishes.

What is the top LM on the Arena right now?

LMArena (formerly LMSYS Chatbot Arena) tracks pairwise human votes across hundreds of thousands of conversations. Our June 2026 snapshot below ranks 357 language models on Arena Elo plus the standard MMLU / HumanEval / MATH benchmark suite. The Arena re-ranks roughly weekly as votes accumulate; what you see is the most recent snapshot verified against the public Arena and Artificial Analysis.
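
To read the Elo column, it can help to translate a rating gap into an expected head-to-head win rate. The sketch below assumes the conventional logistic Elo scale with a 400-point divisor; the Arena's own fitting procedure may differ, and `expected_win_rate` is an illustrative name rather than anything the Arena publishes.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of pairwise votes won by A under the standard Elo model.

    Assumption: conventional 400-point logistic scale, not confirmed for LMArena.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # 1525 and 1512 are the two highest Arena Elo values in the June 2026 snapshot.
    print(f"{expected_win_rate(1525, 1512):.3f}")
```

Under that assumption, a 13-point gap works out to roughly a 52% expected win rate, so adjacent ranks near the top are close to a coin flip.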

357 models
| # | Model | Quality | Arena Elo | Speed | Price | Context | Value | Released |
|---|---|---|---|---|---|---|---|---|
| 1 | Anthropic · Frontier agentic coding & knowledge work | 100 | 1525 | 58 t/s | $10 / $50 | 1M | 3.3 | Jun 2026 |
| 2 | Anthropic · Coding, agents & computer use | 99 | 1512 | 72 t/s | $5 / $25 | 1M | 6.6 | May 2026 |
| 3 | OpenAI · Reasoning at any cost | 98 | 1510 | 68 t/s | $30 / $180 | 1M | 0.9 | Apr 2026 |
| 4 | OpenAI · Frontier general purpose | 97 | 1506 | 70 t/s | $5 / $30 | 1M | 5.5 | Apr 2026 |
| 5 | OpenAI · Complex analysis | 97 |  |  | $30 / $180 | 1M | 0.9 | Mar 2026 |
| 6 | OpenAI · Complex analysis | 97 |  |  | $21 / $168 | 400K | 1.0 | Dec 2025 |
| 7 | Anthropic · Complex analysis | 97 |  |  | $30 / $150 | 1M | 1.1 | May 2026 |
| 8 | Anthropic · Coding & agentic workflows | 96 | 1505 | 68 t/s | $5 / $25 | 1M | 6.4 | Apr 2026 |
| 9 | OpenAI · Deep research | 96 |  |  | $10 / $40 | 200K | 3.8 | Oct 2025 |
| 10 | OpenAI · Deep research | 96 |  |  | $2 / $8 | 200K | 19.2 | Oct 2025 |
| 11 | OpenAI · Hard reasoning | 96 |  |  | $20 / $80 | 200K | 1.9 | Jun 2025 |
| 12 | Google · Speed & cost | 96 | 1505 |  | $2 / $12 | 1M | 13.7 | Feb 2026 |
| 13 | Google · Science & long-context | 96 | 1505 | 131 t/s | $2 / $12 | 1M | 13.7 | Apr 2026 |
| 14 | Anthropic · General purpose | 95 | 1490 |  | $5 / $25 | 1M | 6.3 | Feb 2026 |
| 15 | Anthropic · General purpose | 95 |  |  | $5 / $25 | 200K | 6.3 | Nov 2025 |
| 16 | Anthropic · Complex analysis | 95 |  |  | $30 / $150 | 1M | 1.1 | Apr 2026 |
| 17 | Google · Image generation | 94 |  |  | $2 / $12 | 66K | 13.4 | Nov 2025 |
| 18 | Anthropic · Multimodal | 94 |  |  | $15 / $75 | 200K | 2.1 | Aug 2025 |
| 19 | OpenAI · Hard reasoning | 94 | 1370 | 68 t/s | $10 / $40 | 200K | 3.8 | Apr 2025 |
| 20 | Alibaba Cloud · Long autonomous agentic runs | 94 | 1488 | 90 t/s | $2.5 / $7.5 | 1M | 18.8 | May 2026 |
| 21 | xAI · Agentic tasks & real-time info | 93 | 1496 | 83 t/s | $1.25 / $2.5 | 1M | 49.6 | May 2026 |
| 22 | OpenAI · General purpose | 93 | 1495 |  | $2.5 / $15 | 1M | 10.6 | Mar 2026 |
| 23 | OpenAI · General purpose | 93 |  |  | $1.75 / $14 | 128K | 11.8 | Mar 2026 |
| 24 | OpenAI · Code generation | 93 |  |  | $1.75 / $14 | 400K | 11.8 | Feb 2026 |
| 25 | OpenAI · Code generation | 93 |  |  | $1.75 / $14 | 400K | 11.8 | Jan 2026 |
| 26 | OpenAI · General purpose | 93 |  |  | $1.75 / $14 | 128K | 11.8 | Dec 2025 |
| 27 | OpenAI · General purpose | 93 |  |  | $1.75 / $14 | 400K | 11.8 | Dec 2025 |
| 28 | OpenAI · Code generation | 93 |  |  | $1.25 / $10 | 400K | 16.5 | Dec 2025 |
| 29 | OpenAI · General purpose | 93 |  |  | $1.25 / $10 | 400K | 16.5 | Nov 2025 |
| 30 | OpenAI · General purpose | 93 |  |  | $1.25 / $10 | 128K | 16.5 | Nov 2025 |
| 31 | OpenAI · Code generation | 93 |  |  | $1.25 / $10 | 400K | 16.5 | Nov 2025 |
| 32 | OpenAI · Hard reasoning | 93 |  |  | $150 / $600 | 200K | 0.2 | Mar 2025 |
| 33 | OpenAI · Complex analysis | 93 |  |  | $30 / $60 | 8K | 2.1 | May 2023 |
| 34 | OpenAI · Multimodal | 93 |  |  | $30 / $60 | 8K | 2.1 | May 2023 |
| 35 | xAI · General purpose | 93 | 1496 |  | $1.25 / $2.5 | 2M | 49.6 | Mar 2026 |
| 36 | OpenAI · Complex analysis | 93 |  |  | $8 / $15 | 272K | 8.1 | Apr 2026 |
| 37 | Moonshot AI · Frontier quality at low cost | 92 | 1466 | 48 t/s | $0.73 / $3.49 | 256K | 43.6 | Apr 2026 |
| 38 | Google · Multimodal + value | 92 | 1345 | 87 t/s | $1.25 / $10 | 1M | 16.4 | Mar 2025 |
| 39 | Anthropic · Complex analysis | 91 | 1360 | 52 t/s | $15 / $75 | 200K | 2.0 | May 2025 |
| 40 | · Hard reasoning | 91 |  |  | $0.3 / $1.1 | 164K | 130.0 | Jul 2025 |
| 41 | Google · Speed & cost | 91 |  |  | $1.25 / $10 | 1M | 16.2 | Jun 2025 |
| 42 | DeepSeek · Hard reasoning | 91 |  |  | $0.5 / $2.15 | 164K | 68.7 | May 2025 |
| 43 | Google · Speed & cost | 91 |  |  | $1.25 / $10 | 1M | 16.2 | May 2025 |
| 44 | DeepSeek · Hard reasoning | 91 |  |  | $0.29 / $0.29 | 33K | 313.8 | Jan 2025 |
| 45 | DeepSeek · Hard reasoning | 91 |  |  | $0.7 / $0.8 | 131K | 121.3 | Jan 2025 |
| 46 | DeepSeek · Hard reasoning | 91 |  |  | $0.7 / $2.5 | 64K | 56.9 | Jan 2025 |
| 47 | Moonshot AI · Open-weight agentic coding | 91 |  | 55 t/s | $0.73 / $3.49 | 256K | 43.1 | Jun 2026 |
| 48 | · Open-weight reasoning & tool use | 91 |  | 50 t/s | $0.2 / $0.8 | 262K | 182.0 | Jun 2026 |
| 49 | DeepSeek · Open-source value leader | 90 | 1467 | 33 t/s | $1.74 / $3.48 | 1M | 34.5 | Apr 2026 |
| 50 | Anthropic · Coding & balance | 90 | 1467 | 73 t/s | $3 / $15 | 1M | 10.0 | Feb 2026 |
| 51 | OpenAI · General purpose | 90 | 1455 |  | $1.25 / $10 | 400K | 16.0 | Aug 2025 |
| 52 | xAI · General purpose | 90 |  |  | $3 / $15 | 131K | 10.0 | Apr 2025 |
| 53 | Alibaba Cloud · Open-source | 90 |  |  | $1.04 / $6.24 | 262K | 24.7 | Apr 2026 |
| 54 | OpenAI · Long context | 89 | 1310 | 120 t/s | $2 / $8 | 1M | 17.8 | Apr 2025 |
| 55 | Moonshot AI · Speed & cost | 89 | 1452 |  | $0.4 / $1.9 | 262K | 77.4 | Jan 2026 |
| 56 | · Open-weight agentic coding | 89 | 1455 | 80 t/s | $0.6 / $2.4 | 1M | 59.3 | Jun 2026 |
| 57 | · Open-weight agentic coding (provisional) | 89 |  |  | $0.98 / $3.08 | 200K | 43.8 | Jun 2026 |
| 58 | · Open-weight agentic & tool use | 88 | 1467 | 48 t/s | $0.98 / $3.08 | 200K | 43.3 | Apr 2026 |
| 59 | OpenAI · Multimodal | 88 |  |  | $10 / $10 | 400K | 8.8 | Oct 2025 |
| 60 | OpenAI · Complex analysis | 88 |  |  | $15 / $120 | 400K | 1.3 | Oct 2025 |
| 61 | Anthropic · General purpose | 88 |  |  | $3 / $15 | 1M | 9.8 | Sep 2025 |
| 62 | OpenAI · General purpose | 88 |  |  | $2.5 / $10 | 128K | 14.1 | Aug 2025 |
| 63 | OpenAI · Search + citations | 88 |  |  | $2.5 / $10 | 128K | 14.1 | Mar 2025 |
| 64 | OpenAI · Hard reasoning | 88 |  |  | $15 / $60 | 200K | 2.3 | Dec 2024 |
| 65 | OpenAI · General purpose | 88 |  |  | $2.5 / $10 | 128K | 14.1 | Nov 2024 |
| 66 | OpenAI · General purpose | 88 |  |  | $2.5 / $10 | 128K | 14.1 | May 2024 |
| 67 | OpenAI · Multimodal | 88 |  |  | $6 / $18 | 128K | 7.3 | May 2024 |
| 68 | OpenAI · General purpose | 88 |  |  | $5 / $15 | 128K | 8.8 | May 2024 |
| 69 | OpenAI · Multimodal | 88 |  |  | $10 / $30 | 128K | 4.4 | Apr 2024 |
| 70 | OpenAI · Complex analysis | 88 |  |  | $10 / $30 | 128K | 4.4 | Jan 2024 |
| 71 | OpenAI · Multimodal | 88 |  |  | $10 / $30 | 128K | 4.4 | Nov 2023 |
| 72 | · Open-source | 88 | 1450 |  | $0.6 / $1.92 | 80K | 69.8 | Feb 2026 |
| 73 | Anthropic · Coding & balance | 88 | 1320 | 95 t/s | $3 / $15 | 200K | 9.8 | May 2025 |
| 74 | OpenAI · Reasoning & math | 88 | 1305 | 155 t/s | $1.1 / $4.4 | 200K | 32.0 | Jan 2025 |
| 75 | xAI · Real-time info | 87 | 1330 | 82 t/s | $3 / $15 | 131K | 9.7 | Feb 2025 |
| 76 | DeepSeek · Open-source | 87 | 1455 |  | $0.252 / $0.378 | 164K | 276.2 | Dec 2025 |
| 77 | · Open-source | 86 |  |  | $0.135 / $0.5 | 131K | 270.9 | Dec 2025 |
| 78 | DeepSeek · Open-source | 86 |  |  | $0.287 / $0.431 | 164K | 239.6 | Dec 2025 |
| 79 | DeepSeek · Open-source | 86 |  |  | $0.27 / $0.41 | 164K | 252.9 | Sep 2025 |
| 80 | DeepSeek · Open-source | 86 |  |  | $0.27 / $0.95 | 164K | 141.0 | Sep 2025 |
| 81 | DeepSeek · Open-source | 86 |  |  | $0.21 / $0.79 | 33K | 172.0 | Aug 2025 |
| 82 | DeepSeek · Open-source | 86 |  |  | $0.2 / $0.77 | 164K | 177.3 | Mar 2025 |
| 83 | Anthropic · General purpose | 86 |  |  | $3 / $15 | 200K | 9.6 | Feb 2025 |
| 84 | Anthropic · Hard reasoning | 86 |  |  | $3 / $15 | 200K | 9.6 | Feb 2025 |
| 85 | DeepSeek · Best open-source value | 86 | 1310 | 62 t/s | $0.27 / $1.1 | 128K | 125.5 | Mar 2025 |
| 86 | Alibaba Cloud · Multilingual & APAC | 86 | 1448 | 124 t/s | $1.4 / $5.6 | 256K | 24.6 | Apr 2026 |
| 87 | OpenAI · General purpose | 85 | 1285 | 109 t/s | $2.5 / $10 | 128K | 13.6 | May 2024 |
| 88 | Mistral AI · Open-source | 85 |  |  | $0.5 / $1.5 | 262K | 85.0 | Dec 2025 |
| 89 | Mistral AI · Open-source | 85 |  |  | $2 / $6 | 131K | 21.3 | Nov 2024 |
| 90 | Mistral AI · Open-source | 85 |  |  | $2 / $6 | 128K | 21.3 | Feb 2024 |
| 91 | Google · Speed & cost | 84 |  |  | $1.5 / $9 | 1M | 16.0 | May 2026 |
| 92 | · Accessible open-weight agentics | 84 |  | 110 t/s | $0.05 / $0.2 | 262K | 672.0 | Jun 2026 |
| 93 | OpenAI · Speed & cost | 83 |  |  | $0.75 / $4.5 | 400K | 31.6 | Mar 2026 |
| 94 | OpenAI · Speed & cost | 83 |  |  | $0.25 / $2 | 400K | 73.8 | Aug 2025 |
| 95 | Alibaba Cloud · Open-source | 82 |  |  | $0.04 / $0.15 | 256K | 863.2 | Mar 2026 |
| 96 | Alibaba Cloud · Open-source | 82 |  |  | $0.139 / $1 | 262K | 144.0 | Feb 2026 |
| 97 | Alibaba Cloud · Open-source | 82 |  |  | $0.195 / $1.56 | 262K | 93.4 | Feb 2026 |
| 98 | Alibaba Cloud · Open-source | 82 |  |  | $0.26 / $2.08 | 262K | 70.1 | Feb 2026 |
| 99 | Alibaba Cloud · Speed & cost | 82 |  |  | $0.065 / $0.26 | 1M | 504.6 | Feb 2026 |
| 100 | Alibaba Cloud · Open-source | 82 |  |  | $0.26 / $1.56 | 1M | 90.1 | Feb 2026 |
Page 1 of 4 · 1-100 of 357

Quality = composite benchmark (MMLU, HumanEval, MATH) · Arena Elo = LMArena (formerly LMSYS Chatbot Arena) rating · Value = quality per dollar · Price = input / output per 1M tokens
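
The legend describes Value only as quality per dollar. The published rows appear consistent with dividing Quality by the unweighted mean of the input and output prices; that formula is inferred from the table rather than stated on the page, and `value_score` is a hypothetical helper name. A minimal sketch under that assumption:

```python
def value_score(quality: float, input_price: float, output_price: float) -> float:
    """Quality per dollar, using the mean of input and output price per 1M tokens.

    Assumption: this blend is inferred from the published rows, not documented.
    """
    blended = (input_price + output_price) / 2.0
    if blended <= 0:
        raise ValueError("blended price must be positive")
    return round(quality / blended, 1)


if __name__ == "__main__":
    # Rank 1 in the table: Quality 100 at $10 / $50 per 1M tokens.
    print(value_score(100, 10, 50))
```

For the rank 1 row this gives 3.3, which matches the listed Value. If your workload is heavily skewed toward output tokens, a weighted blend would likely rank models differently from the even split assumed here.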

How the LLM leaderboard works

We pull official provider pricing every 24 hours, Artificial Analysis benchmark snapshots weekly, and LMArena Elo whenever the Arena publishes. The composite quality index is a 0-100 normalization over MMLU Pro, HumanEval, and MATH, weighted by recency and cross-validated against Arena Elo. We do not accept vendor-supplied numbers without an independent reference.
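
One plausible reading of a 0-100 composite is a min-max normalization of each benchmark across models, followed by a weighted average. The sketch below illustrates that reading only: the exact normalization and the recency weights are not published here, so the equal default weights, the function name `composite_quality`, and the demo scores are all assumptions.

```python
def composite_quality(raw, weights=None):
    """Map raw benchmark scores to a 0-100 index per model.

    raw: {model_name: {benchmark_name: score}}, every model scored on every benchmark.
    weights: optional {benchmark_name: weight}; equal weights if omitted (assumption).
    """
    benchmarks = sorted(next(iter(raw.values())))
    weights = weights or {b: 1.0 for b in benchmarks}
    total_weight = sum(weights[b] for b in benchmarks)

    # Min-max bounds per benchmark across all models.
    bounds = {
        b: (min(s[b] for s in raw.values()), max(s[b] for s in raw.values()))
        for b in benchmarks
    }

    index = {}
    for model, scores in raw.items():
        acc = 0.0
        for b in benchmarks:
            lo, hi = bounds[b]
            # If every model ties on a benchmark, treat it as uninformative (0.5).
            norm = 0.5 if hi == lo else (scores[b] - lo) / (hi - lo)
            acc += weights[b] * norm
        index[model] = round(100.0 * acc / total_weight, 1)
    return index


if __name__ == "__main__":
    # Hypothetical scores for illustration only; not taken from the leaderboard.
    demo = {
        "model-a": {"MMLU Pro": 80.0, "HumanEval": 90.0, "MATH": 70.0},
        "model-b": {"MMLU Pro": 70.0, "HumanEval": 85.0, "MATH": 60.0},
        "model-c": {"MMLU Pro": 75.0, "HumanEval": 80.0, "MATH": 65.0},
    }
    print(composite_quality(demo))
```

A recency weighting could be expressed through the `weights` argument, for example by giving newer benchmark snapshots a larger weight, but how the leaderboard actually does this is not specified.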

Where the leaderboard is wrong

No leaderboard predicts your production accuracy. LMArena rewards style and short-conversation polish; a top-Arena model can still underperform on your specific function-calling schema or long-context retrieval workload. Build an internal eval harness before you commit. See our LMArena Elo explained and LLM routing writeups for a deeper dive.
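
An internal eval harness can start very small: a list of prompts from your own workload, a pass/fail check per prompt, and a loop that reports the pass rate. The sketch below is a starting point, not a recommended framework; `EvalCase`, `run_eval`, and the `call_model` callable are hypothetical names, and the stand-in model in the demo should be replaced with your real client call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True when the model output is acceptable


def run_eval(call_model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose output passes its check."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = 0
    for case in cases:
        try:
            output = call_model(case.prompt)
        except Exception:
            continue  # count a failed call as a miss
        if case.check(output):
            passed += 1
    return passed / len(cases)


if __name__ == "__main__":
    # Stand-in model for the demo; replace with a real client call.
    def fake_model(prompt: str) -> str:
        return '{"tool": "get_weather"}' if "weather" in prompt else "unsure"

    cases = [
        EvalCase("What is the weather in Oslo?", lambda out: '"tool": "get_weather"' in out),
        EvalCase("Book a flight to Oslo.", lambda out: '"tool": "book_flight"' in out),
    ]
    print(run_eval(fake_model, cases))
```

Running the same cases against each shortlisted model gives a pass rate on your own schema and prompts, which is the comparison the leaderboard cannot make for you.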

Related rankings