Updated May 21, 2026

AI Model Leaderboard — May 2026

Every major AI model ranked by quality, speed, pricing, and value. Filter by category, sort by any metric, and find the right model for your use case. Data is refreshed monthly from LMSYS Chatbot Arena Elo ratings, official provider pricing, and Artificial Analysis benchmarks.

Gold: GPT-5.5 Pro, Quality Index 98

Silver: GPT-5.5, Quality Index 97

Bronze: Claude Opus 4.7, Quality Index 96

Monthly Snapshot

May 2026: Top Models, Best Value, Fastest Inference

The May 2026 ranking covers 36 models across LMSYS Arena Elo, MMLU Pro, HumanEval, MATH, pricing, and inference speed. GPT-5.5 Pro tops the table with a Quality Index of 98/100. The full table below is sortable by any metric. Live data is refreshed hourly from official provider pricing pages and the public Arena.

Top 5 by Quality Index

  1. GPT-5.5 Pro 98/100
  2. GPT-5.5 97/100
  3. Claude Opus 4.7 96/100
  4. Gemini 3.1 Pro 96/100
  5. o3 94/100

Best Price-to-Quality

  1. DeepSeek V4 Flash: $0.28 per 1M output tokens
  2. Gemini 2.0 Flash: $0.40 per 1M output tokens
  3. Llama 4 Scout: $0.40 per 1M output tokens
  4. Qwen 2.5 Coder 32B: $0.45 per 1M output tokens
  5. Grok 3 Mini: $0.50 per 1M output tokens

See our LMSys Arena deep dive and the monthly release roundup.

36 models
| # | Model | Quality | Arena ELO | Speed | Price | Context | Value | Released |
|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 Pro (OpenAI · Reasoning at any cost) | 98 | 1510 | 68 t/s | $30 / $180 | 1M | 0.9 | Apr 2026 |
| 2 | GPT-5.5 (OpenAI · Frontier general purpose) | 97 | 1506 | 70 t/s | $5 / $30 | 1M | 5.5 | Apr 2026 |
| 3 | Claude Opus 4.7 (Anthropic · Coding & agentic workflows) | 96 | 1505 | 68 t/s | $5 / $25 | 1M | 6.4 | Apr 2026 |
| 4 | Gemini 3.1 Pro (Google · Science & long-context) | 96 | 1505 | 131 t/s | $2 / $12 | 1M | 13.7 | Apr 2026 |
| 5 | o3 (OpenAI · Hard reasoning) | 94 | 1370 | 68 t/s | $10 / $40 | 200K | 3.8 | Apr 2025 |
| 6 | xAI · Agentic tasks & real-time info | 93 | 1496 | 83 t/s | $1.25 / $2.5 | 1M | 49.6 | May 2026 |
| 7 | Google · Multimodal + value | 92 | 1345 | 87 t/s | $1.25 / $10 | 1M | 16.4 | Mar 2025 |
| 8 | Moonshot AI · Frontier quality at low cost | 92 | 1466 | 48 t/s | $0.95 / $4 | 256K | 37.2 | Apr 2026 |
| 9 | Anthropic · Complex analysis | 91 | 1360 | 52 t/s | $15 / $75 | 200K | 2.0 | May 2025 |
| 10 | DeepSeek · Cheap reasoning | 91 | 1350 | 35 t/s | $0.55 / $2.19 | 128K | 66.4 | Jan 2025 |
| 11 | DeepSeek · Open-source value leader | 90 | 1467 | 33 t/s | $1.74 / $3.48 | 1M | 34.5 | Apr 2026 |
| 12 | Anthropic · Coding & balance | 90 | 1467 | 73 t/s | $3 / $15 | 1M | 10.0 | Feb 2026 |
| 13 | OpenAI · Long context | 89 | 1310 | 120 t/s | $2 / $8 | 1M | 17.8 | Apr 2025 |
| 14 | OpenAI · Reasoning & math | 88 | 1305 | 155 t/s | $1.1 / $4.4 | 200K | 32.0 | Jan 2025 |
| 15 | Anthropic · Coding & balance | 88 | 1320 | 95 t/s | $3 / $15 | 200K | 9.8 | May 2025 |
| 16 | GLM-5.1 (New, OSS) (Z.ai (Zhipu AI) · Open-weight agentic & tool use) | 88 | 1467 | 48 t/s | $1.55 / $4.65 | 200K | 28.4 | Apr 2026 |
| 17 | xAI · Real-time info | 87 | 1330 | 82 t/s | $3 / $15 | 131K | 9.7 | Feb 2025 |
| 18 | DeepSeek · Best open-source value | 86 | 1310 | 62 t/s | $0.27 / $1.1 | 128K | 125.5 | Mar 2025 |
| 19 | Alibaba Cloud · Multilingual & APAC | 86 | 1448 | 124 t/s | $1.4 / $5.6 | 256K | 24.6 | Apr 2026 |
| 20 | OpenAI · General purpose | 85 | 1285 | 109 t/s | $2.5 / $10 | 128K | 13.6 | May 2024 |
| 21 | Meta · Open-source value | 80 | 1260 | 135 t/s | $0.2 / $0.6 | 1M | 200.0 | Apr 2025 |
| 22 | Alibaba Cloud · Open-source flagship | 80 | 1255 | 85 t/s | $0.3 / $0.9 | 131K | 133.3 | Sep 2024 |
| 23 | DeepSeek V4 Flash (DeepSeek · Cheap-and-fast cascade tier) | 80 | 1410 | 105 t/s | $0.14 / $0.28 | 1M | 381.0 | Apr 2026 |
| 24 | Mistral AI · Multilingual | 79 | 1250 | 78 t/s | $2 / $6 | 128K | 19.8 | Nov 2024 |
| 25 | Grok 3 Mini (xAI · Budget reasoning) | 78 | 1275 | 165 t/s | $0.3 / $0.5 | 131K | 195.0 | Feb 2025 |
| 26 | Perplexity · Search + citations | 78 | - | 65 t/s | $3 / $15 | 200K | 8.7 | Feb 2025 |
| 27 | Mistral AI · Code generation | 76 | - | 195 t/s | $0.3 / $0.9 | 256K | 126.7 | Jan 2025 |
| 28 | Mistral AI · Open multimodal | 76 | 1361 | 158 t/s | Self-host | 256K | - | Apr 2026 |
| 29 | Anthropic · Speed & cost | 75 | 1230 | 172 t/s | $0.8 / $4 | 200K | 31.3 | Oct 2024 |
| 30 | Google · Self-hosted general purpose | 75 | 1351 | 142 t/s | Self-host | 128K | - | Apr 2026 |
| 31 | Gemini 2.0 Flash (Google · Fastest + cheapest) | 74 | 1240 | 244 t/s | $0.1 / $0.4 | 1M | 296.0 | Feb 2025 |
| 32 | Qwen 2.5 Coder 32B (Alibaba Cloud · Open-source coding) | 74 | - | 125 t/s | $0.15 / $0.45 | 131K | 246.7 | Nov 2024 |
| 33 | OpenAI · High throughput | 72 | 1216 | 183 t/s | $0.15 / $0.6 | 128K | 192.0 | Jul 2024 |
| 34 | Llama 4 Scout (Meta · Longest context) | 71 | 1195 | 198 t/s | $0.15 / $0.4 | 10M | 258.2 | Apr 2025 |
| 35 | Amazon · AWS ecosystem | 70 | - | 110 t/s | $0.8 / $3.2 | 300K | 35.0 | Dec 2024 |
| 36 | Cohere · Enterprise RAG | 68 | 1170 | 72 t/s | $2.5 / $10 | 128K | 10.9 | Aug 2024 |
Quality = composite benchmark (MMLU, HumanEval, MATH)
Arena ELO = LMSYS Chatbot Arena rating
Value = quality per dollar
Price = input / output per 1M tokens

How We Rank AI Models

Our leaderboard uses a composite quality index that combines three key benchmarks: MMLU Pro (measuring knowledge and reasoning across 14 disciplines), HumanEval (measuring code generation ability), and MATH (measuring mathematical problem-solving). Scores are normalized to a 0-100 scale and cross-referenced against LMSYS Chatbot Arena Elo ratings for real-world validation.
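
The page does not say how the three benchmarks are weighted, so the following is only a minimal sketch that assumes an equal-weight average of scores already expressed on a 0-100 scale. The function name `quality_index` and the sample inputs are hypothetical and are not taken from the leaderboard.

```python
def quality_index(mmlu_pro: float, humaneval: float, math_score: float) -> int:
    """Combine three benchmark scores (each 0-100) into one 0-100 index.

    Assumption: equal weights. The leaderboard's real weighting is not published here.
    """
    scores = (mmlu_pro, humaneval, math_score)
    if not all(0 <= s <= 100 for s in scores):
        raise ValueError("each benchmark score must be between 0 and 100")
    return round(sum(scores) / len(scores))


# Made-up inputs for illustration only; they are not real benchmark results.
print(quality_index(90.0, 96.0, 93.0))  # 93
```

A different weighting (for example, favoring coding benchmarks) would change the ranking, which is why the weights matter when comparing this index with other leaderboards.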

We track speed (tokens per second), time-to-first-token (TTFT), pricing, and context window size to give you a complete picture. The Value Score divides quality by cost, showing you which models deliver the most capability per dollar.
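
The exact cost term in the Value Score is not spelled out, but the Value figures in the table are consistent with dividing the Quality Index by the average of the input and output prices per 1M tokens. The sketch below assumes that formula; the function name `value_score` is hypothetical, and the sample rows are the top four entries of the table.

```python
def value_score(quality: float, input_price: float, output_price: float) -> float:
    """Quality points per dollar.

    Assumption: cost is the mean of input and output price per 1M tokens,
    a formula inferred from the table rather than stated by the page.
    """
    blended_price = (input_price + output_price) / 2
    if blended_price <= 0:
        raise ValueError("prices must be positive")
    return round(quality / blended_price, 1)


# (model, quality, input $/1M, output $/1M) taken from the table
rows = [
    ("GPT-5.5 Pro", 98, 30.0, 180.0),
    ("GPT-5.5", 97, 5.0, 30.0),
    ("Claude Opus 4.7", 96, 5.0, 25.0),
    ("Gemini 3.1 Pro", 96, 2.0, 12.0),
]
for name, quality, price_in, price_out in rows:
    print(name, value_score(quality, price_in, price_out))
# GPT-5.5 Pro 0.9
# GPT-5.5 5.5
# Claude Opus 4.7 6.4
# Gemini 3.1 Pro 13.7
```

Because output tokens usually cost several times more than input tokens, a workload that is heavy on generation may be better served by a blend weighted toward the output price.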

Key Trends in AI Model Performance

  • Open-source catching up: DeepSeek R1 and V3 now compete with top closed-source models on reasoning and coding benchmarks
  • Reasoning specialization: Models like o3 and R1 trade speed for dramatically better performance on complex tasks
  • Context windows expanding: 1M+ tokens is now standard for flagship models, with Llama 4 Scout supporting 10M
  • Speed improving: Flash-tier models now exceed 200 tokens/second while maintaining strong quality

Choosing the Right Model

There is no single "best" model; the right choice depends on your use case. For most applications, a model routing approach works best: route simple queries to fast, cheap models and complex queries to frontier models. This keeps average cost low while reserving frontier-level quality for the queries that need it.
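
A routing layer can be sketched in a few lines. This is an illustrative example only: the tier names, the keyword list, and the word-count threshold are all hypothetical placeholders, and a production router would more likely use a trained classifier or a confidence signal than keyword matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


# Hypothetical tier names; substitute the models that fit your budget.
CHEAP_MODEL = "fast-cheap-model"
FRONTIER_MODEL = "frontier-model"

# Hypothetical hints that a query needs deeper reasoning.
COMPLEX_HINTS = ("prove", "debug", "refactor", "step by step", "analyze")


def route(query: str, max_simple_words: int = 40) -> Route:
    """Pick a model tier from crude signals: keywords first, then length."""
    text = query.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return Route(FRONTIER_MODEL, "complexity keyword")
    if len(text.split()) > max_simple_words:
        return Route(FRONTIER_MODEL, "long query")
    return Route(CHEAP_MODEL, "short and simple")


print(route("What is the capital of France?").model)       # fast-cheap-model
print(route("Debug this failing integration test").model)  # frontier-model
```

Returning the reason alongside the model makes it easier to audit routing decisions and to tune the threshold against real traffic.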