LLM Capability Leaderboard
Every major model across the benchmarks researchers cite. Sort by any column. Rows marked *imported* carry scores from the benchmark maintainers' public leaderboards; Swfte's own runs replace them as they complete.
Updated 2026-05-06 · Methodology
| Model | Provider | Human-Like | ARC-AGI-2 | HLE | GAIA | SimpleBench | GPQA-Diamond | MMLU-Pro | Human-Like Thinking | Source |
|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | — | — | — | — | — | — | — | — | imported |
| Claude Sonnet 4.6 | Anthropic | — | — | — | — | — | — | — | — | imported |
| Claude Haiku 4.5 | Anthropic | — | — | — | — | — | — | — | — | imported |
| GPT-5 | OpenAI | — | — | — | — | — | — | — | — | imported |
| GPT-4.5 | OpenAI | — | — | — | — | — | — | — | — | imported |
| o3-mini | OpenAI | — | — | — | — | — | — | — | — | imported |
| Gemini 2.5 Pro | Google | — | — | — | — | — | — | — | — | imported |
| Gemini 2.5 Flash | Google | — | — | — | — | — | — | — | — | imported |
| Llama 4 405B | Meta | — | — | — | — | — | — | — | — | imported |
| Llama 4 70B | Meta | — | — | — | — | — | — | — | — | imported |
| Mistral Large 2 | Mistral | — | — | — | — | — | — | — | — | imported |
| Mistral Small 3 | Mistral | — | — | — | — | — | — | — | — | imported |
| DeepSeek V3 | DeepSeek | — | — | — | — | — | — | — | — | imported |
| DeepSeek R1 | DeepSeek | — | — | — | — | — | — | — | — | imported |
| Qwen 3 | Alibaba | — | — | — | — | — | — | — | — | imported |
| Command R+ | Cohere | — | — | — | — | — | — | — | — | imported |
| Kimi K2 | Moonshot | — | — | — | — | — | — | — | — | imported |
| Grok 3 | xAI | — | — | — | — | — | — | — | — | imported |
| Jamba 1.5 | AI21 | — | — | — | — | — | — | — | — | imported |
| Phi-4 | Microsoft | — | — | — | — | — | — | — | — | imported |
| Gemma 3 | Google | — | — | — | — | — | — | — | — | imported |