Benchmarks

Human-like thinking, measured against every benchmark that matters

Capability scorecards covering widely adopted academic benchmarks (ARC-AGI-2, HLE, GAIA, SimpleBench, GPQA Diamond, MMLU-Pro) plus our own Rationale Integrity, Abstention, and Human-Like Thinking composite. A sortable leaderboard shows the full comparison.