A year ago, choosing an open-weight model meant accepting a real drop in quality in exchange for control and a lower bill. In June 2026 that trade-off has nearly vanished. The best open models trail the best closed ones by a few points on the blended Intelligence Index and run a few months behind on the newest capabilities. For a growing share of work, "a few points behind at roughly one-eighth the price, and you own it" is the better deal. Here is where the line actually sits.
How close is close
Put the neutral scores next to each other and the story is plain. The top closed models (Opus 4.8 at 61.4, GPT-5.5 near 59, Gemini 3.1 Pro close behind) lead the blended Intelligence Index. The best open models cluster a few points down: Kimi-class results around 54, DeepSeek V4 Pro near 52, GLM-5.1 around 51. The gap to the absolute frontier is now in the single digits, about seven points on this index, where two years ago it was a chasm.
On the job most teams buy for, coding, it is nearly as close. DeepSeek V4 Pro lands around 80% on SWE-bench Verified, within about nine points of models that cost roughly eight times as much per token. April and May 2026 turned open-weight coding from a budget option into a default: many teams now reach for it first and escalate to a closed model only when they have to.
| Measure | Best closed | Best open | Roughly |
|---|---|---|---|
| Blended index | 61.4 (Opus 4.8) | ~54 (Kimi-class) | ~7 points |
| Coding (SWE-bench Verified) | ~89% | ~80% | ~9 points |
| Price (output / 1M) | $25 (Opus) | $2–$4 | ~8x cheaper |
| Context | 1M | 1M | even |
| You can host it | No | Yes | the whole point |
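To make the price row concrete, here is a minimal sketch of the arithmetic. The per-million prices come from the table; the monthly token volume is a hypothetical figure chosen only for illustration, and the function name is ours.

```python
def output_cost_usd(output_tokens: int, price_per_million_usd: float) -> float:
    """Cost of generating `output_tokens` at a given price per 1M output tokens."""
    return output_tokens / 1_000_000 * price_per_million_usd


# Output prices per 1M tokens, taken from the table above.
CLOSED_PRICE = 25.0
OPEN_PRICE_LOW, OPEN_PRICE_HIGH = 2.0, 4.0

# Hypothetical workload: 500M output tokens per month (an assumption).
MONTHLY_OUTPUT_TOKENS = 500_000_000

closed_cost = output_cost_usd(MONTHLY_OUTPUT_TOKENS, CLOSED_PRICE)
open_cost_low = output_cost_usd(MONTHLY_OUTPUT_TOKENS, OPEN_PRICE_LOW)
open_cost_high = output_cost_usd(MONTHLY_OUTPUT_TOKENS, OPEN_PRICE_HIGH)

print(f"closed: ${closed_cost:,.0f} per month")
print(f"open:   ${open_cost_low:,.0f} to ${open_cost_high:,.0f} per month")
print(f"ratio:  {closed_cost / open_cost_high:.1f}x to {closed_cost / open_cost_low:.1f}x")
```

The ratio spans the whole $2–$4 band; the table's "~8x" sits near the middle of it, at about $3 per million.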
Who the open leaders are
Four labs carry the open side right now, and three of the four are Chinese.
DeepSeek V4 Pro is the value leader: a large mixture-of-experts model under Apache 2.0, with a million-token context and pricing so low that during one launch promotion it dipped under a dollar per million tokens, for frontier-adjacent quality. If you want one open model to anchor a stack, this is the safe default. It sits near the top of the open tier on our leaderboard.
GLM-5.1, from Z.ai, ships under the MIT license, the most permissive of the group, and scores well on agent and tool-use tasks. MIT matters for companies that want to fold the weights into a product without the obligations heavier licenses carry.
Gemma 4, Google's open-weight line under Apache 2.0, is the Western-jurisdiction pick: strong instruction-following, easy to self-host, and free of the data-residency questions the Chinese models raise for some buyers.
Kimi K2.6 posts the highest neutral score of the open-adjacent group, though it is served as a low-cost API rather than as freely downloadable weights, so treat it as "cheap and capable" more than "yours to host."
A quieter point: Alibaba's strongest model, Qwen 3.7 Max, is closed, despite Qwen's open reputation. The open-versus-closed split no longer runs cleanly along the line between Western and Chinese labs. Plenty of Chinese frontier work is now paid and API-only, and some of the most permissive licenses in the field are Western.
What still keeps the closed labs ahead
The gap is small, but it is not zero, and it shows up in specific places rather than across the board.
Complex instruction-following is the first. Give a model a prompt with eight constraints that interact ("use British spelling except in code comments, never exceed 200 words per section, and cite only sources after 2024") and the closed frontier holds all of them more reliably. Open models drop a constraint or two more often. For tightly specified output, that reliability is worth paying for.
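One way to measure dropped constraints on your own prompts is to check outputs mechanically. The sketch below covers two of the constraints from the example prompt. It rests on assumptions that are ours, not part of any standard harness: sections are separated by blank lines, and citations carry a four-digit year in parentheses. All function names are illustrative.

```python
import re

MAX_WORDS_PER_SECTION = 200   # from the example prompt
MIN_YEAR_EXCLUSIVE = 2024     # "only sources after 2024" means year > 2024


def split_sections(text: str) -> list[str]:
    """Split on blank lines (assumed section boundary) and drop empty blocks."""
    return [s for s in re.split(r"\n\s*\n", text.strip()) if s.strip()]


def sections_over_limit(text: str, limit: int = MAX_WORDS_PER_SECTION) -> list[int]:
    """Return indexes of sections whose whitespace-separated word count exceeds `limit`."""
    return [i for i, s in enumerate(split_sections(text)) if len(s.split()) > limit]


def cited_years_too_old(text: str, min_year_exclusive: int = MIN_YEAR_EXCLUSIVE) -> list[int]:
    """Return cited years that are not after `min_year_exclusive`.

    Assumes citations look like "(2025)"; other citation styles are not detected.
    """
    years = [int(y) for y in re.findall(r"\((\d{4})\)", text)]
    return [y for y in years if y <= min_year_exclusive]


sample = "A short section with a recent source (2025).\n\n" + "word " * 201 + "(2023)"
print(sections_over_limit(sample))
print(cited_years_too_old(sample))
```

Running the same checks over a batch of outputs from an open and a closed model gives a per-constraint failure rate, which is more useful than a single overall score when deciding what to route where.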
Long-horizon agents are the second. The closed models, and Qwen's closed Max tier, are still steadier across very long runs where small errors compound. An open model is dependable on a task with ten steps and less so on a task with a thousand.
Multimodal breadth is the third. Native voice, image generation, and the richest vision all live on the closed side today. The open world is catching up (NVIDIA's open omni models are real), but if you need the full sensory stack in one model, closed is still where it is.
And freshness is the fourth. Open weights tend to land three to six months after the closed frontier moves. If being on the absolute newest capability is a competitive edge for you, you will pay the closed premium to get there first.
So which should you use
The honest answer is "both, routed." But here is the decision by situation.
| If you… | Lean |
|---|---|
| Run high volume and watch the bill | Open. DeepSeek V4 Pro as the anchor. |
| Must self-host for privacy or control | Open. Gemma 4 if you also need Western jurisdiction. |
| Need the single best answer on a hard call | Closed. Opus 4.8 or GPT-5.5. |
| Build agents that run for hours and must be right | Closed for now, re-test open each quarter. |
| Ship a product around the weights | Open with a permissive license (GLM-5.1 MIT, Gemma/DeepSeek Apache). |
| Have a data-residency rule | Read the license and the hosting location, not just the score. |
Two cautions on that last row. A permissive license is not the same as a clean supply chain: know where the weights came from and where, if you use a hosted API, your tokens travel. And "open" describes the weights, not your obligations. Read the actual license text before you build a business on it.
The move that beats the debate
The teams getting the most out of mid-2026 are not picking a side. They run an open model as the workhorse for the cheap, high-volume majority of calls and keep a closed frontier model on standby for the hard minority, with a router deciding which call goes where. That setup captures most of the cost savings of open weights and most of the quality ceiling of the closed frontier at the same time. The open-versus-proprietary question, framed as "which one," is the wrong question. Framed as "what goes where," it answers itself.
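A router of this kind can start as a handful of rules. The sketch below is a minimal illustration, not a description of any particular team's setup. The model labels are placeholders, and the thresholds and the `Call` fields are hypothetical values that would need tuning against your own evaluations. The escalation rules mirror the three capability gaps described earlier: multimodal needs, long-horizon runs, and heavily constrained prompts.

```python
from dataclasses import dataclass

# Placeholder labels, not real model identifiers.
OPEN_MODEL = "open-workhorse"
CLOSED_MODEL = "closed-frontier"

# Hypothetical thresholds; tune them against your own evaluations.
MAX_OPEN_STEPS = 50
MAX_OPEN_CONSTRAINTS = 4


@dataclass(frozen=True)
class Call:
    expected_steps: int             # rough estimate of agent steps in the task
    constraint_count: int           # number of interacting output constraints
    needs_multimodal: bool = False  # voice, image generation, rich vision


def route(call: Call) -> str:
    """Send the call to the open model unless it hits a known weak spot."""
    if call.needs_multimodal:
        return CLOSED_MODEL
    if call.expected_steps > MAX_OPEN_STEPS:
        return CLOSED_MODEL
    if call.constraint_count > MAX_OPEN_CONSTRAINTS:
        return CLOSED_MODEL
    return OPEN_MODEL


def blended_price(closed_share: float, open_price: float, closed_price: float) -> float:
    """Average price per 1M output tokens when `closed_share` of tokens go to the closed model."""
    if not 0.0 <= closed_share <= 1.0:
        raise ValueError("closed_share must be between 0 and 1")
    return closed_share * closed_price + (1.0 - closed_share) * open_price


print(route(Call(expected_steps=10, constraint_count=2)))
print(route(Call(expected_steps=1000, constraint_count=2)))
print(f"{blended_price(0.10, open_price=3.0, closed_price=25.0):.2f}")
```

With an illustrative 10% of output tokens escalated to a $25 model and the rest served at $3, the blended price comes to about $5.20 per million, roughly a fifth of the all-closed price. The share that actually needs escalation depends on your workload.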
The trend line is not subtle. Open weights have closed most of the gap in under two years, and nothing in the current research suggests they will stop now. Closed labs will keep a lead at the very top, because that is where the newest capability lands first. But the floor keeps rising, and every month the price of "good enough to ship" falls.