Zhipu AI (Z.AI) dropped GLM 5.2 on June 13, 2026, and it didn’t just raise the bar for open-weights models — it grabbed the bar and threw it across the room. With a 744B Mixture-of-Experts architecture, a usable 1-million-token context window, and an MIT license, this is the strongest argument yet that you don’t need a closed API to run frontier-level AI.
I’ve been testing it through multiple providers over the last few days. Here’s what you need to know.
The Numbers That Matter
Artificial Analysis rates GLM 5.2 at 51 on their Intelligence Index v4.1. That puts it firmly at #1 among open-weights models:
| Model | Intelligence Index | Cost/Task |
|---|---|---|
| GLM 5.2 | 51 | $0.46 |
| MiniMax-M3 | 44 | — |
| DeepSeek V4 Pro (max) | 44 | — |
| Kimi K2.6 | 43 | — |
The gap isn’t small. An 11-point jump over GLM 5.1, with the same MoE parameter count, means Z.AI found serious gains in architecture rather than just throwing more compute at the problem.
Architecture: 744B Total, 40B Active
GLM 5.2 uses a MoE design — 744 billion total parameters with only 40 billion active per token. This is the same footprint as GLM 5.1, which makes the performance jump even more impressive. The improvements come from:
- Better sparse attention — handles the 1M context window without quadratic blowup
- Multi-token prediction — generates multiple tokens in parallel, reducing latency
- IndexShare — a new mechanism for efficient long-context retrieval that makes agentic coding workflows actually usable at scale
The 1M context window isn’t just marketing. In practice, it holds coherence through entire codebase traversals that would make Claude or GPT start losing track around 100-200K tokens.
Where It Shines: Coding Agents
This is the headline. GLM 5.2 scores ~78-81% on Terminal-Bench v2.1 — the first open-weights model to cross 80%. For context, that benchmark tests real terminal-based coding tasks: read a codebase, find the bug, write the fix, run the tests.
On SWE-Bench Pro it hits ~62%, and on FrontierSWE it places in the global top 3. These are the benchmarks that matter for anyone building autonomous coding agents.
I ran it through a few of my own Hermes Agent subagent workflows — the model stays coherent through long debugging sessions in ways that earlier open models simply couldn’t.
The MIT License Changes Everything
This isn’t “open weights with a research-only license.” GLM 5.2 is MIT licensed. That means:
- Run it locally via vLLM, SGLang, or Transformers
- Fine-tune it on proprietary data
- Deploy it commercially
- Build products on top of it
No usage restrictions. No “contact us for enterprise licensing.” Just code and weights.
It’s already available on Hugging Face and through providers like DeepInfra, Fireworks, SiliconFlow, Novita, and Nebius. The first-party Z.AI API charges ~$0.46 per task at its intelligence level — putting it on the Pareto frontier for cost vs. capability.
Trade-offs
Not everything is perfect:
- Token efficiency: GLM 5.2 uses ~43K output tokens per Intelligence Index task (of which ~37K is reasoning). That’s higher than MiniMax-M3 or Kimi K2.6. If you’re paying per token, factor this in.
- Not the absolute best at everything: Closed models (GPT-5.5 xhigh, Claude Opus 4.x) still lead on pure reasoning benchmarks. But GLM 5.2 narrows the gap meaningfully.
- Infrastructure: Running a 744B model locally isn’t trivial. You’ll want at least 4× A100 or 8× H100 for reasonable throughput.
What This Means
For the first time, an open-weights model is genuinely competitive with closed frontier models on the tasks developers actually care about: coding, debugging, long-context reasoning, and agentic workflows.
The 1M context window opens up use cases that were previously only practical with proprietary APIs. And the MIT license means there’s no legal friction to building real products.
If you’ve been holding off on self-hosting because the quality gap was too large — it’s time to take another look.
Try it: Hugging Face · Artificial Analysis
