Z.AI: GLM 4.6
Model type: proprietary (API access only)
GLM-4.6 is Z.AI's balanced flagship model, expanding context to 200K tokens with strong coding, reasoning, and agent capabilities. Notable as the first model to run on Chinese domestic chips.
> "GLM-4.6 achieves performance on par with Claude Sonnet 4 on several leaderboards, solidifying its position as the top model developed in China." — Z.AI
Overview
Released September 30, 2025, GLM-4.6 brings significant improvements over GLM-4.5: expanded context window (128K → 200K), better coding performance, advanced reasoning with tool use, and stronger agent capabilities. It marked the first integration of FP8 and Int4 quantization on Cambricon chips.
Key Capabilities
- 200K context window (expanded from 128K)
- 128K output tokens
- Tool-integrated reasoning: the model can invoke tools during inference
- 30%+ improvement in token efficiency over GLM-4.5
- Domestic chip support: Cambricon, Moore Threads, Huawei Ascend
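As a sketch of how tool-integrated reasoning is typically exposed, the request below builds an OpenAI-compatible chat-completions payload that offers the model one callable tool. The endpoint URL, the `glm-4.6` model identifier, and the `web_search` tool are assumptions for illustration; check Z.AI's API documentation for the exact values.

```python
import json

# Assumed Z.AI endpoint -- verify against the official API docs.
API_URL = "https://api.z.ai/api/paas/v4/chat/completions"


def build_request(question: str) -> dict:
    """Build a chat-completions payload exposing one tool to the model."""
    return {
        "model": "glm-4.6",  # assumed model identifier
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "web_search",  # hypothetical tool name
                    "description": "Search the web for up-to-date information.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {
                                "type": "string",
                                "description": "The search query",
                            }
                        },
                        "required": ["query"],
                    },
                },
            }
        ],
    }


payload = build_request("What changed between GLM-4.5 and GLM-4.6?")
print(json.dumps(payload, indent=2))
```

When the model decides a tool is needed, the response carries a structured tool call (function name plus JSON arguments) that your code executes before returning the result in a follow-up message.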
Improvements Over GLM-4.5
| Capability | Enhancement |
|---|---|
| Context | 128K → 200K tokens |
| Coding | Higher benchmark scores, better real-world performance |
| Reasoning | Clear improvement with tool-use support |
| Agents | Stronger tool use and search-based agents |
| Writing | Better human preference alignment, natural role-play |
When to Use GLM-4.6
Choose GLM-4.6 when you need:
- 200K context for complex documents and codebases
- Balanced coding and reasoning capabilities
- Deployment on Chinese domestic hardware
- Compatibility with Claude Code, Cline, Roo Code, Kilo Code
- Cost-effective alternative to GLM-4.7
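As a minimal sketch of the Claude Code compatibility mentioned above, pointing the CLI at GLM-4.6 is typically done through Anthropic-compatible environment variables. The endpoint, variable names, and model identifier here are assumptions; confirm them against Z.AI's integration docs.

```shell
# Assumed Anthropic-compatible endpoint for Z.AI -- verify before use
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="your-z-ai-api-key"   # placeholder key
export ANTHROPIC_MODEL="glm-4.6"                  # assumed model identifier
claude   # launch Claude Code as usual
```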
Choose GLM-4.7 when you need:
- Enhanced Preserved Thinking for multi-turn stability
- Better "vibe coding" with polished UI generation
- Stronger multilingual coding support
Choose GLM-4.5 when you need:
- Lower deployment costs
- No more than 128K of context
- Established workflow compatibility
Role in Series
GLM context evolution:
- GLM-4.5 (Jul 2025): 128K context, first MoE architecture
- GLM-4.6 (Sep 2025): 200K context, domestic chip support (this model)
- GLM-4.7 (Dec 2025): 200K context, Preserved Thinking
- GLM-5 (Feb 2026): 200K context, 744B parameters