Z.AI: GLM 4.6

Model Type

Proprietary Model

API access only

Recommended Use Cases

Text Generation

GLM-4.6 is Z.AI's balanced flagship model, expanding context to 200K tokens with strong coding, reasoning, and agent capabilities. Notable as the first model to run on Chinese domestic chips.

GLM-4.6 achieves performance on par with Claude Sonnet 4 on several leaderboards, solidifying its position as the top model developed in China. — Z.AI

Overview

Released September 30, 2025, GLM-4.6 brings significant improvements over GLM-4.5: expanded context window (128K → 200K), better coding performance, advanced reasoning with tool use, and stronger agent capabilities. It marked the first integration of FP8 and Int4 quantization on Cambricon chips.

Key Capabilities

  • 200K context window (expanded from 128K)
  • 128K output tokens
  • Tool-integrated reasoning: Use tools during inference
  • 30%+ better token efficiency than GLM-4.5 (fewer tokens to complete comparable tasks)
  • Domestic chip support: Cambricon, Moore Threads, Huawei Ascend
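Since the model is API-only, a typical integration is a chat-completions request. The sketch below assumes an OpenAI-compatible endpoint; the base URL, model id `glm-4.6`, and the `ZAI_API_KEY` environment variable are assumptions to verify against Z.AI's own API documentation.

```python
import json
import os
import urllib.request

# Assumed endpoint and model id -- confirm against Z.AI's docs before use.
BASE_URL = "https://api.z.ai/api/paas/v4"
MODEL = "glm-4.6"


def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat-completions payload for GLM-4.6."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str) -> str:
    """Send a single-turn chat request and return the assistant's reply."""
    payload = build_request(prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Hypothetical env var name for the API key.
            "Authorization": f"Bearer {os.environ['ZAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The 200K-token context window applies to the combined prompt and history; long documents or codebases can be passed directly in `messages` rather than chunked aggressively.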

Improvements Over GLM-4.5

  • Context: 128K → 200K tokens
  • Coding: higher benchmark scores, better real-world performance
  • Reasoning: clear improvement, with tool-use support
  • Agents: stronger tool use and search-based agents
  • Writing: better human-preference alignment, more natural role-play

When to Use GLM-4.6

Choose GLM-4.6 when you need:

  • 200K context for complex documents and codebases
  • Balanced coding and reasoning capabilities
  • Deployment on Chinese domestic hardware
  • Compatibility with Claude Code, Cline, Roo Code, Kilo Code
  • Cost-effective alternative to GLM-4.7

Choose GLM-4.7 when you need:

  • Enhanced Preserved Thinking for multi-turn stability
  • Better "vibe coding" with polished UI generation
  • Stronger multilingual coding support

Choose GLM-4.5 when you need:

  • Lower deployment costs
  • 128K context is sufficient
  • Established workflow compatibility
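The Claude Code compatibility noted above is typically wired up by pointing the CLI at an Anthropic-compatible endpoint via environment variables. This is a hedged sketch of that configuration: the base URL shown is an assumption, and the token is a placeholder — consult Z.AI's integration docs for the actual values.

```shell
# Point Claude Code at an Anthropic-compatible GLM endpoint (URL is an assumption).
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
# Placeholder -- substitute your real Z.AI API key.
export ANTHROPIC_AUTH_TOKEN="your-zai-api-key"

# Launch Claude Code as usual; requests now go to the GLM backend.
claude
```

Cline, Roo Code, and Kilo Code follow a similar pattern: each accepts a custom base URL and API key in its provider settings.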

Role in Series

GLM context evolution:

  1. GLM-4.5 (Jul 2025): 128K context, first MoE architecture
  2. GLM-4.6 (Sep 2025): 200K context, domestic chip support (this model)
  3. GLM-4.7 (Dec 2025): 200K context, Preserved Thinking
  4. GLM-5 (Feb 2026): 200K context, 744B parameters
