Z.AI
Z.AI (formerly Zhipu AI) is a Chinese AI company building the GLM family of foundation models, with a focus on agentic capabilities, coding, and reasoning. The company rebranded internationally as Z.AI in July 2025 and, in January 2026, became the first major LLM company to go public via an IPO in Hong Kong.
> "The first-principles approach to measuring AGI is to integrate more general intelligent capabilities without losing existing ones. GLM-4.5 is our first complete realization of this concept." — Zhang Peng, CEO of Z.AI
Company Background
Founded in 2019 as a spinoff from Tsinghua University, Z.AI has grown into one of China's "AI Tiger" companies. The company is backed by Alibaba, Tencent, Meituan, Ant Group, Xiaomi, and HongShan. OpenAI has identified Z.AI as one of the few global companies capable of building competitive models.
Z.AI was the first among Chinese AI companies to sign the Frontier AI Safety Commitments and is listed in Stanford's AI Index Report 2025 as developing "notable AI models."
Current Models
| Model | Released | Parameters | Context | Best For |
|---|---|---|---|---|
| GLM-5 | Feb 2026 | 744B / 40B active | 200K | Flagship agentic engineering |
| GLM-4.7 | Dec 2025 | 355B / 32B active | 200K | Production coding workflows |
| GLM-4.7 Flash | Jan 2026 | 30B / 3B active | 128K | Lightweight local deployment |
| GLM-4.6 | Sep 2025 | 357B / 32B active | 200K | Balanced coding and reasoning |
| GLM-4.5 | Jul 2025 | 355B / 32B active | 128K | Agent-native applications |
| GLM-4.5 Air | Jul 2025 | 106B / 12B active | 128K | Efficient agent tasks |
| GLM-4-32B | Apr 2025 | 32B (dense) | 128K | Cost-effective general use |
Model Selection Guide
For maximum capability:
- GLM-5: Best open-weight model for complex systems engineering and long-horizon agentic tasks
For production coding:
- GLM-4.7: Optimized for multi-step coding workflows with Claude Code, Cline, Roo Code
- GLM-4.7 Flash: Budget-friendly option for local deployment on consumer GPUs
For balanced performance:
- GLM-4.6: Strong coding with 200K context; the first GLM release to run on Chinese domestic chips
- GLM-4.5: Native agent capabilities with thinking modes
For efficiency:
- GLM-4.5 Air: 106B total parameters with only 12B active, for strong performance at lower cost
- GLM-4-32B: Dense architecture for simpler deployment
Key Technologies
Mixture-of-Experts (MoE): Most GLM models use an MoE architecture, routing each token to only a small subset of experts so that just a fraction of the total parameters is active per forward pass, which cuts inference cost.
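To make the routing idea concrete, here is a minimal toy sketch of top-k MoE routing in plain Python. The experts, gate weights, and dimensions are invented for illustration; this is not Z.AI's implementation, only the general technique of scoring experts with a gate, keeping the top-k, and renormalizing their weights.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k experts by gate score and
    combine their outputs, renormalizing the selected weights."""
    # Gate logits: one score per expert (here a simple dot product).
    logits = [sum(wi * xi for wi, xi in zip(w, x)) for w in gate_weights]
    probs = softmax(logits)
    # Keep only the top_k experts; the rest stay inactive this pass.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)  # only chosen experts actually run
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out, chosen

# Toy setup: 4 "experts", each a simple elementwise transform.
experts = [
    lambda x: [2 * v for v in x],
    lambda x: [v + 1 for v in x],
    lambda x: [-v for v in x],
    lambda x: [v * v for v in x],
]
gate_weights = [[0.5, 0.1], [0.2, 0.9], [0.1, 0.1], [0.3, 0.4]]
out, active = moe_forward([1.0, 2.0], experts, gate_weights, top_k=2)
print(active)  # only 2 of the 4 experts were activated
```

The key property is that compute scales with `top_k`, not with the total number of experts, which is why a 355B-parameter model can run with only ~32B parameters active.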
Interleaved Thinking: Models think before every response and tool call, improving instruction following and generation quality.
Preserved Thinking: In coding scenarios, models retain thinking blocks across turns, avoiding repeated reasoning and information loss.
Turn-level Thinking Control: Enable or disable reasoning per turn to balance accuracy vs. latency.
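Per-turn control can be sketched as a field on an OpenAI-compatible chat-completions payload. The `thinking: {"type": ...}` shape below follows the style of Z.AI's published API, but treat the exact parameter name and values as an assumption and check the current API reference before use.

```python
# Hedged sketch: toggling reasoning per turn via a request payload.
# The "thinking" field shape is assumed from Z.AI's API style and
# may differ in the current API version.
def build_request(messages, model="glm-4.7", think=True):
    return {
        "model": model,
        "messages": messages,
        # Enable reasoning on hard turns; disable it on latency-
        # sensitive turns such as short follow-ups.
        "thinking": {"type": "enabled" if think else "disabled"},
    }

hard_turn = build_request(
    [{"role": "user", "content": "Refactor this module and explain the plan."}],
    think=True,
)
quick_turn = build_request(
    [{"role": "user", "content": "Which file did you just edit?"}],
    think=False,
)
print(hard_turn["thinking"], quick_turn["thinking"])
```

An agent loop would typically enable thinking for planning and tool-selection turns and disable it for trivial confirmations, trading a little accuracy for lower latency.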
DeepSeek Sparse Attention: Integrated in GLM-5 to reduce deployment costs while maintaining long-context performance.
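The idea behind sparse attention can be illustrated with a toy top-k variant: each query attends to only the k highest-scoring keys rather than all of them. This is a simplified stand-in, not DeepSeek's actual DSA (which uses a separate lightweight indexer to select tokens); all names and values here are illustrative.

```python
import math

def sparse_attention(q, keys, values, k=2):
    """Score every key, but run the softmax and weighted sum over
    only the top-k keys, skipping the rest."""
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(len(q))
              for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    weights = {i: math.exp(scores[i] - m) for i in top}
    z = sum(weights.values())
    out = [0.0] * len(values[0])
    for i in top:
        out = [o + (weights[i] / z) * vi for o, vi in zip(out, values[i])]
    return out, top

keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
out, attended = sparse_attention([1.0, 0.0], keys, values, k=2)
print(attended)  # only 2 of the 4 keys were used
```

In a long-context model the payoff is that per-query cost grows with k rather than with the full sequence length, which is how sparse attention lowers deployment cost at 200K-token contexts.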
Other Products
- AutoGLM: Smartphone AI agent using voice commands
- CogVideoX: Text-to-video generation
- CodeGeeX: Code assistant
- GLM-4.5V / GLM-4.6V: Vision-language models