Z.AI: GLM 4.7 Flash
GLM-4.7 Flash is Z.AI's lightweight coding model, offering strong performance in a 30B MoE architecture designed for local deployment on consumer hardware.
GLM-4.7-Flash is a 30B-A3B MoE model. As the strongest model in the 30B class, it offers a new option for lightweight deployment that balances performance and efficiency. — Z.AI
Overview
Released January 2026, GLM-4.7 Flash is the "free-tier" version of GLM-4.7, optimized for coding, reasoning, and generative tasks with low latency and high throughput. With only 3B active parameters, it runs on consumer GPUs while maintaining competitive coding capabilities.
Key Capabilities
- 30B total / 3B active parameters (MoE architecture)
- 128K context window
- Preserved Thinking mode for multi-turn agentic tasks
- Consumer GPU compatible: RTX 3090/4090, Mac M-series
- 60-80+ tokens/second on suitable hardware (see Hardware Requirements)
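To make the "3B active parameters" figure concrete, here is a back-of-the-envelope sketch using standard MoE arithmetic (roughly 2 FLOPs per active parameter per generated token); these are illustrative calculations, not Z.AI-published numbers:

```python
# Back-of-the-envelope MoE decode math (illustrative assumptions,
# not official Z.AI figures).

TOTAL_PARAMS = 30e9   # 30B total parameters
ACTIVE_PARAMS = 3e9   # ~3B routed/active per token (the "A3B" in 30B-A3B)

# Per-token decode compute is roughly 2 FLOPs per weight actually used
# (one multiply + one add).
flops_moe = 2 * ACTIVE_PARAMS
flops_dense = 2 * TOTAL_PARAMS

print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")            # 10%
print(f"Per-token compute vs. dense 30B: ~{flops_dense / flops_moe:.0f}x less")  # ~10x

# At the quoted 60-80 tokens/second, a 1,000-token completion takes:
for tps in (60, 80):
    print(f"{tps} tok/s -> {1000 / tps:.1f} s for 1,000 tokens")
```

This 10x reduction in per-token compute relative to a dense 30B model is what makes the quoted 60-80 tokens/second plausible on consumer GPUs.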
Performance Characteristics
GLM-4.7 Flash leads the 30B class on coding benchmarks:
- Strong SWE-bench performance for its size class
- Competitive τ²-Bench scores with Preserved Thinking enabled
- Multilingual coding support (not just Python)
- Good terminal command understanding
When to Use GLM-4.7 Flash
Choose GLM-4.7 Flash when you need:
- Local deployment without datacenter GPUs
- Zero ongoing API costs
- Fast inference for real-time coding assistance
- Budget-conscious production workloads
- Privacy-sensitive environments requiring on-device inference
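For the on-device case, local servers such as llama.cpp's llama-server or vLLM commonly expose an OpenAI-compatible chat-completions endpoint. A minimal stdlib-only sketch of calling one; the URL, port, and served model name below are assumptions and must match your own server configuration:

```python
# Minimal sketch of querying a locally served GLM-4.7 Flash through an
# OpenAI-compatible endpoint (e.g., llama.cpp's llama-server or vLLM).
# The URL, port, and model name are assumptions -- match your setup.
import json
import urllib.request

LOCAL_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical port

payload = {
    "model": "glm-4.7-flash",  # assumed served-model name
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
    "temperature": 0.2,
    "max_tokens": 512,
}

request = urllib.request.Request(
    LOCAL_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once a local server is actually running:
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Because the request never leaves localhost, prompts and completions stay on the machine, which is the point of the privacy-sensitive deployment path above.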
Choose GLM-4.7 (full) when you need:
- Maximum coding capability
- Complex multi-step reasoning
- Production workloads where accuracy trumps cost
Choose other lightweight models when you need:
- Different capability profiles (e.g., Qwen3-30B-A3B for reasoning)
- Specific fine-tuning requirements
Hardware Requirements
| Configuration | GPU Memory | Performance |
|---|---|---|
| Minimum | 24GB (RTX 3090/4090) | Functional |
| Recommended | 48GB+ or multi-GPU | Optimal throughput |
| Mac | M-series with 32GB+ | Good performance |
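The memory figures in the table follow from simple quantization arithmetic. Note that even though only ~3B parameters are active per token, all 30B must be resident in memory. A sketch that ignores KV cache, activations, and runtime overhead (which add several GB on top):

```python
# Rough memory needed just for the weights of a 30B-parameter model at
# common quantization levels. Real usage adds KV cache, activations,
# and runtime overhead on top of these numbers.

PARAMS = 30e9  # 30B total parameters (all MoE experts must be resident)

for name, bits_per_weight in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    weight_gb = PARAMS * bits_per_weight / 8 / 1e9
    print(f"{name:>5}: ~{weight_gb:.0f} GB of weights")
```

This is why a 4-bit quantization (~15 GB of weights) fits a 24GB RTX 3090/4090 with headroom for the KV cache, while FP16 weights alone (~60 GB) push you to 48GB+ or multi-GPU, consistent with the table above.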
Role in Series
GLM lightweight models:
- GLM-4-9B (Apr 2025): 9B dense, translation-focused
- GLM-4.5 Air (Jul 2025): 106B/12B active, balanced efficiency
- GLM-4.7 Flash (Jan 2026): 30B/3B active, coding-optimized (this model)