Z.AI: GLM 4.7 Flash
GLM-4.7 Flash is Z.AI's lightweight coding model, offering strong performance in a 30B MoE architecture designed for local deployment on consumer hardware.
GLM-4.7-Flash is a 30B-A3B MoE model. As the strongest model in the 30B class, it offers a new option for lightweight deployment that balances performance and efficiency. — Z.AI
Overview
Released January 2026, GLM-4.7 Flash is the "free-tier" version of GLM-4.7, optimized for coding, reasoning, and generative tasks with low latency and high throughput. With only 3B active parameters, it runs on consumer GPUs while maintaining competitive coding capabilities.
Key Capabilities
- 30B total / 3B active parameters (MoE architecture)
- 128K context window
- Preserved Thinking mode for multi-turn agentic tasks
- Consumer GPU compatible: RTX 3090/4090, Mac M-series
- 60-80+ tokens/second on suitable hardware (see Hardware Requirements)
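To make the "3B active parameters" figure concrete, here is a back-of-the-envelope sketch using standard MoE arithmetic (roughly 2 FLOPs per active parameter per generated token); these are illustrative calculations, not Z.AI-published numbers:

```python
# Back-of-the-envelope MoE decode math (illustrative assumptions,
# not official Z.AI figures).

TOTAL_PARAMS = 30e9   # 30B total parameters
ACTIVE_PARAMS = 3e9   # ~3B routed/active per token (the "A3B" in 30B-A3B)

# Per-token decode compute is roughly 2 FLOPs per weight actually used
# (one multiply + one add).
flops_moe = 2 * ACTIVE_PARAMS
flops_dense = 2 * TOTAL_PARAMS

print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")            # 10%
print(f"Per-token compute vs. dense 30B: ~{flops_dense / flops_moe:.0f}x less")  # ~10x

# At the quoted 60-80 tokens/second, a 1,000-token completion takes:
for tps in (60, 80):
    print(f"{tps} tok/s -> {1000 / tps:.1f} s for 1,000 tokens")
```

This 10x reduction in per-token compute relative to a dense 30B model is what makes the quoted 60-80 tokens/second plausible on consumer GPUs.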
Performance Characteristics
GLM-4.7 Flash leads the 30B class on coding benchmarks:
- Strong SWE-bench performance for its size class
- Competitive τ²-Bench scores with Preserved Thinking enabled
- Multilingual coding support (not just Python)
- Good terminal command understanding
When to Use GLM-4.7 Flash
Choose GLM-4.7 Flash when you need:
- Local deployment without datacenter GPUs
- Zero ongoing API costs
- Fast inference for real-time coding assistance
- Budget-conscious production workloads
- Privacy-sensitive environments requiring on-device inference
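For the on-device case, local servers such as llama.cpp's llama-server or vLLM commonly expose an OpenAI-compatible chat-completions endpoint. A minimal stdlib-only sketch of calling one; the URL, port, and served model name below are assumptions and must match your own server configuration:

```python
# Minimal sketch of querying a locally served GLM-4.7 Flash through an
# OpenAI-compatible endpoint (e.g., llama.cpp's llama-server or vLLM).
# The URL, port, and model name are assumptions -- match your setup.
import json
import urllib.request

LOCAL_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical port

payload = {
    "model": "glm-4.7-flash",  # assumed served-model name
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
    "temperature": 0.2,
    "max_tokens": 512,
}

request = urllib.request.Request(
    LOCAL_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once a local server is actually running:
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Because the request never leaves localhost, prompts and completions stay on the machine, which is the point of the privacy-sensitive deployment path above.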
Choose GLM-4.7 (full) when you need:
- Maximum coding capability
- Complex multi-step reasoning
- Production workloads where accuracy trumps cost
Choose other lightweight models when you need:
- Different capability profiles (e.g., Qwen3-30B-A3B for reasoning)
- Specific fine-tuning requirements
Hardware Requirements
| Configuration | GPU Memory | Performance |
|---|---|---|
| Minimum | 24GB (RTX 3090/4090) | Functional |
| Recommended | 48GB+ or multi-GPU | Optimal throughput |
| Mac | M-series with 32GB+ | Good performance |
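The memory figures in the table follow from simple quantization arithmetic. Note that even though only ~3B parameters are active per token, all 30B must be resident in memory. A sketch that ignores KV cache, activations, and runtime overhead (which add several GB on top):

```python
# Rough memory needed just for the weights of a 30B-parameter model at
# common quantization levels. Real usage adds KV cache, activations,
# and runtime overhead on top of these numbers.

PARAMS = 30e9  # 30B total parameters (all MoE experts must be resident)

for name, bits_per_weight in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    weight_gb = PARAMS * bits_per_weight / 8 / 1e9
    print(f"{name:>5}: ~{weight_gb:.0f} GB of weights")
```

This is why a 4-bit quantization (~15 GB of weights) fits a 24GB RTX 3090/4090 with headroom for the KV cache, while FP16 weights alone (~60 GB) push you to 48GB+ or multi-GPU, consistent with the table above.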
Role in Series
GLM lightweight models:
- GLM-4-9B (Apr 2025): 9B dense, translation-focused
- GLM-4.5 Air (Jul 2025): 106B/12B active, balanced efficiency
- GLM-4.7 Flash (Jan 2026): 30B/3B active, coding-optimized (this model)