Z.AI: GLM 4.7 Flash

Model Type

Open-Weights Model

API access or local deployment

Recommended Use Cases

Text Generation


GLM-4.7 Flash is Z.AI's lightweight coding model, offering strong performance in a 30B MoE architecture designed for local deployment on consumer hardware.

GLM-4.7-Flash is a 30B-A3B MoE model. As the strongest model in the 30B class, it offers a new option for lightweight deployment that balances performance and efficiency. — Z.AI

Overview

Released January 2026, GLM-4.7 Flash is the "free-tier" version of GLM-4.7, optimized for coding, reasoning, and generative tasks with low latency and high throughput. With only 3B active parameters, it runs on consumer GPUs while maintaining competitive coding capabilities.

Key Capabilities

  • 30B total / 3B active parameters (MoE architecture)
  • 128K context window
  • Preserved Thinking mode for multi-turn agentic tasks
  • Consumer GPU compatible: RTX 3090/4090, Mac M-series
  • 60-80+ tokens/second on appropriate hardware
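The "24GB minimum" figure follows from simple arithmetic on the 30B total parameter count: what matters for fitting the weights in memory is total parameters times bits per weight, while the 3B active parameters mainly determine compute per token. A minimal sketch of that estimate (the function name and the 20% overhead factor are illustrative assumptions, not from Z.AI):

```python
def vram_estimate_gb(total_params_b: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed to hold model weights.

    total_params_b: total parameters in billions (all experts count,
    even though only ~3B are active per token in this MoE).
    overhead: assumed ~20% extra for KV cache and activations.
    """
    weight_gb = total_params_b * bits_per_weight / 8  # billions of params -> GB
    return weight_gb * overhead

# GLM-4.7 Flash: 30B total parameters
print(round(vram_estimate_gb(30, 4), 1))  # 4-bit quantized: ~18 GB, fits a 24GB card
print(round(vram_estimate_gb(30, 8), 1))  # 8-bit: ~36 GB, needs 48GB+ or multi-GPU
```

This is why a 30B MoE model is viable on an RTX 3090/4090 at 4-bit quantization but wants 48GB or more at higher precision.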

Performance Characteristics

GLM-4.7 Flash leads the 30B class on coding benchmarks:

  • Strong SWE-bench performance for its size class
  • Competitive τ²-Bench scores with Preserved Thinking enabled
  • Multilingual coding support (not just Python)
  • Good terminal command understanding

When to Use GLM-4.7 Flash

Choose GLM-4.7 Flash when you need:

  • Local deployment without datacenter GPUs
  • Zero ongoing API costs
  • Fast inference for real-time coding assistance
  • Budget-conscious production workloads
  • Privacy-sensitive environments requiring on-device inference

Choose GLM-4.7 (full) when you need:

  • Maximum coding capability
  • Complex multi-step reasoning
  • Production workloads where accuracy trumps cost

Choose other lightweight models when you need:

  • Different capability profiles (e.g., Qwen3-30B-A3B for reasoning)
  • Specific fine-tuning requirements

Hardware Requirements

Configuration    GPU Memory              Performance
Minimum          24GB (RTX 3090/4090)    Functional
Recommended      48GB+ or multi-GPU      Optimal throughput
Mac              M-series with 32GB+     Good performance
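The table's tiers can be expressed as a simple lookup on available GPU memory; this helper is an illustrative sketch of that mapping (the function name and thresholds are taken from the table above, not from any Z.AI tooling):

```python
def deployment_tier(vram_gb: int) -> str:
    """Map available GPU memory (GB) to the configuration tier above."""
    if vram_gb >= 48:
        return "Recommended"   # optimal throughput
    if vram_gb >= 24:
        return "Minimum"       # functional, e.g. a single RTX 3090/4090
    return "Insufficient"      # below the documented minimum

print(deployment_tier(24))  # Minimum
print(deployment_tier(80))  # Recommended
```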

Role in Series

GLM lightweight models:

  1. GLM-4-9B (Apr 2025): 9B dense, translation-focused
  2. GLM-4.5 Air (Jul 2025): 106B/12B active, balanced efficiency
  3. GLM-4.7 Flash (Jan 2026): 30B/3B active, coding-optimized (this model)

Links