Qwen: Qwen3 Next 80B A3B Thinking
Model Type: Open-weight model, 80B parameters
Qwen3-Next-80B-A3B-Thinking is Alibaba's reasoning model built on a novel architecture combining hybrid attention with extreme MoE sparsity, delivering roughly 10x the inference throughput of Qwen3-32B at long context lengths while outperforming larger reasoning models.
Qwen3-Next-80B-A3B-Thinking demonstrates outstanding performance on complex reasoning tasks, outperforming Qwen3-30B-A3B-Thinking-2507, Qwen3-32B-Thinking, and even Gemini-2.5-Flash-Thinking.
- Qwen Team
Overview
Qwen3-Next-80B-A3B-Thinking is the reasoning variant of Alibaba's next-generation architecture, combining hybrid attention (Gated DeltaNet + Gated Attention), extreme MoE sparsity (512 experts, 10 activated), and multi-token prediction for unprecedented efficiency on complex reasoning tasks.
Key Features
- Novel architecture: Hybrid of Gated DeltaNet (linear attention) and standard gated attention
- Extreme efficiency: 80B total, only 3.9B active per token
- 10x throughput: Compared to Qwen3-32B at long contexts
- Extended thinking: Complex reasoning with visible `<think>` blocks
- 256K context: Native long-context support, extendable to 1M
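Downstream code typically separates the visible reasoning trace from the final answer before display or logging. A minimal sketch, assuming the trace is delimited by `<think>...</think>` (some chat templates omit the opening tag, so a bare closing `</think>` is handled too):

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Split a thinking-model response into (reasoning, answer).

    Assumes reasoning is wrapped in <think>...</think>; if only a
    closing </think> is present, everything before it is the trace.
    """
    m = re.search(r"<think>(.*?)</think>", output, re.DOTALL)
    if m:
        return m.group(1).strip(), output[m.end():].strip()
    # No opening tag: treat text before </think> as the trace.
    head, sep, tail = output.partition("</think>")
    if sep:
        return head.strip(), tail.strip()
    return "", output.strip()
```

The exact delimiter layout depends on the serving stack's chat template, so verify against your deployment's raw output before relying on this.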
Technical Specifications
| Specification | Value |
|---|---|
| Total Parameters | 80B |
| Active Parameters | 3.9B |
| Architecture | Hybrid attention + High-sparsity MoE |
| Layers | 48 |
| Experts | 512 (10 activated + 1 shared) |
| Context Length | 256K tokens (1M with YaRN) |
| Training Data | 15T tokens |
| Release Date | September 2025 |
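The table's figures make the "extreme sparsity" claim concrete: only a small fraction of weights and experts fire per token. A quick check using the numbers above:

```python
# Figures from the specification table above.
TOTAL_PARAMS_B = 80.0    # total parameters, billions
ACTIVE_PARAMS_B = 3.9    # parameters activated per token, billions
TOTAL_EXPERTS = 512
ROUTED_EXPERTS = 10      # experts routed per token
SHARED_EXPERTS = 1       # always-on shared expert

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
expert_fraction = (ROUTED_EXPERTS + SHARED_EXPERTS) / TOTAL_EXPERTS

print(f"active parameter fraction: {active_fraction:.1%}")  # 4.9%
print(f"experts active per token:  {expert_fraction:.1%}")  # 2.1%
```

So each token touches under 5% of the weights, which is what drives the throughput advantage over the dense Qwen3-32B.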
When to Use Qwen3-Next-80B-A3B-Thinking
Choose this model when you need:
- Complex reasoning with efficient inference
- Long-context understanding (32K+ tokens)
- Mathematical and logical problem solving
- Research requiring visible reasoning traces
- High throughput on reasoning workloads
Choose Instruct variant when you need:
- Fast, direct responses
- Production workloads without thinking traces
- Tool calling and agentic tasks
Availability
- Open Weights: Hugging Face (Qwen/Qwen3-Next-80B-A3B-Thinking)
- API: NVIDIA NIM, OpenRouter
- Local: SGLang, vLLM, Ollama, llama.cpp
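Both vLLM and SGLang expose an OpenAI-compatible HTTP API when serving the model locally. A sketch of building such a request with only the standard library; the `localhost:8000` base URL is an assumption and should match however you launched the server:

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000/v1",
                       model: str = "Qwen/Qwen3-Next-80B-A3B-Thinking"):
    """Build an OpenAI-compatible /chat/completions POST request.

    base_url assumes a local vLLM or SGLang server on port 8000;
    adjust to your deployment.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return req, payload

# To actually send (requires a running server):
# req, _ = build_chat_request("What is 17 * 24?")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

For the Thinking variant, the response content will include the visible reasoning trace ahead of the final answer.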
Role in Series
Qwen3-Next models:
- Qwen3-Next-80B-A3B-Instruct: Fast, no thinking traces
- Qwen3-Next-80B-A3B-Thinking: Deep reasoning (this model)
- Qwen3-Coder-Next: Coding-specialized variant