Qwen: Qwen3 Next 80B A3B Instruct
Model Type
Open Weight Model
80B parameters
Recommended Use Cases
Try Qwen3 Next 80B A3B Instruct
Qwen3 Next 80B A3B Instruct is Alibaba's novel-architecture instruction model featuring hybrid attention and extreme MoE sparsity, optimized for fast, stable responses without thinking traces.
Qwen3-Next-80B-A3B-Base outperforms Qwen3-32B-Base on downstream tasks with 10% of the total training cost and with 10x inference throughput for context over 32K.
- Qwen Team
Overview
Qwen3-Next-80B-A3B-Instruct is the instruction-tuned variant of Alibaba's next-generation architecture, combining hybrid attention and extreme MoE sparsity for fast, direct responses. It targets production workloads requiring high throughput, RAG, tool use, and agentic workflows.
Key Features
- Novel architecture: Hybrid Transformer-Mamba design
- Extreme efficiency: 80B total, only 3.9B active per token
- 10x throughput: Compared to Qwen3-32B at long contexts
- No thinking traces: Direct responses without
<think>blocks - 256K context: Native long-context, extendable to 1M
Technical Specifications
| Specification | Value |
|---|---|
| Total Parameters | 80B |
| Active Parameters | 3.9B |
| Architecture | Hybrid attention + High-sparsity MoE |
| Layers | 48 |
| Hidden Dimension | 2048 |
| Experts | 512 (10 activated + 1 shared) |
| Context Length | 256K tokens (1M with YaRN) |
| Training Data | 15T tokens |
| Release Date | September 2025 |
When to Use Qwen3-Next-80B-A3B-Instruct
Choose this model when you need:
- Fast, direct responses at scale
- High throughput on long contexts
- RAG and retrieval workflows
- Tool calling and agentic tasks
- Production deployment with consistent output
Choose Thinking variant when you need:
- Complex reasoning tasks
- Visible thought processes
- Mathematical problem solving
- Maximum accuracy over speed
Availability
- Open Weights: Hugging Face (Qwen/Qwen3-Next-80B-A3B-Instruct)
- API: NVIDIA NIM, OpenRouter
- Local: SGLang, vLLM, Ollama, llama.cpp
Role in Series
Qwen3-Next models:
- Qwen3-Next-80B-A3B-Instruct: Fast, production-optimized (this model)
- Qwen3-Next-80B-A3B-Thinking: Deep reasoning
- Qwen3-Coder-Next: Coding-specialized variant