Qwen: Qwen3 Next 80B A3B Thinking
Model Type: Open-weight model, 80B parameters
Qwen3-Next-80B-A3B-Thinking is Alibaba's reasoning model built on a novel architecture combining hybrid attention with extreme MoE sparsity, delivering roughly 10x the inference throughput of Qwen3-32B at long context lengths while outperforming larger reasoning models.
Qwen3-Next-80B-A3B-Thinking demonstrates outstanding performance on complex reasoning tasks, outperforming Qwen3-30B-A3B-Thinking-2507, Qwen3-32B-Thinking, and even Gemini-2.5-Flash-Thinking.
- Qwen Team
Overview
Qwen3-Next-80B-A3B-Thinking is the reasoning variant of Alibaba's next-generation architecture, combining hybrid attention (Gated DeltaNet + Gated Attention), extreme MoE sparsity (512 experts, 10 activated), and multi-token prediction for unprecedented efficiency on complex reasoning tasks.
Key Features
- Novel architecture: Hybrid of Gated DeltaNet (linear attention) and standard gated attention
- Extreme efficiency: 80B total, only 3.9B active per token
- 10x throughput: Compared to Qwen3-32B at long contexts
- Extended thinking: Complex reasoning with visible `<think>` blocks
- 256K context: Native long-context support, extendable to 1M
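Downstream code typically separates the visible reasoning trace from the final answer before display or logging. A minimal sketch, assuming the trace is delimited by `<think>...</think>` (some chat templates omit the opening tag, so a bare closing `</think>` is handled too):

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Split a thinking-model response into (reasoning, answer).

    Assumes reasoning is wrapped in <think>...</think>; if only a
    closing </think> is present, everything before it is the trace.
    """
    m = re.search(r"<think>(.*?)</think>", output, re.DOTALL)
    if m:
        return m.group(1).strip(), output[m.end():].strip()
    # No opening tag: treat text before </think> as the trace.
    head, sep, tail = output.partition("</think>")
    if sep:
        return head.strip(), tail.strip()
    return "", output.strip()
```

The exact delimiter layout depends on the serving stack's chat template, so verify against your deployment's raw output before relying on this.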
Technical Specifications
| Specification | Value |
|---|---|
| Total Parameters | 80B |
| Active Parameters | 3.9B |
| Architecture | Hybrid attention + High-sparsity MoE |
| Layers | 48 |
| Experts | 512 (10 activated + 1 shared) |
| Context Length | 256K tokens (1M with YaRN) |
| Training Data | 15T tokens |
| Release Date | September 2025 |
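The table's figures make the "extreme sparsity" claim concrete: only a small fraction of weights and experts fire per token. A quick check using the numbers above:

```python
# Figures from the specification table above.
TOTAL_PARAMS_B = 80.0    # total parameters, billions
ACTIVE_PARAMS_B = 3.9    # parameters activated per token, billions
TOTAL_EXPERTS = 512
ROUTED_EXPERTS = 10      # experts routed per token
SHARED_EXPERTS = 1       # always-on shared expert

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
expert_fraction = (ROUTED_EXPERTS + SHARED_EXPERTS) / TOTAL_EXPERTS

print(f"active parameter fraction: {active_fraction:.1%}")  # 4.9%
print(f"experts active per token:  {expert_fraction:.1%}")  # 2.1%
```

So each token touches under 5% of the weights, which is what drives the throughput advantage over the dense Qwen3-32B.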
When to Use Qwen3-Next-80B-A3B-Thinking
Choose this model when you need:
- Complex reasoning with efficient inference
- Long-context understanding (32K+ tokens)
- Mathematical and logical problem solving
- Research requiring visible reasoning traces
- High throughput on reasoning workloads
Choose Instruct variant when you need:
- Fast, direct responses
- Production workloads without thinking traces
- Tool calling and agentic tasks
Availability
- Open Weights: Hugging Face (Qwen/Qwen3-Next-80B-A3B-Thinking)
- API: NVIDIA NIM, OpenRouter
- Local: SGLang, vLLM, Ollama, llama.cpp
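Both vLLM and SGLang expose an OpenAI-compatible HTTP API when serving the model locally. A sketch of building such a request with only the standard library; the `localhost:8000` base URL is an assumption and should match however you launched the server:

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000/v1",
                       model: str = "Qwen/Qwen3-Next-80B-A3B-Thinking"):
    """Build an OpenAI-compatible /chat/completions POST request.

    base_url assumes a local vLLM or SGLang server on port 8000;
    adjust to your deployment.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return req, payload

# To actually send (requires a running server):
# req, _ = build_chat_request("What is 17 * 24?")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

For the Thinking variant, the response content will include the visible reasoning trace ahead of the final answer.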
Role in Series
Qwen3-Next models:
- Qwen3-Next-80B-A3B-Instruct: Fast, no thinking traces
- Qwen3-Next-80B-A3B-Thinking: Deep reasoning (this model)
- Qwen3-Coder-Next: Coding-specialized variant