Qwen iconQwen: Qwen3 Next 80B A3B Thinking

Model Type

Open weight model icon

Open Weight Model

80B parameters

Recommended Use Cases

Text Generation

Try Qwen3 Next 80B A3B Thinking

Qwen3-Next-80B-A3B-Thinking is Alibaba's novel-architecture reasoning model featuring hybrid attention and extreme MoE sparsity, delivering 10x inference throughput over Qwen3-32B while outperforming larger reasoning models.

Qwen3-Next-80B-A3B-Thinking demonstrates outstanding performance on complex reasoning tasks, outperforming Qwen3-30B-A3B-Thinking-2507, Qwen3-32B-Thinking, and even Gemini-2.5-Flash-Thinking.

  • Qwen Team

Overview

Qwen3-Next-80B-A3B-Thinking is the reasoning variant of Alibaba's next-generation architecture, combining hybrid attention (Gated DeltaNet + Gated Attention), extreme MoE sparsity (512 experts, 10 activated), and multi-token prediction for unprecedented efficiency on complex reasoning tasks.

Key Features

  • Novel architecture: Hybrid Transformer-Mamba design
  • Extreme efficiency: 80B total, only 3.9B active per token
  • 10x throughput: Compared to Qwen3-32B at long contexts
  • Extended thinking: Complex reasoning with visible <think> blocks
  • 256K context: Native long-context support, extendable to 1M

Technical Specifications

SpecificationValue
Total Parameters80B
Active Parameters3.9B
ArchitectureHybrid attention + High-sparsity MoE
Layers48
Experts512 (10 activated + 1 shared)
Context Length256K tokens (1M with YaRN)
Training Data15T tokens
Release DateSeptember 2025

When to Use Qwen3-Next-80B-A3B-Thinking

Choose this model when you need:

  • Complex reasoning with efficient inference
  • Long-context understanding (32K+ tokens)
  • Mathematical and logical problem solving
  • Research requiring visible reasoning traces
  • High throughput on reasoning workloads

Choose Instruct variant when you need:

  • Fast, direct responses
  • Production workloads without thinking traces
  • Tool calling and agentic tasks

Availability

  • Open Weights: Hugging Face (Qwen/Qwen3-Next-80B-A3B-Thinking)
  • API: NVIDIA NIM, OpenRouter
  • Local: SGLang, vLLM, Ollama, llama.cpp

Role in Series

Qwen3-Next models:

  1. Qwen3-Next-80B-A3B-Instruct: Fast, no thinking traces
  2. Qwen3-Next-80B-A3B-Thinking: Deep reasoning (this model)
  3. Qwen3-Coder-Next: Coding-specialized variant

Links