Overview
Qwen (通义千问, Tongyi Qianwen) is a family of large language models developed by Alibaba Cloud's Qwen team. Alibaba launched Qwen in April 2023, opening it for public use in September 2023. The team has released over 100 open-weight models across text, vision, audio, coding, and mathematics domains. Most models are licensed under Apache 2.0, making them freely available for commercial and research use.
The current flagship is Qwen3.5, released in February 2026, which introduces unified vision-language training and support for 201 languages. Qwen3.5 matches both Qwen3 (text) and Qwen3-VL (vision) in a single architecture.
Qwen3.5 Series (February 2026)
Qwen3.5 represents a major architectural shift: a unified vision-language foundation with early-fusion training, Gated Delta Networks, and scalable RL across million-agent environments. All Qwen3.5 models natively handle multimodal input (text, image, video) and support 201 languages.
Flagship Models
| Model | Parameters | Active | Context | Description |
|---|---|---|---|---|
| Qwen3.5-Plus | 397B | 17B | 1M | Hosted API with built-in tools, adaptive tool use |
| Qwen3.5-397B-A17B | 397B | 17B | 262K-1M | Open-weight unified vision-language flagship |
Medium Models (February 24, 2026)
| Model | Parameters | Active | Context | Description |
|---|---|---|---|---|
| Qwen3.5-Flash | 35B | 3B | 1M | Hosted API, production workhorse |
| Qwen3.5-122B-A10B | 122B | 10B | 256K-1M | Long-horizon agentic tasks, server deployment |
| Qwen3.5-35B-A3B | 35B | 3B | 256K-1M | Outperforms 235B predecessor, consumer hardware |
| Qwen3.5-27B | 27B | 27B | 256K-1M | Dense model, best quantization tolerance |
The medium series illustrates the generational shift: Qwen3.5-35B-A3B, with only 3B active parameters, outperforms both Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B across benchmarks. The 27B dense model scores 72.4% on SWE-bench Verified, matching GPT-5-mini.
Qwen3.5 Highlights:
- Unified Vision-Language: Text, image, and video in one model
- 201 languages and dialects (up from 119)
- Gated Delta Networks + sparse MoE for efficiency
- Million-agent RL training for robust adaptability
- Near-100% multimodal training efficiency vs text-only
Text Models
Qwen3 Series (April 2025)
| Model | Parameters | Active | Context | Description |
|---|---|---|---|---|
| Qwen3-Max | 1T+ | - | 262K | Flagship proprietary model, LMArena top 3 |
| Qwen3-235B-A22B | 235B | 22B | 128K | Open-weight flagship MoE |
| Qwen3-30B-A3B | 30B | 3B | 128K | Efficient MoE, outperforms QwQ-32B |
| Qwen3-32B | 32B | 32B | 128K | Largest dense model |
| Qwen3-14B | 14B | 14B | 128K | Mid-size dense |
| Qwen3-8B | 8B | 8B | 128K | Balanced dense |
| Qwen3-4B | 4B | 4B | 32K | Rivals Qwen2.5-72B performance |
| Qwen3-1.7B | 1.7B | 1.7B | 32K | Edge deployment |
| Qwen3-0.6B | 0.6B | 0.6B | 32K | Smallest, mobile-friendly |
July 2025 Updates:
- Qwen3-235B-A22B-Instruct-2507: 256K context, enhanced capabilities
- Qwen3-235B-A22B-Thinking-2507: Improved reasoning depth
Qwen3-Next (September 2025)
| Model | Parameters | Active | Context | Description |
|---|---|---|---|---|
| Qwen3-Next-80B-A3B | 80B | 3B | 256K | Novel hybrid attention architecture |
A novel architecture combining hybrid attention, a highly sparse MoE, and multi-token prediction. It matches Qwen3-32B performance at less than 10% of the training cost and delivers over 10x higher throughput at contexts of 32K tokens and beyond.
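For intuition on the "highly sparse MoE" piece: each token is routed to only a handful of experts from a much larger pool, so only a small fraction of the parameters is active per token. The sketch below is a generic top-k MoE router in PyTorch; the expert count, layer sizes, and routing details are illustrative assumptions, not the actual Qwen3-Next implementation.

```python
# Generic top-k sparse MoE routing sketch (illustrative only; not Qwen3-Next's real code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (num_tokens, d_model)
        logits = self.router(x)                         # (num_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # keep only the top_k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e                # tokens routed to expert e in this slot
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[int(e)](x[mask])
        return out

moe = SparseMoE()
print(moe(torch.randn(8, 512)).shape)  # torch.Size([8, 512])
```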
Qwen2.5 Series (September 2024)
| Model | Sizes | Context | Description |
|---|---|---|---|
| Qwen2.5 | 0.5B-72B | 128K | 7 sizes, strong base models |
| Qwen2.5-1M | 7B, 14B | 1M | Million-token context |
Reasoning Models
| Model | Parameters | Context | Description |
|---|---|---|---|
| QwQ-32B | 32B | 32K | Reasoning model similar to o1 |
| QvQ-72B-Preview | 72B | 32K | Visual reasoning research model |
Vision-Language Models
Qwen3-VL (September 2025)
| Model | Parameters | Active | Context | Description |
|---|---|---|---|---|
| Qwen3-VL-235B-A22B-Instruct | 235B | 22B | 256K-1M | Flagship vision-language, visual agent |
| Qwen3-VL-235B-A22B-Thinking | 235B | 22B | 256K-1M | Reasoning-enhanced multimodal |
Features: Visual agent (GUI operation), visual coding (mockup to code), 2D/3D grounding, OCR in 32 languages, long video understanding.
Qwen2.5-VL (January 2025)
| Model | Sizes | Context | Description |
|---|---|---|---|
| Qwen2.5-VL | 3B, 7B, 32B, 72B | 128K | Native resolution vision encoder |
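A minimal image-question sketch for Qwen2.5-VL, assuming the Qwen/Qwen2.5-VL-7B-Instruct repo id, a transformers release that includes Qwen2_5_VLForConditionalGeneration, and the qwen-vl-utils helper package referenced in the model card:

```python
# Minimal image understanding example for Qwen2.5-VL (repo id and helper package assumed).
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_name = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_name)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "path/to/your_image.jpg"},  # local path or URL
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```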
Omni Models
| Model | Parameters | Description |
|---|---|---|
| Qwen3-Omni | - | Text, image, video, audio input; text and speech output |
| Qwen2.5-Omni-7B | 7B | End-to-end multimodal with Thinker-Talker architecture |
Coding Models
Qwen3-Coder (2025)
| Model | Parameters | Active | Context | Description |
|---|---|---|---|---|
| Qwen3-Coder-480B-A35B | 480B | 35B | 256K | Flagship coding model |
| Qwen3-Coder-30B-A3B | 30B | 3B | 256K | Efficient coding MoE |
| Qwen3-Coder-Next | 80B | 3B | 256K-1M | Agentic coding specialist |
Qwen2.5-Coder (September 2024)
| Model | Sizes | Context | Description |
|---|---|---|---|
| Qwen2.5-Coder | 0.5B-32B | 128K | 6 sizes, competitive with GPT-4o |
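The coding base models support fill-in-the-middle (FIM) completion via dedicated special tokens. A minimal sketch, assuming the Qwen/Qwen2.5-Coder-7B base checkpoint and the FIM tokens documented in its model card:

```python
# Fill-in-the-middle completion with Qwen2.5-Coder (repo id and FIM tokens per the model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# The model generates the code that belongs between the prefix and the suffix.
prompt = (
    "<|fim_prefix|>def quicksort(arr):\n    if len(arr) <= 1:\n        return arr\n"
    "<|fim_suffix|>\n    return quicksort(left) + [pivot] + quicksort(right)\n"
    "<|fim_middle|>"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```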
Specialized Models
| Model | Description |
|---|---|
| Qwen2.5-Math | Mathematical reasoning (1.5B, 7B, 72B) |
| Qwen-MT | Translation across 92 languages |
| Qwen-OCR | Specialized text extraction |
| Qwen3-Embedding | Text embeddings |
| Qwen3-Reranker | Search reranking |
| Qwen2-Audio | Audio understanding |
| Qwen3-ASR | Automatic speech recognition |
| Qwen3-TTS | Text-to-speech generation |
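For the retrieval models, a minimal embedding-and-scoring sketch with sentence-transformers, assuming the Qwen/Qwen3-Embedding-0.6B repo id and its built-in "query" prompt; details may differ per checkpoint:

```python
# Minimal retrieval example with Qwen3-Embedding via sentence-transformers (repo id assumed).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

queries = ["What is the capital of France?"]
documents = [
    "Paris is the capital and largest city of France.",
    "The Great Wall of China stretches across northern China.",
]

# Queries use an instruction-style prompt; documents are embedded as-is.
query_emb = model.encode(queries, prompt_name="query")
doc_emb = model.encode(documents)

# Similarity scores between each query and each document.
print(model.similarity(query_emb, doc_emb))
```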
Key Capabilities
Qwen3.5 Highlights:
- Unified vision-language: Text, image, video in one architecture
- 201 languages and dialects
- Gated Delta Networks for efficient inference
- Million-agent RL training for robust real-world adaptability
- 1M context (API) / 256K-1M (open weights with YaRN; see the config sketch after this list)
- Medium models outperform previous 235B flagship
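The open-weight context extension relies on YaRN rope scaling, the same mechanism documented for earlier Qwen generations. A minimal sketch of one way to apply it with transformers, using Qwen/Qwen3-8B (32K native window) purely for illustration; the scaling factor and original window depend on the checkpoint:

```python
# Extending the usable context window with YaRN rope scaling (values illustrative).
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen3-8B"  # illustration; open-weight Qwen3.5 checkpoints follow the same pattern
config = AutoConfig.from_pretrained(model_name)

# YaRN rescales the rotary embeddings so the usable context grows by `factor`
# (here 32K native * 4 = ~128K). Tune factor and original window per checkpoint.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name, config=config, torch_dtype="auto", device_map="auto"
)
```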
Qwen3 Highlights:
- Hybrid reasoning: Toggle thinking/non-thinking modes via /think and /no_think (see the sketch after this list)
- 119 languages and dialects
- Native MCP (Model Context Protocol) support
- 36 trillion training tokens (2x Qwen2.5)
- Thinking budget control up to 38K tokens
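A minimal sketch of the thinking-mode controls, assuming the Qwen/Qwen3-8B repo id: the chat template's enable_thinking flag is the hard switch, while /think and /no_think act as per-turn soft switches.

```python
# Toggling Qwen3 thinking mode (repo id assumed; other Qwen3 chat checkpoints behave the same).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 30?"}]

# Hard switch: enable_thinking controls whether the template opens a <think> block.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False to force non-thinking mode
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

# Soft switch: append /think or /no_think to a user turn to override the mode for that turn.
messages.append({"role": "user", "content": "Now answer instantly without reasoning. /no_think"})
```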
Qwen3-VL Highlights:
- Visual agent: Operates PC/mobile GUIs
- Visual coding: Generates code from images/videos
- 2D and 3D spatial grounding
- Text-Timestamp Alignment for video temporal modeling
- DeepStack for multi-level ViT feature fusion
Links