Qwen

Overview

Qwen (通义千问, Tongyi Qianwen) is a family of large language models developed by Alibaba Cloud's Qwen team. Alibaba launched Qwen in April 2023, opening it for public use in September 2023. The team has released over 100 open-weight models across text, vision, audio, coding, and mathematics domains. Most models are licensed under Apache 2.0, making them freely available for commercial and research use.

The current flagship is Qwen3.5, released in February 2026, which introduces unified vision-language training and support for 201 languages. Qwen3.5 achieves cross-generational parity with both Qwen3 (text) and Qwen3-VL (vision) in a single architecture.

Qwen3.5 Series (February 2026)

Qwen3.5 represents a major architectural shift: a unified vision-language foundation with early-fusion training, Gated Delta Networks, and scalable RL across million-agent environments. All Qwen3.5 models natively support multimodal input (text, image, video) and 201 languages.

Flagship Models

| Model | Parameters | Active | Context | Description |
|---|---|---|---|---|
| Qwen3.5-Plus | 397B | 17B | 1M | Hosted API with built-in tools, adaptive tool use |
| Qwen3.5-397B-A17B | 397B | 17B | 262K-1M | Open-weight unified vision-language flagship |

Medium Models (February 24, 2026)

| Model | Parameters | Active | Context | Description |
|---|---|---|---|---|
| Qwen3.5-Flash | 35B | 3B | 1M | Hosted API, production workhorse |
| Qwen3.5-122B-A10B | 122B | 10B | 256K-1M | Long-horizon agentic tasks, server deployment |
| Qwen3.5-35B-A3B | 35B | 3B | 256K-1M | Outperforms 235B predecessor, consumer hardware |
| Qwen3.5-27B | 27B | 27B | 256K-1M | Dense model, best quantization tolerance |

The medium series demonstrates a generational shift: Qwen3.5-35B-A3B, with only 3B active parameters, outperforms both Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B across benchmarks. The 27B dense model achieves 72.4% on SWE-bench Verified, matching GPT-5-mini.
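The Parameters/Active split in the tables above comes from sparse mixture-of-experts routing: only a few experts run per token, so compute scales with the active count, not the total. A minimal sketch of that accounting, with purely illustrative numbers (not Qwen's actual expert configuration):

```python
# Minimal sketch of why a sparse-MoE model with ~35B total parameters can
# run with only ~3B "active" parameters per token. All numbers below are
# illustrative, not Qwen's real configuration.

def active_params(total_experts: int, experts_per_token: int,
                  params_per_expert: float, shared_params: float) -> float:
    """Parameters actually exercised for one token in a sparse MoE stack (in billions)."""
    return shared_params + experts_per_token * params_per_expert

# Hypothetical split: 128 experts of 0.25B each (32B of expert weights),
# 8 experts routed per token, plus 1B of shared attention/embedding weights.
total = 128 * 0.25 + 1.0                    # 33.0B total parameters
active = active_params(128, 8, 0.25, 1.0)   # 1.0 + 8 * 0.25 = 3.0B active
print(f"total={total:.1f}B active={active:.1f}B")
```

The gap between the two numbers is what lets a 35B-A3B model fit consumer hardware while decoding at small-model cost.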

Qwen3.5 Highlights:

  • Unified Vision-Language: Text, image, and video in one model
  • 201 languages and dialects (up from 119)
  • Gated Delta Networks + sparse MoE for efficiency
  • Million-agent RL training for robust adaptability
  • Near-100% multimodal training efficiency vs text-only

Text Models

Qwen3 Series (April 2025)

| Model | Parameters | Active | Context | Description |
|---|---|---|---|---|
| Qwen3-Max | 1T+ | – | 262K | Flagship proprietary model, LMArena top 3 |
| Qwen3-235B-A22B | 235B | 22B | 128K | Open-weight flagship MoE |
| Qwen3-30B-A3B | 30B | 3B | 128K | Efficient MoE, outperforms QwQ-32B |
| Qwen3-32B | 32B | 32B | 128K | Largest dense model |
| Qwen3-14B | 14B | 14B | 128K | Mid-size dense |
| Qwen3-8B | 8B | 8B | 128K | Balanced dense |
| Qwen3-4B | 4B | 4B | 32K | Rivals Qwen2.5-72B performance |
| Qwen3-1.7B | 1.7B | 1.7B | 32K | Edge deployment |
| Qwen3-0.6B | 0.6B | 0.6B | 32K | Smallest, mobile-friendly |

July 2025 Updates:

  • Qwen3-235B-A22B-Instruct-2507: 256K context, enhanced capabilities
  • Qwen3-235B-A22B-Thinking-2507: Improved reasoning depth

Qwen3-Next (September 2025)

| Model | Parameters | Active | Context | Description |
|---|---|---|---|---|
| Qwen3-Next-80B-A3B | 80B | 3B | 256K | Novel hybrid attention architecture |

Qwen3-Next introduces a hybrid attention architecture with a highly sparse MoE and multi-token prediction. It matches Qwen3-32B performance at less than 10% of the training cost, and delivers over 10x higher throughput at 32K+ context.

Qwen2.5 Series (September 2024)

| Model | Sizes | Context | Description |
|---|---|---|---|
| Qwen2.5 | 0.5B-72B | 128K | 7 sizes, strong base models |
| Qwen2.5-1M | 7B, 14B | 1M | Million-token context |

Reasoning Models

| Model | Parameters | Context | Description |
|---|---|---|---|
| QwQ-32B | 32B | 32K | Reasoning model similar to o1 |
| QvQ-72B-Preview | 72B | 32K | Visual reasoning research model |

Vision-Language Models

Qwen3-VL (September 2025)

| Model | Parameters | Active | Context | Description |
|---|---|---|---|---|
| Qwen3-VL-235B-A22B-Instruct | 235B | 22B | 256K-1M | Flagship vision-language, visual agent |
| Qwen3-VL-235B-A22B-Thinking | 235B | 22B | 256K-1M | Reasoning-enhanced multimodal |

Features: visual agent (GUI operation), visual coding (mockup to code), 2D/3D grounding, OCR in 32 languages, and long-video understanding.
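Vision-language models like these are typically driven through an OpenAI-compatible chat endpoint, where a user turn mixes image and text parts. A sketch of such a request body; the model id and image URL are placeholders, and the field names follow the standard chat-completions schema rather than anything Qwen-specific:

```python
# Build an OpenAI-compatible multimodal request body of the kind hosted
# Qwen-VL endpoints accept. Model id and image URL below are placeholders.

def build_vision_request(model: str, image_url: str, question: str) -> dict:
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }

req = build_vision_request(
    "qwen3-vl-235b-a22b-instruct",       # placeholder model id
    "https://example.com/mockup.png",    # placeholder image
    "Generate HTML/CSS for this mockup.",
)
```

The same message shape extends to video by adding further content parts to the user turn.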

Qwen2.5-VL (January 2025)

| Model | Sizes | Context | Description |
|---|---|---|---|
| Qwen2.5-VL | 3B, 7B, 32B, 72B | 128K | Native resolution vision encoder |

Omni Models

| Model | Parameters | Description |
|---|---|---|
| Qwen3-Omni | – | Text, image, video, audio input; text and speech output |
| Qwen2.5-Omni-7B | 7B | End-to-end multimodal with Thinker-Talker architecture |

Coding Models

Qwen3-Coder (2025)

| Model | Parameters | Active | Context | Description |
|---|---|---|---|---|
| Qwen3-Coder-480B-A35B | 480B | 35B | 256K | Flagship coding model |
| Qwen3-Coder-30B-A3B | 30B | 3B | 256K | Efficient coding MoE |
| Qwen3-Coder-Next | 80B | 3B | 256K-1M | Agentic coding specialist |

Qwen2.5-Coder (September 2024)

| Model | Sizes | Context | Description |
|---|---|---|---|
| Qwen2.5-Coder | 0.5B-32B | 128K | 6 sizes, competitive with GPT-4o |

Specialized Models

| Model | Description |
|---|---|
| Qwen2.5-Math | Mathematical reasoning (1.5B, 7B, 72B) |
| Qwen-MT | Translation across 92 languages |
| Qwen-OCR | Specialized text extraction |
| Qwen3-Embedding | Text embeddings |
| Qwen3-Reranker | Search reranking |
| Qwen2-Audio | Audio understanding |
| Qwen3-ASR | Automatic speech recognition |
| Qwen3-TTS | Text-to-speech generation |
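Embedding and reranker models like Qwen3-Embedding are consumed by comparing the vectors they return, most commonly with cosine similarity. A self-contained sketch of that downstream step; the vectors here are made up, not real model outputs:

```python
import math

# Rank documents against a query by cosine similarity of their embeddings.
# The 3-dimensional vectors below are toy stand-ins for real embedding output.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

query = [0.2, 0.7, 0.1]
docs = {"doc_a": [0.1, 0.8, 0.0], "doc_b": [0.9, 0.1, 0.3]}
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # doc_a ranks first: its direction is closer to the query's
```

A dedicated reranker such as Qwen3-Reranker replaces this geometric score with a learned relevance score over (query, document) pairs, but the ranking step is the same.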

Key Capabilities

Qwen3.5 Highlights:

  • Unified vision-language: Text, image, video in one architecture
  • 201 languages and dialects
  • Gated Delta Networks for efficient inference
  • Million-agent RL training for robust real-world adaptability
  • 1M context (API) / 256K-1M (open weights with YaRN)
  • Medium models outperform previous 235B flagship
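Earlier open-weight Qwen releases documented extending the context window beyond the native length by enabling YaRN RoPE scaling in the model's `config.json`. A fragment in that style, with values that are only illustrative here (a 4x factor over an assumed 262K native window; check the model card for the actual numbers):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```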

Qwen3 Highlights:

  • Hybrid reasoning: Toggle thinking/non-thinking modes via /think and /no_think
  • 119 languages and dialects
  • Native MCP (Model Context Protocol) support
  • 36 trillion training tokens (2x Qwen2.5)
  • Thinking budget control up to 38K tokens

Qwen3-VL Highlights:

  • Visual agent: Operates PC/mobile GUIs
  • Visual coding: Generates code from images/videos
  • 2D and 3D spatial grounding
  • Text-Timestamp Alignment for video temporal modeling
  • DeepStack for multi-level ViT feature fusion
