DeepSeek: DeepSeek V3.1 Base
Model Type
Open-Weights Model
Weights released for download under the MIT license
Recommended Use Cases
Text Generation
The pre-trained base model for the V3.1 series (released August 2025), built upon the original V3 checkpoint with extended long-context training. V3.1-Base serves as the foundation for all V3.1 instruct and chat models.
Per DeepSeek:
DeepSeek-V3.1 is post-trained on the top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach.
Role in V3.1 Series
V3.1-Base is the pre-trained foundation model without instruction tuning or RLHF. It's intended for researchers who want to build custom fine-tuned models or study the base capabilities before post-training.
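For researchers starting from the base checkpoint, the sketch below shows minimal text completion with Hugging Face Transformers. The `deepseek-ai/DeepSeek-V3.1-Base` repository id and the loading/generation arguments are assumptions; the full checkpoint is far too large for a single GPU, so treat this as an illustration of the workflow, not a deployment recipe.

```python
# Minimal sketch: plain text completion from the base model.
# Assumes the deepseek-ai/DeepSeek-V3.1-Base repo id; the full FP8
# checkpoint requires a multi-GPU node, so this is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3.1-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard across available GPUs
    trust_remote_code=True,
)

# Base model: prompts are raw continuations; there is no chat template.
prompt = "The key idea behind mixture-of-experts models is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```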
Key Features
- Architecture: Mixture-of-Experts (MoE), 671B total parameters (685B including the multi-token-prediction module), 37B activated per token
- Context Window: 128K tokens
- Weight format: FP8 with the UE8M0 scale data format (sketched after this list)
- License: MIT
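UE8M0 refers to an 8-bit scale encoding with an unsigned exponent and zero mantissa bits, as defined in the OCP Microscaling (MX) spec: every scale is a power of two, 2^(code − 127). The sketch below illustrates that encoding for per-block FP8 quantization; the e4m3 target, bias, and helper names follow the MX spec, and none of this is DeepSeek's actual quantization code.

```python
# Hedged sketch of UE8M0 (unsigned, 8 exponent bits, 0 mantissa bits):
# code x encodes the power-of-two scale 2**(x - 127), so a block's scale
# is the smallest power of two that brings its values into the FP8 (e4m3)
# range. Constants follow the OCP Microscaling spec, not DeepSeek's code.
import math

FP8_E4M3_MAX = 448.0   # largest finite e4m3 magnitude
UE8M0_BIAS = 127

def ue8m0_encode(block_absmax: float) -> int:
    """Pick the UE8M0 code whose scale 2**(code - 127) covers block_absmax."""
    if block_absmax == 0.0:
        return 0
    exp = math.ceil(math.log2(block_absmax / FP8_E4M3_MAX))
    return max(0, min(254, exp + UE8M0_BIAS))  # code 255 is reserved (NaN)

def ue8m0_decode(code: int) -> float:
    return 2.0 ** (code - UE8M0_BIAS)

code = ue8m0_encode(1000.0)              # block max exceeds the FP8 range
scale = ue8m0_decode(code)
print(code, scale, 1000.0 / scale)       # -> 129 4.0 250.0 (fits in e4m3)
```

Because the scale carries no mantissa, dequantization is a pure exponent shift, which keeps scaling cheap and exactly representable in hardware.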
Long-Context Training
Context capability was extended through a two-phase training approach (a quick arithmetic check of the stated multipliers follows the list):
- 32K Extension Phase: 630B tokens (10x increase from V3)
- 128K Extension Phase: 209B tokens (3.3x increase from V3)
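As a consistency check, dividing the stated token counts by their multipliers recovers the implied V3-era phase budgets. A back-of-the-envelope sketch; the V3 baseline figures below are derived, not quoted from DeepSeek:

```python
# Back-of-the-envelope: V3 phase budgets implied by the stated multipliers.
v31_phases = {"32K": (630e9, 10.0), "128K": (209e9, 3.3)}

for phase, (tokens, multiplier) in v31_phases.items():
    implied_v3 = tokens / multiplier
    print(f"{phase} phase: {tokens/1e9:.0f}B tokens "
          f"~ {multiplier}x an implied V3 budget of {implied_v3/1e9:.0f}B")
# -> 32K phase: 630B tokens ~ 10.0x an implied V3 budget of 63B
# -> 128K phase: 209B tokens ~ 3.3x an implied V3 budget of 63B
```

Both phases work out to roughly the same implied V3 baseline (~63B tokens), consistent with V3.1 scaling up an existing two-phase extension recipe.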