DeepSeek: DeepSeek V3.1 Base
Model Type
Open-Weights Model
Weights released for download under the MIT license
Recommended Use Cases
Text Generation
The pre-trained base model for the V3.1 series (released August 2025), built upon the original V3 checkpoint with extended long-context training. V3.1-Base serves as the foundation for all V3.1 instruct and chat models.
Per DeepSeek:
DeepSeek-V3.1 is post-trained on the top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach.
Role in V3.1 Series
V3.1-Base is the pre-trained foundation model without instruction tuning or RLHF. It's intended for researchers who want to build custom fine-tuned models or study the base capabilities before post-training.
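For researchers starting from the base checkpoint, the sketch below shows minimal text completion with Hugging Face Transformers. The `deepseek-ai/DeepSeek-V3.1-Base` repository id and the loading/generation arguments are assumptions; the full checkpoint is far too large for a single GPU, so treat this as an illustration of the workflow, not a deployment recipe.

```python
# Minimal sketch: plain text completion from the base model.
# Assumes the deepseek-ai/DeepSeek-V3.1-Base repo id; the full FP8
# checkpoint requires a multi-GPU node, so this is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3.1-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard across available GPUs
    trust_remote_code=True,
)

# Base model: prompts are raw continuations; there is no chat template.
prompt = "The key idea behind mixture-of-experts models is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```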
Key Features
- Architecture: Mixture-of-Experts (MoE), 671B total parameters (685B including the multi-token-prediction module), 37B activated per token
- Context Window: 128K tokens
- Weight format: FP8 with the UE8M0 scale data format (sketched after this list)
- License: MIT
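UE8M0 refers to an 8-bit scale encoding with an unsigned exponent and zero mantissa bits, as defined in the OCP Microscaling (MX) spec: every scale is a power of two, 2^(code − 127). The sketch below illustrates that encoding for per-block FP8 quantization; the e4m3 target, bias, and helper names follow the MX spec, and none of this is DeepSeek's actual quantization code.

```python
# Hedged sketch of UE8M0 (unsigned, 8 exponent bits, 0 mantissa bits):
# code x encodes the power-of-two scale 2**(x - 127), so a block's scale
# is the smallest power of two that brings its values into the FP8 (e4m3)
# range. Constants follow the OCP Microscaling spec, not DeepSeek's code.
import math

FP8_E4M3_MAX = 448.0   # largest finite e4m3 magnitude
UE8M0_BIAS = 127

def ue8m0_encode(block_absmax: float) -> int:
    """Pick the UE8M0 code whose scale 2**(code - 127) covers block_absmax."""
    if block_absmax == 0.0:
        return 0
    exp = math.ceil(math.log2(block_absmax / FP8_E4M3_MAX))
    return max(0, min(254, exp + UE8M0_BIAS))  # code 255 is reserved (NaN)

def ue8m0_decode(code: int) -> float:
    return 2.0 ** (code - UE8M0_BIAS)

code = ue8m0_encode(1000.0)              # block max exceeds the FP8 range
scale = ue8m0_decode(code)
print(code, scale, 1000.0 / scale)       # -> 129 4.0 250.0 (fits in e4m3)
```

Because the scale carries no mantissa, dequantization is a pure exponent shift, which keeps scaling cheap and exactly representable in hardware.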
Long-Context Training
Context capability was extended through a two-phase training approach (a quick arithmetic check of the stated multipliers follows the list):
- 32K Extension Phase: 630B tokens (10x increase from V3)
- 128K Extension Phase: 209B tokens (3.3x increase from V3)
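As a consistency check, dividing the stated token counts by their multipliers recovers the implied V3-era phase budgets. A back-of-the-envelope sketch; the V3 baseline figures below are derived, not quoted from DeepSeek:

```python
# Back-of-the-envelope: V3 phase budgets implied by the stated multipliers.
v31_phases = {"32K": (630e9, 10.0), "128K": (209e9, 3.3)}

for phase, (tokens, multiplier) in v31_phases.items():
    implied_v3 = tokens / multiplier
    print(f"{phase} phase: {tokens/1e9:.0f}B tokens "
          f"~ {multiplier}x an implied V3 budget of {implied_v3/1e9:.0f}B")
# -> 32K phase: 630B tokens ~ 10.0x an implied V3 budget of 63B
# -> 128K phase: 209B tokens ~ 3.3x an implied V3 budget of 63B
```

Both phases work out to roughly the same implied V3 baseline (~63B tokens), consistent with V3.1 scaling up an existing two-phase extension recipe.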