DeepSeek: R1
Model Type
Open-Weights Model
API access or self-hosted
Recommended Use Cases
Text Generation
DeepSeek's first-generation reasoning model, released in January 2025 and trained via large-scale reinforcement learning to reach performance comparable to OpenAI o1 on math, code, and reasoning tasks. Its precursor, DeepSeek-R1-Zero, showed that reasoning capabilities can emerge through pure RL without supervised fine-tuning; R1 adds a small cold-start SFT stage before RL to improve readability and performance.
Per DeepSeek:
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.
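The hosted model is served through an OpenAI-compatible chat-completions API. A minimal sketch using only the standard library is below; the endpoint URL and the `deepseek-reasoner` model id match DeepSeek's published API docs at the time of writing, but verify both before relying on them. The request is only sent if a `DEEPSEEK_API_KEY` environment variable is set.

```python
import json
import os
import urllib.request

# Assumption: DeepSeek exposes an OpenAI-compatible chat-completions
# endpoint, and "deepseek-reasoner" is the model id that routes to R1.
# Check DeepSeek's API documentation for the current values.
API_URL = "https://api.deepseek.com/chat/completions"

def build_request(prompt: str) -> dict:
    """Construct the JSON payload for a single-turn chat completion."""
    return {
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Prove that the square root of 2 is irrational.")

# Only perform the network call when an API key is actually configured.
api_key = os.environ.get("DEEPSEEK_API_KEY")
if api_key:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

The same payload works with the official `openai` Python SDK pointed at DeepSeek's base URL, since the wire format is identical.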
Key Features
- Architecture: Mixture-of-Experts (MoE), 671B total parameters, 37B activated per token
- Base Model: DeepSeek-V3-Base
- Training: Large-scale reinforcement learning with rule-based rewards
- License: MIT (supports commercial use and distillation)
Benchmarks
- AIME 2024: ~79.8% pass@1
- MATH-500: ~97.3% pass@1
- Codeforces: 2,029 Elo rating
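The scores above are pass@1: the fraction of problems solved per sampled attempt. The general pass@k metric is commonly computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021), which at k=1 reduces to plain per-sample accuracy; a small reference implementation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of
    k samples drawn (without replacement) from n generations is correct,
    given that c of the n generations are correct."""
    if n - c < k:
        # Fewer incorrect samples than k: every draw contains a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples per problem, 12 correct -> pass@1 = 12/16 = 0.75.
print(pass_at_k(16, 12, 1))  # 0.75
```

Averaging this quantity over all benchmark problems yields the reported score.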
Distilled Versions
DeepSeek open-sourced smaller distilled models based on Qwen and Llama:
- DeepSeek-R1-Distill-Qwen: 1.5B, 7B, 14B, 32B
- DeepSeek-R1-Distill-Llama: 8B, 70B
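The distills are small enough to run locally. A sketch for the smallest one, assuming the Hugging Face repo id `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` and the sampling settings DeepSeek recommends for the distilled models (temperature around 0.6, top_p 0.95); adjust to the official model card if these have changed:

```python
# Assumed repo id and recommended sampling settings for the R1 distills;
# verify both against the Hugging Face model card before use.
MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
GEN_KWARGS = {"temperature": 0.6, "top_p": 0.95, "max_new_tokens": 1024}

def generate(prompt: str) -> str:
    """Load the distilled model with transformers and sample one reply."""
    # Lazy import so the module can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    messages = [{"role": "user", "content": prompt}]
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, do_sample=True, **GEN_KWARGS)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Because the distills inherit R1's chat template, the reply begins with a `<think>...</think>` reasoning trace followed by the final answer.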