Model Overview
Model Features
Model Capabilities
Use Cases
đ DeepSeek-R1
DeepSeek-R1 is a first - generation reasoning model. It addresses the issues of its predecessor, DeepSeek - R1 - Zero, and achieves comparable performance to OpenAI - o1 in math, code, and reasoning tasks. The model and its distilled versions are open - sourced to support the research community.
đ Quick Start
You can chat with DeepSeek - R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button "DeepThink". We also provide an OpenAI - Compatible API at DeepSeek Platform: platform.deepseek.com.
⨠Features
- Reinforcement Learning - Based Training: Directly apply large - scale reinforcement learning to the base model without supervised fine - tuning as a preliminary step, enabling the model to explore chain - of - thought for solving complex problems.
- Model Distillation: Demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance.
- Good Performance: Achieve performance comparable to OpenAI - o1 across math, code, and reasoning tasks. Some distilled models outperform OpenAI - o1 - mini in various benchmarks.
đĻ Installation
DeepSeek - R1 Models
Please visit [DeepSeek - V3](https://github.com/deepseek - ai/DeepSeek - V3) repo for more information about running DeepSeek - R1 locally.
DeepSeek - R1 - Distill Models
DeepSeek - R1 - Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using [vLLM](https://github.com/vllm - project/vllm):
vllm serve deepseek - ai/DeepSeek - R1 - Distill - Qwen - 32B --tensor - parallel - size 2 --max - model - len 32768 --enforce - eager
â ī¸ Important Note
We recommend setting an appropriate temperature (between 0.5 and 0.7) when running these models, otherwise you may encounter issues with endless repetition or incoherent output.
đģ Usage Examples
Basic Usage
For running DeepSeek - R1 - Distill models, you can use the following command with vLLM:
vllm serve deepseek - ai/DeepSeek - R1 - Distill - Qwen - 32B --tensor - parallel - size 2 --max - model - len 32768 --enforce - eager
đ Documentation
Introduction
We introduce our first - generation reasoning models, DeepSeek - R1 - Zero and DeepSeek - R1. DeepSeek - R1 - Zero, trained via large - scale reinforcement learning without supervised fine - tuning, demonstrated remarkable reasoning performance but faced challenges like endless repetition. To address these issues and enhance performance, we introduce DeepSeek - R1, which incorporates cold - start data before RL. DeepSeek - R1 achieves performance comparable to OpenAI - o1 across math, code, and reasoning tasks. We have open - sourced DeepSeek - R1 - Zero, DeepSeek - R1, and six dense models distilled from DeepSeek - R1.
Model Summary
Post - Training: Large - Scale Reinforcement Learning on the Base Model
- We directly apply reinforcement learning to the base model without supervised fine - tuning. This allows the model to explore chain - of - thought, resulting in DeepSeek - R1 - Zero, which shows capabilities like self - verification and reflection. It is the first open research to validate that LLMs' reasoning capabilities can be incentivized purely through RL without SFT.
- Our pipeline to develop DeepSeek - R1 includes two RL stages for discovering better reasoning patterns and aligning with human preferences, as well as two SFT stages for the model's reasoning and non - reasoning capabilities.
Distillation: Smaller Models Can Be Powerful Too
- Demonstrate that the reasoning patterns of larger models can be distilled into smaller models, outperforming the reasoning patterns discovered through RL on small models. The open - source DeepSeek - R1 and its API will help the research community distill better smaller models.
- Fine - tune several dense models using the reasoning data generated by DeepSeek - R1. The evaluation results show that the distilled smaller dense models perform well on benchmarks. We open - source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series.
Model Downloads
DeepSeek - R1 Models
Model | #Total Params | #Activated Params | Context Length | Download |
---|---|---|---|---|
DeepSeek - R1 - Zero | 671B | 37B | 128K | [HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Zero) |
DeepSeek - R1 | 671B | 37B | 128K | [HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1) |
DeepSeek - R1 - Zero & DeepSeek - R1 are trained based on DeepSeek - V3 - Base. For more details regarding the model architecture, please refer to [DeepSeek - V3](https://github.com/deepseek - ai/DeepSeek - V3) repository.
DeepSeek - R1 - Distill Models
Model | Base Model | Download |
---|---|---|
DeepSeek - R1 - Distill - Qwen - 1.5B | [Qwen2.5 - Math - 1.5B](https://huggingface.co/Qwen/Qwen2.5 - Math - 1.5B) | [HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 1.5B) |
DeepSeek - R1 - Distill - Qwen - 7B | [Qwen2.5 - Math - 7B](https://huggingface.co/Qwen/Qwen2.5 - Math - 7B) | [HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 7B) |
DeepSeek - R1 - Distill - Llama - 8B | [Llama - 3.1 - 8B](https://huggingface.co/meta - llama/Llama - 3.1 - 8B) | [HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Llama - 8B) |
DeepSeek - R1 - Distill - Qwen - 14B | [Qwen2.5 - 14B](https://huggingface.co/Qwen/Qwen2.5 - 14B) | [HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 14B) |
DeepSeek - R1 - Distill - Qwen - 32B | [Qwen2.5 - 32B](https://huggingface.co/Qwen/Qwen2.5 - 32B) | [HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 32B) |
DeepSeek - R1 - Distill - Llama - 70B | [Llama - 3.3 - 70B - Instruct](https://huggingface.co/meta - llama/Llama - 3.3 - 70B - Instruct) | [HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Llama - 70B) |
DeepSeek - R1 - Distill models are fine - tuned based on open - source models, using samples generated by DeepSeek - R1. We slightly change their configs and tokenizers. Please use our setting to run these models.
Evaluation Results
DeepSeek - R1 - Evaluation
For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top - p value of $0.95$, and generate 64 responses per query to estimate pass@1.
Category | Benchmark (Metric) | Claude - 3.5 - Sonnet - 1022 | GPT - 4o 0513 | DeepSeek V3 | OpenAI o1 - mini | OpenAI o1 - 1217 | DeepSeek R1 |
---|---|---|---|---|---|---|---|
Architecture | - | - | MoE | - | - | MoE | |
# Activated Params | - | - | 37B | - | - | 37B | |
# Total Params | - | - | 671B | - | - | 671B | |
English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | 91.8 | 90.8 |
MMLU - Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | 92.9 | |
MMLU - Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | 84.0 | |
DROP (3 - shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | 92.2 | |
IF - Eval (Prompt Strict) | 86.5 | 84.3 | 86.1 | 84.8 | - | 83.3 | |
GPQA - Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | 75.7 | 71.5 | |
SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | 47.0 | 30.1 | |
FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | 82.5 | |
AlpacaEval2.0 (LC - winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | 87.6 | |
ArenaHard (GPT - 4 - 1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | 92.3 | |
Code | LiveCodeBench (Pass@1 - COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | 65.9 |
Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | 96.6 | 96.3 | |
Codeforces (Rating) | 717 | 759 | 1134 | 1820 | 2061 | 2029 | |
SWE Verified (Resolved) | 50.8 | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | |
Aider - Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | 61.7 | 53.3 | |
Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | 79.8 |
MATH - 500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | 97.3 | |
CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | 78.8 | |
Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | 92.8 |
C - Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | 91.8 | |
C - SimpleQA (Correct) | 55.4 | 58.7 | 68.0 | 40.3 | - | 63.7 |
Distilled Model Evaluation
Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH - 500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating |
---|---|---|---|---|---|---|
GPT - 4o - 0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 |
Claude - 3.5 - Sonnet - 1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 |
o1 - mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | 1820 |
QwQ - 32B - Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 |
DeepSeek - R1 - Distill - Qwen - 1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 |
DeepSeek - R1 - Distill - Qwen - 7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 |
DeepSeek - R1 - Distill - Qwen - 14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 |
DeepSeek - R1 - Distill - Qwen - 32B | 72.6 | 83.3 | 94.3 | 62.1 | 57.2 | 1691 |
DeepSeek - R1 - Distill - Llama - 8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 |
DeepSeek - R1 - Distill - Llama - 70B | 70.0 | 86.7 | 94.5 | 65.2 | 57.5 | 1633 |
đ§ Technical Details
- Reinforcement Learning: Directly apply reinforcement learning to the base model without supervised fine - tuning, enabling the model to explore chain - of - thought for solving complex problems.
- Model Distillation: Distill the reasoning patterns of larger models into smaller models, improving the performance of smaller models.
đ License
This code repository and the model weights are licensed under the [MIT License](https://github.com/deepseek - ai/DeepSeek - R1/blob/main/LICENSE). DeepSeek - R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:
- DeepSeek - R1 - Distill - Qwen - 1.5B, DeepSeek - R1 - Distill - Qwen - 7B, DeepSeek - R1 - Distill - Qwen - 14B and DeepSeek - R1 - Distill - Qwen - 32B are derived from Qwen - 2.5 series, which are originally licensed under [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5 - 1.5B/blob/main/LICENSE), and now finetuned with 800k samples curated with DeepSeek - R1.
- DeepSeek - R1 - Distill - Llama - 8B is derived from Llama3.1 - 8B - Base and is originally licensed under [llama3.1 license](https://huggingface.co/meta - llama/Llama - 3.1 - 8B/blob/main/LICENSE).
- DeepSeek - R1 - Distill - Llama - 70B is derived from Llama3.3 - 70B - Instruct and is originally licensed under [llama3.3 license](https://huggingface.co/meta - llama/Llama - 3.3 - 70B - Instruct/blob/main/LICENSE).
đŦ Contact
If you have any questions, please raise an issue or contact us at service@deepseek.com.

