Model Overview
Model Features
Model Capabilities
Use Cases
đ DeepSeek-R1
DeepSeek-R1 is a first - generation reasoning model developed by DeepSeek - AI. It addresses the challenges faced by its predecessor, DeepSeek - R1 - Zero, and achieves comparable performance to OpenAI - o1 in math, code, and reasoning tasks. The model and its distilled versions are open - sourced to support the research community.
đ Quick Start
Before running DeepSeek - R1 series models locally, it's recommended to review the Usage Recommendations section. You can chat with DeepSeek - R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button "DeepThink". We also provide an OpenAI - Compatible API at DeepSeek Platform: platform.deepseek.com.
⨠Features
- Powerful Reasoning: DeepSeek - R1 - Zero, trained via large - scale reinforcement learning (RL) without supervised fine - tuning (SFT), demonstrated remarkable reasoning performance. DeepSeek - R1 further enhances this by incorporating cold - start data before RL.
- Open - Source: The project has open - sourced DeepSeek - R1 - Zero, DeepSeek - R1, and six dense models distilled from DeepSeek - R1 based on Llama and Qwen.
- High Performance: DeepSeek - R1 - Distill - Qwen - 32B outperforms OpenAI - o1 - mini across various benchmarks, achieving new state - of - the - art results for dense models.
đĻ Installation
DeepSeek - R1 Models
Please visit [DeepSeek - V3](https://github.com/deepseek - ai/DeepSeek - V3) repo for more information about running DeepSeek - R1 locally.
Note: Hugging Face's Transformers has not been directly supported yet.
DeepSeek - R1 - Distill Models
DeepSeek - R1 - Distill models can be utilized in the same manner as Qwen or Llama models.
For instance, you can easily start a service using [vLLM](https://github.com/vllm - project/vllm):
vllm serve deepseek - ai/DeepSeek - R1 - Distill - Qwen - 32B --tensor - parallel - size 2 --max - model - len 32768 --enforce - eager
You can also easily start a service using [SGLang](https://github.com/sgl - project/sglang):
python3 -m sglang.launch_server --model deepseek - ai/DeepSeek - R1 - Distill - Qwen - 32B --trust - remote - code --tp 2
đģ Usage Examples
Basic Usage
When using the DeepSeek - R1 series models, you can follow these steps:
- Set the temperature within the range of 0.5 - 0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
- Avoid adding a system prompt; all instructions should be contained within the user prompt.
- For mathematical problems, include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
Advanced Usage
When evaluating model performance, it is recommended to conduct multiple tests and average the results. To ensure that the model engages in thorough reasoning, enforce the model to initiate its response with "<think>\n" at the beginning of every output.
đ Documentation
Introduction
We introduce our first - generation reasoning models, DeepSeek - R1 - Zero and DeepSeek - R1. DeepSeek - R1 - Zero, trained via large - scale reinforcement learning (RL) without supervised fine - tuning (SFT), showed remarkable reasoning performance but faced issues like endless repetition, poor readability, and language mixing. DeepSeek - R1 addresses these issues by incorporating cold - start data before RL and achieves comparable performance to OpenAI - o1 across math, code, and reasoning tasks.
Model Summary
Post - Training: Large - Scale Reinforcement Learning on the Base Model
- We directly apply reinforcement learning (RL) to the base model without relying on supervised fine - tuning (SFT) as a preliminary step. This results in DeepSeek - R1 - Zero, which can explore chain - of - thought (CoT) for solving complex problems and has capabilities like self - verification, reflection, and generating long CoTs.
- Our pipeline to develop DeepSeek - R1 includes two RL stages for discovering better reasoning patterns and aligning with human preferences, and two SFT stages as the seed for the model's reasoning and non - reasoning capabilities.
Distillation: Smaller Models Can Be Powerful Too
- We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, leading to better performance compared to the reasoning patterns discovered through RL on small models.
- We fine - tuned several dense models using the reasoning data generated by DeepSeek - R1 and open - sourced distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series.
Model Downloads
DeepSeek - R1 Models
Model | #Total Params | #Activated Params | Context Length | Download |
---|---|---|---|---|
DeepSeek - R1 - Zero | 671B | 37B | 128K | [đ¤ HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Zero) |
DeepSeek - R1 | 671B | 37B | 128K | [đ¤ HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1) |
DeepSeek - R1 - Zero & DeepSeek - R1 are trained based on DeepSeek - V3 - Base. For more details regarding the model architecture, please refer to [DeepSeek - V3](https://github.com/deepseek - ai/DeepSeek - V3) repository.
DeepSeek - R1 - Distill Models
Model | Base Model | Download |
---|---|---|
DeepSeek - R1 - Distill - Qwen - 1.5B | [Qwen2.5 - Math - 1.5B](https://huggingface.co/Qwen/Qwen2.5 - Math - 1.5B) | [đ¤ HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 1.5B) |
DeepSeek - R1 - Distill - Qwen - 7B | [Qwen2.5 - Math - 7B](https://huggingface.co/Qwen/Qwen2.5 - Math - 7B) | [đ¤ HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 7B) |
DeepSeek - R1 - Distill - Llama - 8B | [Llama - 3.1 - 8B](https://huggingface.co/meta - llama/Llama - 3.1 - 8B) | [đ¤ HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Llama - 8B) |
DeepSeek - R1 - Distill - Qwen - 14B | [Qwen2.5 - 14B](https://huggingface.co/Qwen/Qwen2.5 - 14B) | [đ¤ HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 14B) |
DeepSeek - R1 - Distill - Qwen - 32B | [Qwen2.5 - 32B](https://huggingface.co/Qwen/Qwen2.5 - 32B) | [đ¤ HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 32B) |
DeepSeek - R1 - Distill - Llama - 70B | [Llama - 3.3 - 70B - Instruct](https://huggingface.co/meta - llama/Llama - 3.3 - 70B - Instruct) | [đ¤ HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Llama - 70B) |
DeepSeek - R1 - Distill models are fine - tuned based on open - source models, using samples generated by DeepSeek - R1. We slightly change their configs and tokenizers. Please use our setting to run these models.
Evaluation Results
DeepSeek - R1 - Evaluation
For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top - p value of $0.95$, and generate 64 responses per query to estimate pass@1.
Category | Benchmark (Metric) | Claude - 3.5 - Sonnet - 1022 | GPT - 4o 0513 | DeepSeek V3 | OpenAI o1 - mini | OpenAI o1 - 1217 | DeepSeek R1 |
---|---|---|---|---|---|---|---|
Architecture | - | - | MoE | - | - | MoE | |
# Activated Params | - | - | 37B | - | - | 37B | |
# Total Params | - | - | 671B | - | - | 671B | |
English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | 91.8 | 90.8 |
MMLU - Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | 92.9 | |
MMLU - Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | 84.0 | |
DROP (3 - shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | 92.2 | |
IF - Eval (Prompt Strict) | 86.5 | 84.3 | 86.1 | 84.8 | - | 83.3 | |
GPQA - Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | 75.7 | 71.5 | |
SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | 47.0 | 30.1 | |
FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | 82.5 | |
AlpacaEval2.0 (LC - winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | 87.6 | |
ArenaHard (GPT - 4 - 1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | 92.3 | |
Code | LiveCodeBench (Pass@1 - COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | 65.9 |
Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | 96.6 | 96.3 | |
Codeforces (Rating) | 717 | 759 | 1134 | 1820 | 2061 | 2029 | |
SWE Verified (Resolved) | 50.8 | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | |
Aider - Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | 61.7 | 53.3 | |
Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | 79.8 |
MATH - 500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | 97.3 | |
CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | 78.8 | |
Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | 92.8 |
C - Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | 91.8 | |
C - SimpleQA (Correct) | 55.4 | 58.7 | 68.0 | 40.3 | - | 63.7 |
Distilled Model Evaluation
Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH - 500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating |
---|---|---|---|---|---|---|
GPT - 4o - 0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 |
Claude - 3.5 - Sonnet - 1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 |
o1 - mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | 1820 |
QwQ - 32B - Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 |
DeepSeek - R1 - Distill - Qwen - 1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 |
DeepSeek - R1 - Distill - Qwen - 7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 |
DeepSeek - R1 - Distill - Qwen - 14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 |
DeepSeek - R1 - Distill - Qwen - 32B | 72.6 | 83.3 | 94.3 | 62.1 | 57.2 | 1691 |
DeepSeek - R1 - Distill - Llama - 8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 |
DeepSeek - R1 - Distill - Llama - 70B | 70.0 | 86.7 | 94.5 | 65.2 | 57.5 | 1633 |
Usage Recommendations
â ī¸ Important Note
We recommend adhering to the following configurations when utilizing the DeepSeek - R1 series models, including benchmarking, to achieve the expected performance:
- Set the temperature within the range of 0.5 - 0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
- Avoid adding a system prompt; all instructions should be contained within the user prompt.
- For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
- When evaluating model performance, it is recommended to conduct multiple tests and average the results.
Additionally, we have observed that the DeepSeek - R1 series models tend to bypass thinking pattern (i.e., outputting "<think>\n\n</think>") when responding to certain queries, which can adversely affect the model's performance.
đĄ Usage Tip
To ensure that the model engages in thorough reasoning, we recommend enforcing the model to initiate its response with "<think>\n" at the beginning of every output.
đ§ Technical Details
- Post - Training: We directly apply reinforcement learning (RL) to the base model without supervised fine - tuning (SFT) as a preliminary step, which allows the model to explore chain - of - thought (CoT) for solving complex problems.
- Distillation: We distill the reasoning patterns of larger models into smaller models, which leads to better performance compared to the reasoning patterns discovered through RL on small models.
đ License
This code repository and the model weights are licensed under the [MIT License](https://github.com/deepseek - ai/DeepSeek - R1/blob/main/LICENSE). DeepSeek - R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:
- DeepSeek - R1 - Distill - Qwen - 1.5B, DeepSeek - R1 - Distill - Qwen - 7B, DeepSeek - R1 - Distill - Qwen - 14B and DeepSeek - R1 - Distill - Qwen - 32B are derived from Qwen - 2.5 series, which are originally licensed under [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5 - 1.5B/blob/main/LICENSE), and now finetuned with 800k samples curated with DeepSeek - R1.
- DeepSeek - R1 - Distill - Llama - 8B is derived from Llama3.1 - 8B - Base and is originally licensed under [llama3.1 license](https://huggingface.co/meta - llama/Llama - 3.1 - 8B/blob/main/LICENSE).
- DeepSeek - R1 - Distill - Llama - 70B is derived from Llama3.3 - 70B - Instruct and is originally licensed under [llama3.3 license](https://huggingface.co/meta - llama/Llama - 3.3 - 70B - Instruct/blob/main/LICENSE).
đ Citation
@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
title={DeepSeek - R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
author={DeepSeek - AI},
year={2025},
eprint={2501.12948},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.12948},
}
đ Contact
If you have any questions, please raise an issue or contact us at service@deepseek.com.

