DeepSeek-R1-Zero Open-Source Inference Model - Free Support for Math, Coding, and Reasoning Tasks!

Deepseek R1 Zero

Developed by deepseek-ai

DeepSeek-R1 is the first-generation reasoning model developed by DeepSeek, trained through reinforcement learning, excelling in mathematics, coding, and reasoning tasks.

Large Language Model

Transformers

Open Source License:MIT #Reinforcement Learning Reasoning #Mathematical Code Optimization #128K Long Context

Downloads 4,034

Release Time : 1/20/2025

Model Overview

DeepSeek-R1 is a large-scale reasoning model trained based on DeepSeek-V3-Base, optimized for reasoning capabilities via reinforcement learning, supporting a 128K context length.

Model Features

Reinforcement Learning Training

Directly trains the base model through large-scale reinforcement learning without requiring supervised fine-tuning as an initial step.

Emergent Reasoning Abilities

Naturally exhibits powerful reasoning behaviors such as self-verification, reflection, and long-chain reasoning.

High-Performance Reasoning

Performs comparably to OpenAI-o1 in mathematics, coding, and reasoning tasks.

Distillation Support

Supports distilling the reasoning patterns of large models into smaller models to enhance their performance.

Model Capabilities

Complex problem reasoning

Mathematical problem-solving

Code generation and understanding

Long-text processing

Multilingual support

Use Cases

Education

Mathematical Problem Solving

Helps students solve complex mathematical problems by providing detailed step-by-step solutions.

Excels in mathematical reasoning tasks

Programming

Code Generation and Optimization

Generates high-quality code based on requirements and can optimize existing code.

Achieves 65.9 Pass@1-COT on LiveCodeBench

Research

Complex Problem Analysis

Assists researchers in analyzing complex problems by providing multi-perspective insights.

Achieves 71.5 Pass@1 on GPQA-Diamond

🚀 DeepSeek-R1

DeepSeek-R1 is a first - generation reasoning model developed by DeepSeek - AI. It addresses the challenges faced by its predecessor, DeepSeek - R1 - Zero, and achieves comparable performance to OpenAI - o1 in math, code, and reasoning tasks. The model and its distilled versions are open - sourced to support the research community.

🚀 Quick Start

Before running DeepSeek - R1 series models locally, it's recommended to review the Usage Recommendations section. You can chat with DeepSeek - R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button "DeepThink". We also provide an OpenAI - Compatible API at DeepSeek Platform: platform.deepseek.com.

✨ Features

Powerful Reasoning: DeepSeek - R1 - Zero, trained via large - scale reinforcement learning (RL) without supervised fine - tuning (SFT), demonstrated remarkable reasoning performance. DeepSeek - R1 further enhances this by incorporating cold - start data before RL.
Open - Source: The project has open - sourced DeepSeek - R1 - Zero, DeepSeek - R1, and six dense models distilled from DeepSeek - R1 based on Llama and Qwen.
High Performance: DeepSeek - R1 - Distill - Qwen - 32B outperforms OpenAI - o1 - mini across various benchmarks, achieving new state - of - the - art results for dense models.

📦 Installation

DeepSeek - R1 Models

Please visit [DeepSeek - V3](https://github.com/deepseek - ai/DeepSeek - V3) repo for more information about running DeepSeek - R1 locally.

Note: Hugging Face's Transformers has not been directly supported yet.

DeepSeek - R1 - Distill Models

DeepSeek - R1 - Distill models can be utilized in the same manner as Qwen or Llama models.

For instance, you can easily start a service using [vLLM](https://github.com/vllm - project/vllm):

vllm serve deepseek - ai/DeepSeek - R1 - Distill - Qwen - 32B --tensor - parallel - size 2 --max - model - len 32768 --enforce - eager

You can also easily start a service using [SGLang](https://github.com/sgl - project/sglang):

python3 -m sglang.launch_server --model deepseek - ai/DeepSeek - R1 - Distill - Qwen - 32B --trust - remote - code --tp 2

💻 Usage Examples

Basic Usage

When using the DeepSeek - R1 series models, you can follow these steps:

Set the temperature within the range of 0.5 - 0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
Avoid adding a system prompt; all instructions should be contained within the user prompt.
For mathematical problems, include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."

Advanced Usage

When evaluating model performance, it is recommended to conduct multiple tests and average the results. To ensure that the model engages in thorough reasoning, enforce the model to initiate its response with "<think>\n" at the beginning of every output.

📚 Documentation

Introduction

We introduce our first - generation reasoning models, DeepSeek - R1 - Zero and DeepSeek - R1. DeepSeek - R1 - Zero, trained via large - scale reinforcement learning (RL) without supervised fine - tuning (SFT), showed remarkable reasoning performance but faced issues like endless repetition, poor readability, and language mixing. DeepSeek - R1 addresses these issues by incorporating cold - start data before RL and achieves comparable performance to OpenAI - o1 across math, code, and reasoning tasks.

Model Summary

Post - Training: Large - Scale Reinforcement Learning on the Base Model

We directly apply reinforcement learning (RL) to the base model without relying on supervised fine - tuning (SFT) as a preliminary step. This results in DeepSeek - R1 - Zero, which can explore chain - of - thought (CoT) for solving complex problems and has capabilities like self - verification, reflection, and generating long CoTs.
Our pipeline to develop DeepSeek - R1 includes two RL stages for discovering better reasoning patterns and aligning with human preferences, and two SFT stages as the seed for the model's reasoning and non - reasoning capabilities.

Distillation: Smaller Models Can Be Powerful Too

We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, leading to better performance compared to the reasoning patterns discovered through RL on small models.
We fine - tuned several dense models using the reasoning data generated by DeepSeek - R1 and open - sourced distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series.

Model Downloads

DeepSeek - R1 Models

Model	#Total Params	#Activated Params	Context Length	Download
DeepSeek - R1 - Zero	671B	37B	128K	[🤗 HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Zero)
DeepSeek - R1	671B	37B	128K	[🤗 HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1)

DeepSeek - R1 - Zero & DeepSeek - R1 are trained based on DeepSeek - V3 - Base. For more details regarding the model architecture, please refer to [DeepSeek - V3](https://github.com/deepseek - ai/DeepSeek - V3) repository.

DeepSeek - R1 - Distill Models

Model	Base Model	Download
DeepSeek - R1 - Distill - Qwen - 1.5B	[Qwen2.5 - Math - 1.5B](https://huggingface.co/Qwen/Qwen2.5 - Math - 1.5B)	[🤗 HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 1.5B)
DeepSeek - R1 - Distill - Qwen - 7B	[Qwen2.5 - Math - 7B](https://huggingface.co/Qwen/Qwen2.5 - Math - 7B)	[🤗 HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 7B)
DeepSeek - R1 - Distill - Llama - 8B	[Llama - 3.1 - 8B](https://huggingface.co/meta - llama/Llama - 3.1 - 8B)	[🤗 HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Llama - 8B)
DeepSeek - R1 - Distill - Qwen - 14B	[Qwen2.5 - 14B](https://huggingface.co/Qwen/Qwen2.5 - 14B)	[🤗 HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 14B)
DeepSeek - R1 - Distill - Qwen - 32B	[Qwen2.5 - 32B](https://huggingface.co/Qwen/Qwen2.5 - 32B)	[🤗 HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 32B)
DeepSeek - R1 - Distill - Llama - 70B	[Llama - 3.3 - 70B - Instruct](https://huggingface.co/meta - llama/Llama - 3.3 - 70B - Instruct)	[🤗 HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Llama - 70B)

DeepSeek - R1 - Distill models are fine - tuned based on open - source models, using samples generated by DeepSeek - R1. We slightly change their configs and tokenizers. Please use our setting to run these models.

Evaluation Results

DeepSeek - R1 - Evaluation

For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top - p value of $0.95$, and generate 64 responses per query to estimate pass@1.

Category	Benchmark (Metric)	Claude - 3.5 - Sonnet - 1022	GPT - 4o 0513	DeepSeek V3	OpenAI o1 - mini	OpenAI o1 - 1217	DeepSeek R1
	Architecture	-	-	MoE	-	-	MoE
	# Activated Params	-	-	37B	-	-	37B
	# Total Params	-	-	671B	-	-	671B
English	MMLU (Pass@1)	88.3	87.2	88.5	85.2	91.8	90.8
	MMLU - Redux (EM)	88.9	88.0	89.1	86.7	-	92.9
	MMLU - Pro (EM)	78.0	72.6	75.9	80.3	-	84.0
	DROP (3 - shot F1)	88.3	83.7	91.6	83.9	90.2	92.2
	IF - Eval (Prompt Strict)	86.5	84.3	86.1	84.8	-	83.3
	GPQA - Diamond (Pass@1)	65.0	49.9	59.1	60.0	75.7	71.5
	SimpleQA (Correct)	28.4	38.2	24.9	7.0	47.0	30.1
	FRAMES (Acc.)	72.5	80.5	73.3	76.9	-	82.5
	AlpacaEval2.0 (LC - winrate)	52.0	51.1	70.0	57.8	-	87.6
	ArenaHard (GPT - 4 - 1106)	85.2	80.4	85.5	92.0	-	92.3
Code	LiveCodeBench (Pass@1 - COT)	33.8	34.2	-	53.8	63.4	65.9
	Codeforces (Percentile)	20.3	23.6	58.7	93.4	96.6	96.3
	Codeforces (Rating)	717	759	1134	1820	2061	2029
	SWE Verified (Resolved)	50.8	38.8	42.0	41.6	48.9	49.2
	Aider - Polyglot (Acc.)	45.3	16.0	49.6	32.9	61.7	53.3
Math	AIME 2024 (Pass@1)	16.0	9.3	39.2	63.6	79.2	79.8
	MATH - 500 (Pass@1)	78.3	74.6	90.2	90.0	96.4	97.3
	CNMO 2024 (Pass@1)	13.1	10.8	43.2	67.6	-	78.8
Chinese	CLUEWSC (EM)	85.4	87.9	90.9	89.9	-	92.8
	C - Eval (EM)	76.7	76.0	86.5	68.9	-	91.8
	C - SimpleQA (Correct)	55.4	58.7	68.0	40.3	-	63.7

Distilled Model Evaluation

Model	AIME 2024 pass@1	AIME 2024 cons@64	MATH - 500 pass@1	GPQA Diamond pass@1	LiveCodeBench pass@1	CodeForces rating
GPT - 4o - 0513	9.3	13.4	74.6	49.9	32.9	759
Claude - 3.5 - Sonnet - 1022	16.0	26.7	78.3	65.0	38.9	717
o1 - mini	63.6	80.0	90.0	60.0	53.8	1820
QwQ - 32B - Preview	44.0	60.0	90.6	54.5	41.9	1316
DeepSeek - R1 - Distill - Qwen - 1.5B	28.9	52.7	83.9	33.8	16.9	954
DeepSeek - R1 - Distill - Qwen - 7B	55.5	83.3	92.8	49.1	37.6	1189
DeepSeek - R1 - Distill - Qwen - 14B	69.7	80.0	93.9	59.1	53.1	1481
DeepSeek - R1 - Distill - Qwen - 32B	72.6	83.3	94.3	62.1	57.2	1691
DeepSeek - R1 - Distill - Llama - 8B	50.4	80.0	89.1	49.0	39.6	1205
DeepSeek - R1 - Distill - Llama - 70B	70.0	86.7	94.5	65.2	57.5	1633

Usage Recommendations

⚠️ Important Note

We recommend adhering to the following configurations when utilizing the DeepSeek - R1 series models, including benchmarking, to achieve the expected performance:

Set the temperature within the range of 0.5 - 0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.

Avoid adding a system prompt; all instructions should be contained within the user prompt.

For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."

When evaluating model performance, it is recommended to conduct multiple tests and average the results.

Additionally, we have observed that the DeepSeek - R1 series models tend to bypass thinking pattern (i.e., outputting "<think>\n\n</think>") when responding to certain queries, which can adversely affect the model's performance.

💡 Usage Tip

To ensure that the model engages in thorough reasoning, we recommend enforcing the model to initiate its response with "<think>\n" at the beginning of every output.

🔧 Technical Details

Post - Training: We directly apply reinforcement learning (RL) to the base model without supervised fine - tuning (SFT) as a preliminary step, which allows the model to explore chain - of - thought (CoT) for solving complex problems.
Distillation: We distill the reasoning patterns of larger models into smaller models, which leads to better performance compared to the reasoning patterns discovered through RL on small models.

📄 License

This code repository and the model weights are licensed under the [MIT License](https://github.com/deepseek - ai/DeepSeek - R1/blob/main/LICENSE). DeepSeek - R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:

DeepSeek - R1 - Distill - Qwen - 1.5B, DeepSeek - R1 - Distill - Qwen - 7B, DeepSeek - R1 - Distill - Qwen - 14B and DeepSeek - R1 - Distill - Qwen - 32B are derived from Qwen - 2.5 series, which are originally licensed under [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5 - 1.5B/blob/main/LICENSE), and now finetuned with 800k samples curated with DeepSeek - R1.
DeepSeek - R1 - Distill - Llama - 8B is derived from Llama3.1 - 8B - Base and is originally licensed under [llama3.1 license](https://huggingface.co/meta - llama/Llama - 3.1 - 8B/blob/main/LICENSE).
DeepSeek - R1 - Distill - Llama - 70B is derived from Llama3.3 - 70B - Instruct and is originally licensed under [llama3.3 license](https://huggingface.co/meta - llama/Llama - 3.3 - 70B - Instruct/blob/main/LICENSE).

📖 Citation

@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
      title={DeepSeek - R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning}, 
      author={DeepSeek - AI},
      year={2025},
      eprint={2501.12948},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.12948}, 
}

📞 Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご