DeepSeek-R1 Open-Source Inference Model - Free to Use, Excels in Math, Coding, and Reasoning Tasks!

Deepseek R1

Developed by deepseek-ai

DeepSeek-R1 is the first-generation inference model launched by DeepSeek. Through large-scale reinforcement learning training, it performs excellently in mathematics, code, and reasoning tasks.

Large Language Model

Transformers

Open Source License:MIT #Mixture of Experts Architecture #128K Long Text Reasoning #Mathematics and Code Enhancement

Downloads 1.7M

Release Time : 1/20/2025

Model Overview

DeepSeek-R1 is a large-scale language model based on the MoE architecture, trained through two-stage reinforcement learning and supervised fine-tuning, focusing on improving complex reasoning abilities.

Model Features

Pure Reinforcement Learning Training

The DeepSeek-R1-Zero version is completely trained through reinforcement learning without supervised fine-tuning, demonstrating naturally emerging reasoning abilities.

Two-stage Training Process

It includes two RL stages for discovering reasoning patterns and aligning human preferences, as well as two SFT stages as ability seeds.

Powerful Reasoning Ability

It performs excellently in mathematics, code, and complex reasoning tasks, comparable to OpenAI-o1.

Knowledge Distillation Support

It supports distilling the reasoning ability of large models into small models to improve the performance of small models.

Model Capabilities

Solving Complex Mathematical Problems

Code Generation and Understanding

Long Text Reasoning

Multi-step Logical Reasoning

Self-verification and Reflection

Thought Chain Generation

Use Cases

Education

Mathematical Problem Solving

Solve complex mathematical problems, including proof questions and calculation questions.

It performs excellently in mathematical benchmark tests.

Programming

Code Generation

Generate functional code based on problem descriptions.

It achieves a Pass@1-COT of 65.9% on LiveCodeBench.

Research

Scientific Reasoning

Handle complex scientific problems and reasoning tasks.

It achieves an accuracy of 71.5% in the GPQA-Diamond test.

🚀 DeepSeek-R1

This project presents the first - generation reasoning models, DeepSeek - R1 - Zero and DeepSeek - R1. These models achieve remarkable performance in reasoning tasks and have been open - sourced to support the research community.

🚀 Quick Start

Before running DeepSeek - R1 series models locally, it's recommended to review the Usage Recommendation section. You can chat with DeepSeek - R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button "DeepThink". We also provide an OpenAI - Compatible API at DeepSeek Platform: platform.deepseek.com.

Paper Link👁️

✨ Features

Introduction

We introduce our first - generation reasoning models, DeepSeek - R1 - Zero and DeepSeek - R1. DeepSeek - R1 - Zero, trained via large - scale reinforcement learning (RL) without supervised fine - tuning (SFT) as a preliminary step, shows remarkable reasoning performance. With RL, it naturally emerged with numerous powerful and interesting reasoning behaviors. However, it has issues like endless repetition, poor readability, and language mixing. To address these and enhance reasoning performance, we introduce DeepSeek - R1, which incorporates cold - start data before RL. DeepSeek - R1 achieves performance comparable to OpenAI - o1 across math, code, and reasoning tasks. We have open - sourced DeepSeek - R1 - Zero, DeepSeek - R1, and six dense models distilled from DeepSeek - R1 based on Llama and Qwen. DeepSeek - R1 - Distill - Qwen - 32B outperforms OpenAI - o1 - mini across various benchmarks, achieving new state - of - the - art results for dense models.

NOTE: Before running DeepSeek - R1 series models locally, we kindly recommend reviewing the Usage Recommendation section.

Model Summary

Post - Training: Large - Scale Reinforcement Learning on the Base Model

We directly apply reinforcement learning (RL) to the base model without relying on supervised fine - tuning (SFT) as a preliminary step. This allows the model to explore chain - of - thought (CoT) for solving complex problems, resulting in DeepSeek - R1 - Zero. It demonstrates capabilities such as self - verification, reflection, and generating long CoTs, and is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.
We introduce our pipeline to develop DeepSeek - R1. It incorporates two RL stages for discovering improved reasoning patterns and aligning with human preferences, and two SFT stages as the seed for the model's reasoning and non - reasoning capabilities.

Distillation: Smaller Models Can Be Powerful Too

We show that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open - sourced DeepSeek - R1 and its API will benefit the research community to distill better smaller models in the future.
Using the reasoning data generated by DeepSeek - R1, we fine - tuned several dense models widely used in the research community. The evaluation results show that the distilled smaller dense models perform exceptionally well on benchmarks. We open - source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community.

📦 Installation

Model Downloads

DeepSeek - R1 Models

Model	#Total Params	#Activated Params	Context Length	Download
DeepSeek - R1 - Zero	671B	37B	128K	🤗 HuggingFace
DeepSeek - R1	671B	37B	128K	🤗 HuggingFace

DeepSeek - R1 - Zero & DeepSeek - R1 are trained based on DeepSeek - V3 - Base. For more details regarding the model architecture, please refer to DeepSeek - V3 repository.

DeepSeek - R1 - Distill Models

Model	Base Model	Download
DeepSeek - R1 - Distill - Qwen - 1.5B	Qwen2.5 - Math - 1.5B	🤗 HuggingFace
DeepSeek - R1 - Distill - Qwen - 7B	Qwen2.5 - Math - 7B	🤗 HuggingFace
DeepSeek - R1 - Distill - Llama - 8B	Llama - 3.1 - 8B	🤗 HuggingFace
DeepSeek - R1 - Distill - Qwen - 14B	Qwen2.5 - 14B	🤗 HuggingFace
DeepSeek - R1 - Distill - Qwen - 32B	Qwen2.5 - 32B	🤗 HuggingFace
DeepSeek - R1 - Distill - Llama - 70B	Llama - 3.3 - 70B - Instruct	🤗 HuggingFace

DeepSeek - R1 - Distill models are fine - tuned based on open - source models, using samples generated by DeepSeek - R1. We slightly change their configs and tokenizers. Please use our setting to run these models.

📚 Documentation

Evaluation Results

DeepSeek - R1 - Evaluation

For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top - p value of $0.95$, and generate 64 responses per query to estimate pass@1.

Category	Benchmark (Metric)	Claude - 3.5 - Sonnet - 1022	GPT - 4o 0513	DeepSeek V3	OpenAI o1 - mini	OpenAI o1 - 1217	DeepSeek R1
	Architecture	-	-	MoE	-	-	MoE
	# Activated Params	-	-	37B	-	-	37B
	# Total Params	-	-	671B	-	-	671B
English	MMLU (Pass@1)	88.3	87.2	88.5	85.2	91.8	90.8
	MMLU - Redux (EM)	88.9	88.0	89.1	86.7	-	92.9
	MMLU - Pro (EM)	78.0	72.6	75.9	80.3	-	84.0
	DROP (3 - shot F1)	88.3	83.7	91.6	83.9	90.2	92.2
	IF - Eval (Prompt Strict)	86.5	84.3	86.1	84.8	-	83.3
	GPQA - Diamond (Pass@1)	65.0	49.9	59.1	60.0	75.7	71.5
	SimpleQA (Correct)	28.4	38.2	24.9	7.0	47.0	30.1
	FRAMES (Acc.)	72.5	80.5	73.3	76.9	-	82.5
	AlpacaEval2.0 (LC - winrate)	52.0	51.1	70.0	57.8	-	87.6
	ArenaHard (GPT - 4 - 1106)	85.2	80.4	85.5	92.0	-	92.3
Code	LiveCodeBench (Pass@1 - COT)	33.8	34.2	-	53.8	63.4	65.9
	Codeforces (Percentile)	20.3	23.6	58.7	93.4	96.6	96.3
	Codeforces (Rating)	717	759	1134	1820	2061	2029
	SWE Verified (Resolved)	50.8	38.8	42.0	41.6	48.9	49.2
	Aider - Polyglot (Acc.)	45.3	16.0	49.6	32.9	61.7	53.3
Math	AIME 2024 (Pass@1)	16.0	9.3	39.2	63.6	79.2	79.8
	MATH - 500 (Pass@1)	78.3	74.6	90.2	90.0	96.4	97.3
	CNMO 2024 (Pass@1)	13.1	10.8	43.2	67.6	-	78.8
Chinese	CLUEWSC (EM)	85.4	87.9	90.9	89.9	-	92.8
	C - Eval (EM)	76.7	76.0	86.5	68.9	-	91.8
	C - SimpleQA (Correct)	55.4	58.7	68.0	40.3	-	63.7

Distilled Model Evaluation

Model	AIME 2024 pass@1	AIME 2024 cons@64	MATH - 500 pass@1	GPQA Diamond pass@1	LiveCodeBench pass@1	CodeForces rating
GPT - 4o - 0513	9.3	13.4	74.6	49.9	32.9	759
Claude - 3.5 - Sonnet - 1022	16.0	26.7	78.3	65.0	38.9	717
o1 - mini	63.6	80.0	90.0	60.0	53.8	1820
QwQ - 32B - Preview	44.0	60.0	90.6	54.5	41.9	1316
DeepSeek - R1 - Distill - Qwen - 1.5B	28.9	52.7	83.9	33.8	16.9	954
DeepSeek - R1 - Distill - Qwen - 7B	55.5	83.3	92.8	49.1	37.6	1189
DeepSeek - R1 - Distill - Qwen - 14B	69.7	80.0	93.9	59.1	53.1	1481
DeepSeek - R1 - Distill - Qwen - 32B	72.6	83.3	94.3	62.1	57.2	1691
DeepSeek - R1 - Distill - Llama - 8B	50.4	80.0	89.1	49.0	39.6	1205
DeepSeek - R1 - Distill - Llama - 70B	70.0	86.7	94.5	65.2	57.5	1633

Chat Website & API Platform

You can chat with DeepSeek - R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button "DeepThink". We also provide an OpenAI - Compatible API at DeepSeek Platform: platform.deepseek.com.

How to Run Locally

DeepSeek - R1 Models

Please visit DeepSeek - V3 repo for more information about running DeepSeek - R1 locally.

NOTE: Hugging Face's Transformers has not been directly supported yet.

DeepSeek - R1 - Distill Models

DeepSeek - R1 - Distill models can be utilized in the same manner as Qwen or Llama models.

For instance, you can easily start a service using vLLM:

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager

You can also easily start a service using SGLang

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2

Usage Recommendations

We recommend adhering to the following configurations when utilizing the DeepSeek - R1 series models, including benchmarking, to achieve the expected performance:

Set the temperature within the range of 0.5 - 0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
Avoid adding a system prompt; all instructions should be contained within the user prompt.
For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
When evaluating model performance, it is recommended to conduct multiple tests and average the results.

Additionally, we have observed that the DeepSeek - R1 series models tend to bypass thinking pattern (i.e., outputting "<think>\n\n</think>") when responding to certain queries, which can adversely affect the model's performance. To ensure that the model engages in thorough reasoning, we recommend enforcing the model to initiate its response with "<think>\n" at the beginning of every output.

🔧 Technical Details

Post - Training: Large - Scale Reinforcement Learning on the Base Model

We directly apply reinforcement learning (RL) to the base model without relying on supervised fine - tuning (SFT) as a preliminary step. This approach allows the model to explore chain - of - thought (CoT) for solving complex problems, resulting in the development of DeepSeek - R1 - Zero. DeepSeek - R1 - Zero demonstrates capabilities such as self - verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area.
We introduce our pipeline to develop DeepSeek - R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non - reasoning capabilities. We believe the pipeline will benefit the industry by creating better models.

Distillation: Smaller Models Can Be Powerful Too

We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open source DeepSeek - R1, as well as its API, will benefit the research community to distill better smaller models in the future.
Using the reasoning data generated by DeepSeek - R1, we fine - tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open - source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community.

📄 License

This code repository and the model weights are licensed under the MIT License. DeepSeek - R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:

DeepSeek - R1 - Distill - Qwen - 1.5B, DeepSeek - R1 - Distill - Qwen - 7B, DeepSeek - R1 - Distill - Qwen - 14B and DeepSeek - R1 - Distill - Qwen - 32B are derived from Qwen - 2.5 series, which are originally licensed under Apache 2.0 License, and now finetuned with 800k samples curated with DeepSeek - R1.
DeepSeek - R1 - Distill - Llama - 8B is derived from Llama3.1 - 8B - Base and is originally licensed under llama3.1 license.
DeepSeek - R1 - Distill - Llama - 70B is derived from Llama3.3 - 70B - Instruct and is originally licensed under llama3.3 license.

📖 Citation

@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
      title={DeepSeek - R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning}, 
      author={DeepSeek - AI},
      year={2025},
      eprint={2501.12948},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.12948}, 
}

📞 Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご