DeepSeek-R1-bf16 Open Source Inference Model - Free Deployment, Mathematical and Code Inference Performance Comparable to OpenAI-o1

Deepseek R1 Bf16

Developed by opensourcerelease

DeepSeek-R1 is the first-generation inference model, which performs excellently in mathematics, code, and reasoning tasks, and its performance is comparable to that of OpenAI-o1.

Large Language Model

Transformers

Open Source License:MIT #Mathematical reasoning #Code generation #Reinforcement learning training

Downloads 1,486

Release Time : 1/21/2025

Model Overview

DeepSeek-R1 is a large language model focusing on mathematics, code, and reasoning tasks. It is trained through reinforcement learning and cold-start data, and has excellent reasoning ability and self-verification ability.

Model Features

Pure reinforcement learning training

Directly train the model through reinforcement learning without supervised fine-tuning (SFT) as an initial step.

Self-verification ability

The model has self-verification and reflection abilities and can generate long thought chains to solve complex problems.

Distillation support

Supports distilling the inference ability of large models into small models to improve the performance of small models.

128K long context

Supports a context length of up to 128K, suitable for processing long documents and complex tasks.

Model Capabilities

Mathematical reasoning

Code generation

Complex problem solving

Long text processing

Self-verification

Thought chain generation

Use Cases

Education

Mathematics problem solving

Solve high school mathematics competition questions

Achieved 79.8% pass@1 in the AIME 2024 test

Programming education

Generate programming exercises and solutions

Achieved 65.9% pass@1 in the LiveCodeBench test

Software development

Code generation

Generate functional code according to requirements

Achieved a score of 2029 in the Codeforces test

Code debugging

Analyze and fix errors in the code

Solved 49.2% of the problems in the SWE Verified test

Research

Scientific problem solving

Solve complex scientific problems

Achieved 71.5% pass@1 in the GPQA-Diamond test

🚀 DeepSeek-R1

DeepSeek-R1 is a first - generation reasoning model. It addresses the issues of its predecessor, DeepSeek - R1 - Zero, and achieves comparable performance to OpenAI - o1 in math, code, and reasoning tasks. The model and its distilled versions are open - sourced to support the research community.

🚀 Quick Start

You can chat with DeepSeek - R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button "DeepThink". We also provide an OpenAI - Compatible API at DeepSeek Platform: platform.deepseek.com.

✨ Features

Reinforcement Learning - Based Training: Directly apply large - scale reinforcement learning to the base model without supervised fine - tuning as a preliminary step, enabling the model to explore chain - of - thought for solving complex problems.
Model Distillation: Demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance.
Good Performance: Achieve performance comparable to OpenAI - o1 across math, code, and reasoning tasks. Some distilled models outperform OpenAI - o1 - mini in various benchmarks.

📦 Installation

DeepSeek - R1 Models

Please visit [DeepSeek - V3](https://github.com/deepseek - ai/DeepSeek - V3) repo for more information about running DeepSeek - R1 locally.

DeepSeek - R1 - Distill Models

DeepSeek - R1 - Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using [vLLM](https://github.com/vllm - project/vllm):

vllm serve deepseek - ai/DeepSeek - R1 - Distill - Qwen - 32B --tensor - parallel - size 2 --max - model - len 32768 --enforce - eager

⚠️ Important Note

We recommend setting an appropriate temperature (between 0.5 and 0.7) when running these models, otherwise you may encounter issues with endless repetition or incoherent output.

💻 Usage Examples

Basic Usage

For running DeepSeek - R1 - Distill models, you can use the following command with vLLM:

vllm serve deepseek - ai/DeepSeek - R1 - Distill - Qwen - 32B --tensor - parallel - size 2 --max - model - len 32768 --enforce - eager

📚 Documentation

Introduction

We introduce our first - generation reasoning models, DeepSeek - R1 - Zero and DeepSeek - R1. DeepSeek - R1 - Zero, trained via large - scale reinforcement learning without supervised fine - tuning, demonstrated remarkable reasoning performance but faced challenges like endless repetition. To address these issues and enhance performance, we introduce DeepSeek - R1, which incorporates cold - start data before RL. DeepSeek - R1 achieves performance comparable to OpenAI - o1 across math, code, and reasoning tasks. We have open - sourced DeepSeek - R1 - Zero, DeepSeek - R1, and six dense models distilled from DeepSeek - R1.

Model Summary

Post - Training: Large - Scale Reinforcement Learning on the Base Model

We directly apply reinforcement learning to the base model without supervised fine - tuning. This allows the model to explore chain - of - thought, resulting in DeepSeek - R1 - Zero, which shows capabilities like self - verification and reflection. It is the first open research to validate that LLMs' reasoning capabilities can be incentivized purely through RL without SFT.
Our pipeline to develop DeepSeek - R1 includes two RL stages for discovering better reasoning patterns and aligning with human preferences, as well as two SFT stages for the model's reasoning and non - reasoning capabilities.

Distillation: Smaller Models Can Be Powerful Too

Demonstrate that the reasoning patterns of larger models can be distilled into smaller models, outperforming the reasoning patterns discovered through RL on small models. The open - source DeepSeek - R1 and its API will help the research community distill better smaller models.
Fine - tune several dense models using the reasoning data generated by DeepSeek - R1. The evaluation results show that the distilled smaller dense models perform well on benchmarks. We open - source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series.

Model Downloads

DeepSeek - R1 Models

Model	#Total Params	#Activated Params	Context Length	Download
DeepSeek - R1 - Zero	671B	37B	128K	[HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Zero)
DeepSeek - R1	671B	37B	128K	[HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1)

DeepSeek - R1 - Zero & DeepSeek - R1 are trained based on DeepSeek - V3 - Base. For more details regarding the model architecture, please refer to [DeepSeek - V3](https://github.com/deepseek - ai/DeepSeek - V3) repository.

DeepSeek - R1 - Distill Models

Model	Base Model	Download
DeepSeek - R1 - Distill - Qwen - 1.5B	[Qwen2.5 - Math - 1.5B](https://huggingface.co/Qwen/Qwen2.5 - Math - 1.5B)	[HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 1.5B)
DeepSeek - R1 - Distill - Qwen - 7B	[Qwen2.5 - Math - 7B](https://huggingface.co/Qwen/Qwen2.5 - Math - 7B)	[HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 7B)
DeepSeek - R1 - Distill - Llama - 8B	[Llama - 3.1 - 8B](https://huggingface.co/meta - llama/Llama - 3.1 - 8B)	[HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Llama - 8B)
DeepSeek - R1 - Distill - Qwen - 14B	[Qwen2.5 - 14B](https://huggingface.co/Qwen/Qwen2.5 - 14B)	[HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 14B)
DeepSeek - R1 - Distill - Qwen - 32B	[Qwen2.5 - 32B](https://huggingface.co/Qwen/Qwen2.5 - 32B)	[HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Qwen - 32B)
DeepSeek - R1 - Distill - Llama - 70B	[Llama - 3.3 - 70B - Instruct](https://huggingface.co/meta - llama/Llama - 3.3 - 70B - Instruct)	[HuggingFace](https://huggingface.co/deepseek - ai/DeepSeek - R1 - Distill - Llama - 70B)

DeepSeek - R1 - Distill models are fine - tuned based on open - source models, using samples generated by DeepSeek - R1. We slightly change their configs and tokenizers. Please use our setting to run these models.

Evaluation Results

DeepSeek - R1 - Evaluation

For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top - p value of $0.95$, and generate 64 responses per query to estimate pass@1.

Category	Benchmark (Metric)	Claude - 3.5 - Sonnet - 1022	GPT - 4o 0513	DeepSeek V3	OpenAI o1 - mini	OpenAI o1 - 1217	DeepSeek R1
	Architecture	-	-	MoE	-	-	MoE
	# Activated Params	-	-	37B	-	-	37B
	# Total Params	-	-	671B	-	-	671B
English	MMLU (Pass@1)	88.3	87.2	88.5	85.2	91.8	90.8
	MMLU - Redux (EM)	88.9	88.0	89.1	86.7	-	92.9
	MMLU - Pro (EM)	78.0	72.6	75.9	80.3	-	84.0
	DROP (3 - shot F1)	88.3	83.7	91.6	83.9	90.2	92.2
	IF - Eval (Prompt Strict)	86.5	84.3	86.1	84.8	-	83.3
	GPQA - Diamond (Pass@1)	65.0	49.9	59.1	60.0	75.7	71.5
	SimpleQA (Correct)	28.4	38.2	24.9	7.0	47.0	30.1
	FRAMES (Acc.)	72.5	80.5	73.3	76.9	-	82.5
	AlpacaEval2.0 (LC - winrate)	52.0	51.1	70.0	57.8	-	87.6
	ArenaHard (GPT - 4 - 1106)	85.2	80.4	85.5	92.0	-	92.3
Code	LiveCodeBench (Pass@1 - COT)	33.8	34.2	-	53.8	63.4	65.9
	Codeforces (Percentile)	20.3	23.6	58.7	93.4	96.6	96.3
	Codeforces (Rating)	717	759	1134	1820	2061	2029
	SWE Verified (Resolved)	50.8	38.8	42.0	41.6	48.9	49.2
	Aider - Polyglot (Acc.)	45.3	16.0	49.6	32.9	61.7	53.3
Math	AIME 2024 (Pass@1)	16.0	9.3	39.2	63.6	79.2	79.8
	MATH - 500 (Pass@1)	78.3	74.6	90.2	90.0	96.4	97.3
	CNMO 2024 (Pass@1)	13.1	10.8	43.2	67.6	-	78.8
Chinese	CLUEWSC (EM)	85.4	87.9	90.9	89.9	-	92.8
	C - Eval (EM)	76.7	76.0	86.5	68.9	-	91.8
	C - SimpleQA (Correct)	55.4	58.7	68.0	40.3	-	63.7

Distilled Model Evaluation

Model	AIME 2024 pass@1	AIME 2024 cons@64	MATH - 500 pass@1	GPQA Diamond pass@1	LiveCodeBench pass@1	CodeForces rating
GPT - 4o - 0513	9.3	13.4	74.6	49.9	32.9	759
Claude - 3.5 - Sonnet - 1022	16.0	26.7	78.3	65.0	38.9	717
o1 - mini	63.6	80.0	90.0	60.0	53.8	1820
QwQ - 32B - Preview	44.0	60.0	90.6	54.5	41.9	1316
DeepSeek - R1 - Distill - Qwen - 1.5B	28.9	52.7	83.9	33.8	16.9	954
DeepSeek - R1 - Distill - Qwen - 7B	55.5	83.3	92.8	49.1	37.6	1189
DeepSeek - R1 - Distill - Qwen - 14B	69.7	80.0	93.9	59.1	53.1	1481
DeepSeek - R1 - Distill - Qwen - 32B	72.6	83.3	94.3	62.1	57.2	1691
DeepSeek - R1 - Distill - Llama - 8B	50.4	80.0	89.1	49.0	39.6	1205
DeepSeek - R1 - Distill - Llama - 70B	70.0	86.7	94.5	65.2	57.5	1633

🔧 Technical Details

Reinforcement Learning: Directly apply reinforcement learning to the base model without supervised fine - tuning, enabling the model to explore chain - of - thought for solving complex problems.
Model Distillation: Distill the reasoning patterns of larger models into smaller models, improving the performance of smaller models.

📄 License

This code repository and the model weights are licensed under the [MIT License](https://github.com/deepseek - ai/DeepSeek - R1/blob/main/LICENSE). DeepSeek - R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:

DeepSeek - R1 - Distill - Qwen - 1.5B, DeepSeek - R1 - Distill - Qwen - 7B, DeepSeek - R1 - Distill - Qwen - 14B and DeepSeek - R1 - Distill - Qwen - 32B are derived from Qwen - 2.5 series, which are originally licensed under [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5 - 1.5B/blob/main/LICENSE), and now finetuned with 800k samples curated with DeepSeek - R1.
DeepSeek - R1 - Distill - Llama - 8B is derived from Llama3.1 - 8B - Base and is originally licensed under [llama3.1 license](https://huggingface.co/meta - llama/Llama - 3.1 - 8B/blob/main/LICENSE).
DeepSeek - R1 - Distill - Llama - 70B is derived from Llama3.3 - 70B - Instruct and is originally licensed under [llama3.3 license](https://huggingface.co/meta - llama/Llama - 3.3 - 70B - Instruct/blob/main/LICENSE).

📬 Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご