
DeepSeek-R1-Distill-Qwen-32B-unsloth-bnb-4bit

Developed by Unsloth
DeepSeek-R1 is the first-generation reasoning model from the DeepSeek team. Trained with large-scale reinforcement learning, without supervised fine-tuning (SFT) as a preliminary step, it demonstrates strong reasoning capabilities.
Downloads 938
Release Date: 1/22/2025

Model Overview

The DeepSeek-R1 series focuses on reasoning tasks. The models are capable of self-verification, reflection, and generating long chains of thought (CoT), making them well suited to mathematics, code, and general reasoning workloads.

Model Features

Fast fine-tuning
Unsloth accelerates fine-tuning of large language models by 2-5x while reducing memory usage by 70%.
Strong reasoning ability
Performance on mathematics, code, and reasoning tasks is comparable to OpenAI-o1, and some of the distilled models outperform OpenAI-o1-mini.
Dynamic quantization
Dynamic 1.58-bit + 2-bit quantization that quantizes layers selectively, significantly improving accuracy over standard uniform 1-bit/2-bit quantization.
Open-source distilled models
Six dense models distilled from DeepSeek-R1, based on Llama and Qwen, are open-sourced, giving the research community more options.
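As a minimal sketch of how this 4-bit checkpoint might be loaded for fine-tuning with Unsloth: the model ID below is inferred from this card's title, `FastLanguageModel.from_pretrained` is Unsloth's standard loading entry point, and an actual load requires a CUDA GPU and a large download, so the call is wrapped in a helper rather than run at import time.

```python
# Sketch: loading the 4-bit Unsloth checkpoint (requires a CUDA GPU to run).
try:
    from unsloth import FastLanguageModel
    UNSLOTH_AVAILABLE = True
except ImportError:
    UNSLOTH_AVAILABLE = False

# Model ID inferred from this card's title (an assumption).
MODEL_ID = "unsloth/DeepSeek-R1-Distill-Qwen-32B-unsloth-bnb-4bit"


def load_model(max_seq_length: int = 4096):
    """Load the pre-quantized 4-bit model and tokenizer via Unsloth."""
    if not UNSLOTH_AVAILABLE:
        raise RuntimeError("unsloth is not installed: pip install unsloth")
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=MODEL_ID,
        max_seq_length=max_seq_length,
        load_in_4bit=True,  # use the bitsandbytes 4-bit weights as shipped
    )
    return model, tokenizer
```

The returned model can then be passed to Unsloth's PEFT/LoRA helpers for the accelerated fine-tuning described above.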

Model Capabilities

Mathematical problem solving
Code generation
Long-text reasoning
Self-verification
Reflection ability
Generating long chains of thought (CoT)
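DeepSeek-R1 models emit their chain of thought between `<think>` and `</think>` tags before the final answer. A minimal sketch for separating the reasoning trace from the answer (the tag convention is DeepSeek-R1's documented output format; the helper name is illustrative):

```python
import re


def split_cot(output: str) -> tuple[str, str]:
    """Split a DeepSeek-R1 completion into (chain_of_thought, final_answer).

    Assumes the reasoning trace is wrapped in <think>...</think>;
    returns an empty trace if no tags are present.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    cot = match.group(1).strip()
    answer = output[match.end():].strip()
    return cot, answer


cot, answer = split_cot("<think>2 + 2 is 4.</think>The answer is 4.")
# cot -> "2 + 2 is 4.", answer -> "The answer is 4."
```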

Use Cases

Mathematical problem solving
Solving math competition problems in AIME 2024
DeepSeek-R1 achieves a pass@1 of 79.8% on the AIME 2024 competition problems, surpassing GPT-4o and Claude-3.5-Sonnet.
79.8% pass@1
Solving problems from MATH-500
On the MATH-500 dataset, DeepSeek-R1 achieves a pass@1 of 97.3%.
97.3% pass@1
Code generation
Code generation on LiveCodeBench
DeepSeek-R1 achieves a pass@1-CoT of 65.9% on LiveCodeBench, outperforming GPT-4o and Claude-3.5-Sonnet.
65.9% pass@1-CoT
Solving programming competition problems on Codeforces
DeepSeek-R1 reaches a rating of 2029 on Codeforces competition problems, approaching OpenAI o1-1217's 2061.
2029 score
Reasoning tasks
Multi-task language understanding on MMLU
DeepSeek-R1 achieves a pass@1 of 90.8% on the MMLU benchmark.
90.8% pass@1
Reading comprehension on DROP
On the DROP dataset, DeepSeek-R1's 3-shot F1 reaches 92.2%, surpassing GPT-4o and Claude-3.5-Sonnet.
92.2% 3-shot F1
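The pass@1 figures above are sampling-based metrics. For reference, the widely used unbiased pass@k estimator can be sketched as follows (this is the standard estimator from code-generation evaluation practice; whether it is the exact harness behind these numbers is an assumption):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n samples per problem, c of them correct.

    pass@k = 1 - C(n - c, k) / C(n, k); reduces to c / n when k == 1.
    """
    if n - c < k:
        return 1.0  # fewer than k incorrect samples, so a correct one is always drawn
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 samples of which 5 are correct, `pass_at_k(10, 5, 1)` gives 0.5; the scores above are these per-problem estimates averaged over a benchmark.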
© 2025 AIbase