DeepSeek-R1-0528-bf16開源模型 - 優化算法提升推理力，數學編程邏輯評估出色

首頁

Deepseek R1 0528 Bf16

由cognitivecomputations開發

DeepSeek-R1-0528是DeepSeek R1模型的小版本升級，通過增加計算資源和算法優化顯著提升了推理能力，在數學、編程和通用邏輯等多個基準評估中表現出色。

大型語言模型

Transformers

開源協議:MIT #數學推理增強 #編程能力優化 #低幻覺率

下載量 129

發布時間 : 5/30/2025

模型概述

DeepSeek-R1-0528是一個大型語言模型，專注於提升推理深度和能力，適用於數學、編程和通用邏輯任務。

模型特點

推理能力提升

通過增加計算資源和引入算法優化機制，顯著提升了推理深度和能力。

多領域表現優異

在數學、編程和通用邏輯等多個基準評估中表現出色，性能接近領先模型。

幻覺率降低

此版本降低了幻覺率，增強了對函數調用的支持，提供了更好的編碼體驗。

支持系統提示

新增支持系統提示功能，無需強制思維模式即可使用。

模型能力

複雜推理

數學問題解決

代碼生成與理解

多輪對話

文本生成

工具使用

使用案例

教育

數學競賽問題解答

解決AIME、HMMT等數學競賽題目

在AIME 2025測試中準確率達到87.5%

編程

代碼生成與調試

生成和優化編程代碼

在LiveCodeBench測試中Pass@1達到73.3%

研究

學術研究輔助

幫助研究人員進行復雜邏輯推理和問題分析

在GPQA-Diamond測試中Pass@1達到81.0%

🚀 DeepSeek-R1-0528

DeepSeek-R1-0528是DeepSeek R1模型的一個小版本升級。該模型通過增加計算資源和引入算法優化機制，顯著提升了推理深度和能力，在數學、編程和通用邏輯等多個基準評估中表現出色，整體性能已接近O3和Gemini 2.5 Pro等領先模型。

🚀 快速開始

你可以在DeepSeek的官方網站上與DeepSeek-R1進行對話：chat.deepseek.com，並開啟“DeepThink”按鈕。

我們還在DeepSeek平臺上提供與OpenAI兼容的API：platform.deepseek.com

如需瞭解如何在本地運行DeepSeek-R1-0528，請訪問DeepSeek-R1倉庫。

✨ 主要特性

推理能力顯著提升：在最新更新中，DeepSeek R1通過利用更多計算資源和引入算法優化機制，顯著提升了推理深度和推理能力。
多領域表現出色：該模型在數學、編程和通用邏輯等多個基準評估中表現出色，整體性能接近領先模型，如O3和Gemini 2.5 Pro。
幻覺率降低：此版本降低了幻覺率，增強了對函數調用的支持，並提供了更好的氛圍編碼體驗。

📚 詳細文檔

模型升級介紹

DeepSeek R1模型進行了小版本升級，當前版本為DeepSeek-R1-0528。在最新更新中，DeepSeek R1通過增加計算資源和在後期訓練中引入算法優化機制，顯著提升了推理深度和推理能力。該模型在數學、編程和通用邏輯等多個基準評估中表現出色，整體性能接近領先模型，如O3和Gemini 2.5 Pro。

與之前版本相比，升級後的模型在處理複雜推理任務方面有了顯著改進。例如，在2025年AIME測試中，模型的準確率從之前版本的70%提高到了當前版本的87.5%。這一進步源於推理過程中思維深度的增強：在AIME測試集中，之前的模型每個問題平均使用12K個標記，而新版本每個問題平均使用23K個標記。

評估結果

DeepSeek-R1-0528

對於所有模型，最大生成長度設置為64K個標記。對於需要採樣的基準測試，我們使用溫度值$0.6$、top-p值$0.95$，併為每個查詢生成16個響應以估計pass@1。

類別	基準測試（指標）	DeepSeek R1	DeepSeek R1 0528
通用	MMLU-Redux (EM)	92.9	93.4
通用	MMLU-Pro (EM)	84.0	85.0
通用	GPQA-Diamond (Pass@1)	71.5	81.0
通用	SimpleQA (Correct)	30.1	27.8
通用	FRAMES (Acc.)	82.5	83.0
通用	Humanity's Last Exam (Pass@1)	8.5	17.7
代碼	LiveCodeBench (2408 - 2505) (Pass@1)	63.5	73.3
代碼	Codeforces-Div1 (Rating)	1530	1930
代碼	SWE Verified (Resolved)	49.2	57.6
代碼	Aider-Polyglot (Acc.)	53.3	71.6
數學	AIME 2024 (Pass@1)	79.8	91.4
數學	AIME 2025 (Pass@1)	70.0	87.5
數學	HMMT 2025 (Pass@1)	41.7	79.4
數學	CNMO 2024 (Pass@1)	78.8	86.9
工具	BFCL_v3_MultiTurn (Acc)	-	37.0
工具	Tau-Bench (Pass@1)	-	53.5(Airline)/63.9(Retail)

注意：我們使用無代理框架評估模型在SWE-Verified上的性能。我們僅在HLE測試集中評估純文本提示。在Tau-bench評估中，使用GPT - 4.1扮演用戶角色。

DeepSeek-R1-0528-Qwen3-8B

同時，我們從DeepSeek-R1-0528中提取思維鏈對Qwen3 8B Base進行後期訓練，得到了DeepSeek-R1-0528-Qwen3-8B。該模型在2024年AIME測試中達到了開源模型中的最優性能，比Qwen3 8B提高了10.0%，與Qwen3 - 235B - thinking的性能相當。我們相信，DeepSeek-R1-0528的思維鏈對於推理模型的學術研究和小規模模型的工業發展都具有重要意義。

	AIME 24	AIME 25	HMMT Feb 25	GPQA Diamond	LiveCodeBench (2408 - 2505)
Qwen3 - 235B - A22B	85.7	81.5	62.5	71.1	66.5
Qwen3 - 32B	81.4	72.9	-	68.4	-
Qwen3 - 8B	76.0	67.3	-	62.0	-
Phi - 4 - Reasoning - Plus - 14B	81.3	78.0	53.6	69.3	-
Gemini - 2.5 - Flash - Thinking - 0520	82.3	72.0	64.2	82.8	62.3
o3 - mini (medium)	79.6	76.7	53.3	76.8	65.9
DeepSeek - R1 - 0528 - Qwen3 - 8B	86.0	76.3	61.5	61.1	60.5

本地運行說明

與之前版本的DeepSeek-R1相比，DeepSeek-R1-0528的使用建議有以下變化：

支持系統提示：現在支持系統提示。
無需強制思維模式：不需要在輸出開頭添加"<think>\n"來強制模型進入思維模式。

DeepSeek-R1-0528-Qwen3-8B的模型架構與Qwen3 8B相同，但它與DeepSeek-R1-0528共享相同的分詞器配置。該模型可以與Qwen3 8B以相同的方式運行。

系統提示

在DeepSeek官方網站/應用程序中，我們使用帶有特定日期的相同系統提示。

該助手為DeepSeek-R1，由深度求索公司創造。
今天是{current date}。

例如：

該助手為DeepSeek-R1，由深度求索公司創造。
今天是2025年5月28日，星期一。

溫度參數

在我們的網頁和應用程序環境中，溫度參數$T_{model}$設置為0.6。

文件上傳和網頁搜索提示

對於文件上傳，{file_name}、{file_content}和{question}是參數。

file_template = \
"""[file name]: {file_name}
[file content begin]
{file_content}
[file content end]
{question}"""

對於網頁搜索，{search_results}、{cur_date}和{question}是參數。對於中文查詢，我們使用以下提示：

search_answer_zh_template = \
'''# 以下內容是基於用戶發送的消息的搜索結果:
{search_results}
在我給你的搜索結果中，每個結果都是[webpage X begin]...[webpage X end]格式的，X代表每篇文章的數字索引。請在適當的情況下在句子末尾引用上下文。請按照引用編號[citation:X]的格式在答案中對應部分引用上下文。如果一句話源自多個上下文，請列出所有相關的引用編號，例如[citation:3][citation:5]，切記不要將引用集中在最後返回引用編號，而是在答案對應部分列出。
在回答時，請注意以下幾點：
- 今天是{cur_date}。
- 並非搜索結果的所有內容都與用戶的問題密切相關，你需要結合問題，對搜索結果進行甄別、篩選。
- 對於列舉類的問題（如列舉所有航班信息），儘量將答案控制在10個要點以內，並告訴用戶可以查看搜索來源、獲得完整信息。優先提供信息完整、最相關的列舉項；如非必要，不要主動告訴用戶搜索結果未提供的內容。
- 對於創作類的問題（如寫論文），請務必在正文的段落中引用對應的參考編號，例如[citation:3][citation:5]，不能只在文章末尾引用。你需要解讀並概括用戶的題目要求，選擇合適的格式，充分利用搜索結果並抽取重要信息，生成符合用戶要求、極具思想深度、富有創造力與專業性的答案。你的創作篇幅需要儘可能延長，對於每一個要點的論述要推測用戶的意圖，給出儘可能多角度的回答要點，且務必信息量大、論述詳盡。
- 如果回答很長，請儘量結構化、分段落總結。如果需要分點作答，儘量控制在5個點以內，併合並相關的內容。
- 對於客觀類的問答，如果問題的答案非常簡短，可以適當補充一到兩句相關信息，以豐富內容。
- 你需要根據用戶要求和回答內容選擇合適、美觀的回答格式，確保可讀性強。
- 你的回答應該綜合多個相關網頁來回答，不能重複引用一個網頁。
- 除非用戶要求，否則你回答的語言需要和用戶提問的語言保持一致。
# 用戶消息為：
{question}'''

對於英文查詢，我們使用以下提示：

search_answer_en_template = \
'''# The following contents are the search results related to the user's message:
{search_results}
In the search results I provide to you, each result is formatted as [webpage X begin]...[webpage X end], where X represents the numerical index of each article. Please cite the context at the end of the relevant sentence when appropriate. Use the citation format [citation:X] in the corresponding part of your answer. If a sentence is derived from multiple contexts, list all relevant citation numbers, such as [citation:3][citation:5]. Be sure not to cluster all citations at the end; instead, include them in the corresponding parts of the answer.
When responding, please keep the following points in mind:
- Today is {cur_date}.
- Not all content in the search results is closely related to the user's question. You need to evaluate and filter the search results based on the question.
- For listing-type questions (e.g., listing all flight information), try to limit the answer to 10 key points and inform the user that they can refer to the search sources for complete information. Prioritize providing the most complete and relevant items in the list. Avoid mentioning content not provided in the search results unless necessary.
- For creative tasks (e.g., writing an essay), ensure that references are cited within the body of the text, such as [citation:3][citation:5], rather than only at the end of the text. You need to interpret and summarize the user's requirements, choose an appropriate format, fully utilize the search results, extract key information, and generate an answer that is insightful, creative, and professional. Extend the length of your response as much as possible, addressing each point in detail and from multiple perspectives, ensuring the content is rich and thorough.
- If the response is lengthy, structure it well and summarize it in paragraphs. If a point-by-point format is needed, try to limit it to 5 points and merge related content.
- For objective Q&A, if the answer is very brief, you may add one or two related sentences to enrich the content.
- Choose an appropriate and visually appealing format for your response based on the user's requirements and the content of the answer, ensuring strong readability.
- Your answer should synthesize information from multiple relevant webpages and avoid repeatedly citing the same webpage.
- Unless the user requests otherwise, your response should be in the same language as the user's question.
# The user's message is:
{question}'''

📄 許可證

本代碼倉庫遵循MIT許可證。DeepSeek-R1模型的使用也遵循MIT許可證。DeepSeek-R1系列（包括基礎版和對話版）支持商業使用和蒸餾。

🔗 引用

@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
      title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning}, 
      author={DeepSeek-AI},
      year={2025},
      eprint={2501.12948},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.12948}, 
}