DeepSeek-V3-0324-BF16 Open-Source Large Language Model - Quantization and Inference on GPUs Without FP8 Support

Deepseek V3 0324 BF16

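Developed by ModelCloud

DeepSeek-V3-0324-BF16 is a BF16 version of DeepSeek AI's DeepSeek-V3-0324 large language model, intended for quantization and inference on GPUs that do not support FP8.

Large Language Model

Transformers

License: MIT #BF16 quantized inference #Enhanced Chinese writing #Front-end code generation

Downloads: 397

Release date: 3/24/2025

Model Overview

This model is an improved version of DeepSeek-V3, with significant gains in reasoning, front-end development, Chinese writing, Chinese search, and function calling.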

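Model Features

Improved reasoning

Significantly better results on benchmarks such as MMLU-Pro, GPQA, AIME, and LiveCodeBench.

Front-end development

Improved code executability, with more aesthetically pleasing web pages and game front-ends.

Chinese writing

Improved style and content quality, aligned with the R1 writing style, with better quality in medium-to-long-form writing.

Chinese search

Enhanced handling of report-style analysis requests, with more detailed output.

Function calling

Improved function calling accuracy, fixing issues in earlier V3 versions.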

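Model Capabilities

Text generation

Code generation

Question answering

Front-end development

Chinese writing

Chinese search

Function calling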

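Use Cases

Content creation

Chinese writing

Generates medium-to-long-form content aligned with the R1 writing style

Improves writing quality and stylistic consistency

Multi-turn interactive rewriting

Supports multi-turn interactive text rewriting

Improves translation quality and letter writing

Development assistance

Front-end development

Generates executable front-end code

Improves code executability and interface aesthetics

Information retrieval

Chinese search

Handles report-style analysis requests

Produces more detailed analysis results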

🚀 DeepSeek-V3-0324

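This is the BF16 model of DeepSeek V3-0324, intended for quantization and inference on GPUs that do not support FP8 (Nvidia Ampere). The BF16 weights are the result of dequantizing DeepSeek AI's FP8 quantized weights: https://huggingface.co/deepseek-ai/DeepSeek-V3-0324 . GPTQModel is the go-to toolkit for quantizing DeepSeek V3-0324, and the quantized model can be used for inference with vLLM and SGLang.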

✨ Features

DeepSeek-V3-0324 demonstrates notable improvements over its predecessor, DeepSeek-V3, in several key aspects.

Model Performance

Reasoning Capabilities

Significant improvements in benchmark performance:
- MMLU-Pro: 75.9 → 81.2 (+5.3)
- GPQA: 59.1 → 68.4 (+9.3)
- AIME: 39.6 → 59.4 (+19.8)
- LiveCodeBench: 39.2 → 49.2 (+10.0)
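
As a quick sanity check of the figures above, the short sketch below recomputes each gain from the listed before and after scores. The names `BENCHMARKS` and `score_gains` are illustrative and not part of any DeepSeek tooling.

```python
# Benchmark scores as (DeepSeek-V3, DeepSeek-V3-0324), copied from the list above.
BENCHMARKS = {
    "MMLU-Pro": (75.9, 81.2),
    "GPQA": (59.1, 68.4),
    "AIME": (39.6, 59.4),
    "LiveCodeBench": (39.2, 49.2),
}


def score_gains(benchmarks: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return the absolute score gain per benchmark, rounded to one decimal."""
    return {
        name: round(after - before, 1)
        for name, (before, after) in benchmarks.items()
    }


if __name__ == "__main__":
    for name, gain in score_gains(BENCHMARKS).items():
        print(f"{name}: +{gain}")
```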

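Front-End Web Development

- Improved executability of the code
- More aesthetically pleasing web pages and game front-ends

Chinese Writing Proficiency

Enhanced style and content quality:
- Aligned with the R1 writing style
- Better quality in medium-to-long-form writing
Feature enhancements:
- Improved multi-turn interactive rewriting
- Optimized translation quality and letter writing

Chinese Search Capabilities

Enhanced report analysis requests with more detailed outputs

Function Calling Improvements

Increased accuracy in function calling, fixing issues from previous V3 versions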

💡 Usage Recommendations

System Prompt

In the official DeepSeek web/app, we use the same system prompt with a specific date.

該助手為DeepSeek Chat，由深度求索公司創造。
今天是{current date}。

For example:

該助手為DeepSeek Chat，由深度求索公司創造。
今天是3月24日，星期一。
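
A minimal sketch of filling in the date placeholder follows. It assumes the date is rendered in the month, day, weekday form shown in the example above. The helper names are hypothetical, and the placeholder is renamed to `{current_date}` only so it can be passed as a keyword argument to `str.format`.

```python
from datetime import date

# Prompt text copied from this page; the placeholder is renamed to
# {current_date} (an assumption made for str.format convenience).
SYSTEM_PROMPT_TEMPLATE = "該助手為DeepSeek Chat，由深度求索公司創造。\n今天是{current_date}。"

# Weekday characters indexed by date.weekday(), where Monday is 0.
WEEKDAYS_ZH = "一二三四五六日"


def format_date_zh(day: date) -> str:
    """Render a date as in the example above, e.g. 3月24日，星期一."""
    return f"{day.month}月{day.day}日，星期{WEEKDAYS_ZH[day.weekday()]}"


def build_system_prompt(day: date) -> str:
    """Fill the system prompt template with the given date."""
    return SYSTEM_PROMPT_TEMPLATE.format(current_date=format_date_zh(day))


if __name__ == "__main__":
    print(build_system_prompt(date(2025, 3, 24)))
```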

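Temperature

In our web and application environments, the model temperature parameter $T_{model}$ is set to 0.3. Because many users call the API with the default temperature of 1.0, we implemented an API temperature $T_{api}$ mapping mechanism that adjusts an input API temperature of 1.0 to the most suitable model temperature setting of 0.3.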

$$ T_{model} = T_{api} \times 0.3 \quad (0 \leq T_{api} \leq 1) $$

$$ T_{model} = T_{api} - 0.7 \quad (1 < T_{api} \leq 2) $$

Thus, if you call V3 via the API, a temperature of 1.0 corresponds to a model temperature of 0.3.
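
The piecewise mapping above can be sketched as a small function. This only restates the two formulas. The function name and the choice to raise an error outside the documented range of 0 to 2 are assumptions, not a description of the actual API implementation.

```python
def api_to_model_temperature(t_api: float) -> float:
    """Map an API temperature to the model temperature.

    Implements the two formulas from the text:
      T_model = T_api * 0.3   for 0 <= T_api <= 1
      T_model = T_api - 0.7   for 1 <  T_api <= 2
    """
    if 0 <= t_api <= 1:
        return t_api * 0.3
    if 1 < t_api <= 2:
        return t_api - 0.7
    # Behaviour outside [0, 2] is not specified in the text; rejecting the
    # value is an assumption of this sketch.
    raise ValueError(f"API temperature must be in [0, 2], got {t_api}")


if __name__ == "__main__":
    for t_api in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(t_api, "->", round(api_to_model_temperature(t_api), 2))
```

The two branches agree at $T_{api} = 1$, where both give 0.3, so the mapping is continuous across the boundary.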

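Prompts for File Uploading and Web Search

For file uploading, please follow the template below to create prompts, where {file_name}, {file_content}, and {question} are parameters.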

file_template = \
"""[file name]: {file_name}
[file content begin]
{file_content}
[file content end]
{question}"""
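
As an illustration, the template can be filled with `str.format`. The helper name `build_file_prompt` and the sample values are hypothetical, and the template is repeated here only so the snippet runs on its own.

```python
# Template copied from above so that this snippet is self-contained.
file_template = \
"""[file name]: {file_name}
[file content begin]
{file_content}
[file content end]
{question}"""


def build_file_prompt(file_name: str, file_content: str, question: str) -> str:
    """Fill the file-upload template with one file and the user's question."""
    return file_template.format(
        file_name=file_name,
        file_content=file_content,
        question=question,
    )


if __name__ == "__main__":
    # Sample values are made up for demonstration.
    print(build_file_prompt("notes.txt", "hello", "Summarize this file."))
```

Because `str.format` parses only the template and not the substituted values, braces inside the file content are passed through unchanged.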

For web search, {search_results}, {cur_date}, and {question} are parameters.

For Chinese queries, we use the following prompt:

search_answer_zh_template = \
'''# 以下內容是基於用戶發送的消息的搜索結果:
{search_results}
在我給你的搜索結果中，每個結果都是[webpage X begin]...[webpage X end]格式的，X代表每篇文章的數字索引。請在適當的情況下在句子末尾引用上下文。請按照引用編號[citation:X]的格式在答案中對應部分引用上下文。如果一句話源自多個上下文，請列出所有相關的引用編號，例如[citation:3][citation:5]，切記不要將引用集中在最後返回引用編號，而是在答案對應部分列出。
在回答時，請注意以下幾點：
- 今天是{cur_date}。
- 並非搜索結果的所有內容都與用戶的問題密切相關，你需要結合問題，對搜索結果進行甄別、篩選。
- 對於列舉類的問題（如列舉所有航班信息），儘量將答案控制在10個要點以內，並告訴用戶可以查看搜索來源、獲得完整信息。優先提供信息完整、最相關的列舉項；如非必要，不要主動告訴用戶搜索結果未提供的內容。
- 對於創作類的問題（如寫論文），請務必在正文的段落中引用對應的參考編號，例如[citation:3][citation:5]，不能只在文章末尾引用。你需要解讀並概括用戶的題目要求，選擇合適的格式，充分利用搜索結果並抽取重要信息，生成符合用戶要求、極具思想深度、富有創造力與專業性的答案。你的創作篇幅需要儘可能延長，對於每一個要點的論述要推測用戶的意圖，給出儘可能多角度的回答要點，且務必信息量大、論述詳盡。
- 如果回答很長，請儘量結構化、分段落總結。如果需要分點作答，儘量控制在5個點以內，並合併相關的內容。
- 對於客觀類的問答，如果問題的答案非常簡短，可以適當補充一到兩句相關信息，以豐富內容。
- 你需要根據用戶要求和回答內容選擇合適、美觀的回答格式，確保可讀性強。
- 你的回答應該綜合多個相關網頁來回答，不能重複引用一個網頁。
- 除非用戶要求，否則你回答的語言需要和用戶提問的語言保持一致。

# 用戶消息為：
{question}'''

For English queries, we use the following prompt:

search_answer_en_template = \
'''# The following contents are the search results related to the user's message:
{search_results}
In the search results I provide to you, each result is formatted as [webpage X begin]...[webpage X end], where X represents the numerical index of each article. Please cite the context at the end of the relevant sentence when appropriate. Use the citation format [citation:X] in the corresponding part of your answer. If a sentence is derived from multiple contexts, list all relevant citation numbers, such as [citation:3][citation:5]. Be sure not to cluster all citations at the end; instead, include them in the corresponding parts of the answer.
When responding, please keep the following points in mind:
- Today is {cur_date}.
- Not all content in the search results is closely related to the user's question. You need to evaluate and filter the search results based on the question.
- For listing-type questions (e.g., listing all flight information), try to limit the answer to 10 key points and inform the user that they can refer to the search sources for complete information. Prioritize providing the most complete and relevant items in the list. Avoid mentioning content not provided in the search results unless necessary.
- For creative tasks (e.g., writing an essay), ensure that references are cited within the body of the text, such as [citation:3][citation:5], rather than only at the end of the text. You need to interpret and summarize the user's requirements, choose an appropriate format, fully utilize the search results, extract key information, and generate an answer that is insightful, creative, and professional. Extend the length of your response as much as possible, addressing each point in detail and from multiple perspectives, ensuring the content is rich and thorough.
- If the response is lengthy, structure it well and summarize it in paragraphs. If a point-by-point format is needed, try to limit it to 5 points and merge related content.
- For objective Q&A, if the answer is very brief, you may add one or two related sentences to enrich the content.
- Choose an appropriate and visually appealing format for your response based on the user's requirements and the content of the answer, ensuring strong readability.
- Your answer should synthesize information from multiple relevant webpages and avoid repeatedly citing the same webpage.
- Unless the user requests otherwise, your response should be in the same language as the user's question.

# The user's message is:
{question}'''
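
The sketch below shows one way to assemble {search_results} and fill either search template. The prompts only state that each result is wrapped in [webpage X begin]...[webpage X end]. Numbering from 1, joining results with newlines, and the helper names are assumptions. The short `demo_template` stands in for the full templates above.

```python
from typing import Sequence


def format_search_results(pages: Sequence[str]) -> str:
    """Wrap each page text in [webpage X begin]...[webpage X end] markers.

    Indices start at 1 and results are newline-separated; both choices are
    assumptions of this sketch.
    """
    return "\n".join(
        f"[webpage {index} begin]{text}[webpage {index} end]"
        for index, text in enumerate(pages, start=1)
    )


def build_search_prompt(
    template: str, pages: Sequence[str], cur_date: str, question: str
) -> str:
    """Fill a search template that uses {search_results}, {cur_date}, {question}."""
    return template.format(
        search_results=format_search_results(pages),
        cur_date=cur_date,
        question=question,
    )


if __name__ == "__main__":
    # Shortened stand-in for search_answer_en_template, for demonstration only.
    demo_template = (
        "{search_results}\n- Today is {cur_date}.\n"
        "# The user's message is:\n{question}"
    )
    print(build_search_prompt(demo_template, ["First page.", "Second page."],
                              "2025-03-24", "What changed?"))
```

Passing `search_answer_zh_template` or `search_answer_en_template` as `template` works the same way, since both use the same three placeholders.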

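🚀 How to Run Locally

Running locally

The model structure of DeepSeek-V3-0324 is exactly the same as that of DeepSeek-V3. For more information on running this model locally, please visit the DeepSeek-V3 repo.

This model supports features such as function calling, JSON output, and FIM completion. For instructions on how to construct prompts to use these features, please refer to the DeepSeek-V2.5 repo.

NOTE: Hugging Face's Transformers does not directly support this model yet.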

📄 License

This repository and the model weights are licensed under the MIT License.

📚 Citation

@misc{deepseekai2024deepseekv3technicalreport,
      title={DeepSeek-V3 Technical Report}, 
      author={DeepSeek-AI},
      year={2024},
      eprint={2412.19437},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.19437}, 
}