DeepSeek-V3-0324-GGUF Open-Source Model: Free Local Inference with Significant Gains Across Multiple Benchmarks

Deepseek V3 0324 GGUF

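Developed by unsloth

DeepSeek-V3-0324 is the March update released by the DeepSeek team. It improves markedly on its predecessor across several benchmarks and is available in dynamic quantized versions suited to local inference.

Large language model | English | License: MIT | #high-precision quantized inference #enhanced Chinese writing #front-end code generation

Downloads: 108.44k

Released: 3/25/2025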

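Model Overview

DeepSeek-V3-0324 is a high-performance large language model available in multiple quantized versions for local deployment and inference.

Model Features

Dynamic quantization

Offers 1-4 bit dynamic quantized versions, which are more accurate than standard quantization.

High-performance reasoning

Scores notably higher than the previous version on benchmarks such as MMLU-Pro, GPQA, AIME, and LiveCodeBench.

Local inference support

Runs in inference frameworks such as llama.cpp, LMStudio, and Open WebUI.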

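Model Capabilities

Text generation

Code generation

Reasoning tasks

Use Cases

Code generation

Code completion

Generates high-quality code snippets for use in development environments.

The generated code is of high quality and runs well.

Text reasoning

Answering complex questions

Answers complex scientific and technical questions.

Performs well on benchmarks such as GPQA and AIME.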

🚀 DeepSeek-V3-0324 Dynamic GGUF

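This project provides DeepSeek-V3-0324 in GGUF format, which runs in inference frameworks such as llama.cpp, LMStudio, and Open WebUI. It also provides free fine-tuning notebooks that help users fine-tune a model into a reasoning model.

🚀 Quick Start

Run locally: read our guide for detailed instructions on running DeepSeek-V3-0324 locally.
Free fine-tuning: use our free Google Colab notebook to turn Llama 3.1 (8B) into a reasoning model.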

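✨ Main Features

DeepSeek-V3-0324 improves significantly on its predecessor, DeepSeek-V3, in several key areas.

Model Performance

Reasoning capabilities

Significant gains in benchmark performance:
- MMLU-Pro: 75.9 to 81.2 (+5.3)
- GPQA: 59.1 to 68.4 (+9.3)
- AIME: 39.6 to 59.4 (+19.8)
- LiveCodeBench: 39.2 to 49.2 (+10.0)

Front-end web development

- More of the generated code is executable
- More aesthetically pleasing web pages and game front-ends

Chinese writing proficiency

Improved style and content quality:
- Aligned with the R1 writing style
- Better quality in medium- to long-form writing
Feature enhancements:
- Improved multi-turn interactive rewriting
- Better translation quality and letter writing

Chinese search capabilities

Enhanced report-analysis requests with more detailed output

Function calling improvements

More accurate function calling, fixing issues in the earlier V3 version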

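📦 Installation

DeepSeek-V3-0324 has exactly the same model structure as DeepSeek-V3. For more information on running this model locally, visit the DeepSeek-V3 repository.

⚠️ Important Note

Hugging Face Transformers does not yet directly support this model.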

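💻 Usage Examples

System prompt

In the official DeepSeek web and app interfaces, we use the same system prompt with a specific date. The prompt is in Chinese and is reproduced verbatim below.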

該助手為DeepSeek Chat，由深度求索公司創造。
今天是{current date}。

For example:

該助手為DeepSeek Chat，由深度求索公司創造。
今天是3月24日，星期一。
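The date line can be filled in programmatically. The following is a minimal sketch, not DeepSeek's own code: the helper name `build_system_prompt`, the placeholder name `current_date` (the source writes `{current date}`), and the year 2025 used in the demo call are all assumptions made for illustration.

```python
from datetime import date

# Chinese weekday names, Monday first, matching date.weekday() (0 = Monday).
WEEKDAYS_ZH = ["星期一", "星期二", "星期三", "星期四", "星期五", "星期六", "星期日"]

# System prompt text copied from this document; the placeholder is renamed
# to `current_date` (an assumption) so it is a plain identifier.
SYSTEM_PROMPT_TEMPLATE = "該助手為DeepSeek Chat，由深度求索公司創造。\n今天是{current_date}。"


def build_system_prompt(today: date) -> str:
    """Return the system prompt with the date written as month, day, weekday."""
    current_date = f"{today.month}月{today.day}日，{WEEKDAYS_ZH[today.weekday()]}"
    return SYSTEM_PROMPT_TEMPLATE.format(current_date=current_date)


# The year is an assumption; the example in the text gives only month and day.
print(build_system_prompt(date(2025, 3, 24)))
```

In a real application, `date.today()` would replace the fixed date.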

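Temperature

In our web and app environments, the model temperature $T_{model}$ is set to 0.3. Because many users call the API with the default temperature of 1.0, we implemented an API temperature $T_{api}$ mapping mechanism that adjusts an input API temperature of 1.0 to the most suitable model temperature of 0.3.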

$$ T_{model} = T_{api} \times 0.3 \quad (0 \leq T_{api} \leq 1) $$

$$ T_{model} = T_{api} - 0.7 \quad (1 < T_{api} \leq 2) $$

Therefore, if you call V3 through the API, a temperature of 1.0 corresponds to a model temperature of 0.3.
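When running the model locally there is no API layer to do this mapping, so it can help to apply it by hand. The sketch below implements the two formulas above; the function name `api_to_model_temperature` and the choice to raise an error outside the range 0 to 2 are assumptions for illustration.

```python
def api_to_model_temperature(t_api: float) -> float:
    """Map an API temperature in [0, 2] to the model temperature."""
    if not 0.0 <= t_api <= 2.0:
        # The formulas are only defined on [0, 2]; rejecting other values
        # is a choice made for this sketch.
        raise ValueError("t_api must be between 0 and 2")
    if t_api <= 1.0:
        return t_api * 0.3  # first branch: 0 <= T_api <= 1
    return t_api - 0.7      # second branch: 1 < T_api <= 2


for t_api in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(t_api, round(api_to_model_temperature(t_api), 2))
```

Both branches give 0.3 at an API temperature of 1.0, so the mapping is continuous there.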

File upload and web search prompts

File upload

Create prompts using the following template, where {file_name}, {file_content}, and {question} are parameters.

file_template = \
"""[file name]: {file_name}
[file content begin]
{file_content}
[file content end]
{question}"""
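As a usage sketch, the template can be filled with `str.format`. The helper name `build_file_prompt` and the sample file are hypothetical; only the template text comes from the source.

```python
file_template = \
"""[file name]: {file_name}
[file content begin]
{file_content}
[file content end]
{question}"""


def build_file_prompt(file_name: str, file_content: str, question: str) -> str:
    """Fill the three template parameters and return the finished prompt."""
    return file_template.format(
        file_name=file_name,
        file_content=file_content,
        question=question,
    )


prompt = build_file_prompt(
    file_name="notes.txt",                    # hypothetical file
    file_content="GGUF is a model file format.",
    question="What is GGUF?",
)
print(prompt)
```

`str.format` parses only the template, not the substituted values, so file content that itself contains curly braces is inserted unchanged.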

Web search

For Chinese queries, we use the following prompt:

search_answer_zh_template = \
'''# 以下內容是基於用戶發送的消息的搜索結果:
{search_results}
在我給你的搜索結果中，每個結果都是[webpage X begin]...[webpage X end]格式的，X代表每篇文章的數字索引。請在適當的情況下在句子末尾引用上下文。請按照引用編號[citation:X]的格式在答案中對應部分引用上下文。如果一句話源自多個上下文，請列出所有相關的引用編號，例如[citation:3][citation:5]，切記不要將引用集中在最後返回引用編號，而是在答案對應部分列出。
在回答時，請注意以下幾點：
- 今天是{cur_date}。
- 並非搜索結果的所有內容都與用戶的問題密切相關，你需要結合問題，對搜索結果進行甄別、篩選。
- 對於列舉類的問題（如列舉所有航班信息），儘量將答案控制在10個要點以內，並告訴用戶可以查看搜索來源、獲得完整信息。優先提供信息完整、最相關的列舉項；如非必要，不要主動告訴用戶搜索結果未提供的內容。
- 對於創作類的問題（如寫論文），請務必在正文的段落中引用對應的參考編號，例如[citation:3][citation:5]，不能只在文章末尾引用。你需要解讀並概括用戶的題目要求，選擇合適的格式，充分利用搜索結果並抽取重要信息，生成符合用戶要求、極具思想深度、富有創造力與專業性的答案。你的創作篇幅需要儘可能延長，對於每一個要點的論述要推測用戶的意圖，給出儘可能多角度的回答要點，且務必信息量大、論述詳盡。
- 如果回答很長，請儘量結構化、分段落總結。如果需要分點作答，儘量控制在5個點以內，併合並相關的內容。
- 對於客觀類的問答，如果問題的答案非常簡短，可以適當補充一到兩句相關信息，以豐富內容。
- 你需要根據用戶要求和回答內容選擇合適、美觀的回答格式，確保可讀性強。
- 你的回答應該綜合多個相關網頁來回答，不能重複引用一個網頁。
- 除非用戶要求，否則你回答的語言需要和用戶提問的語言保持一致。

# 用戶消息為：
{question}'''

For English queries, we use the following prompt:

search_answer_en_template = \
'''# The following contents are the search results related to the user's message:
{search_results}
In the search results I provide to you, each result is formatted as [webpage X begin]...[webpage X end], where X represents the numerical index of each article. Please cite the context at the end of the relevant sentence when appropriate. Use the citation format [citation:X] in the corresponding part of your answer. If a sentence is derived from multiple contexts, list all relevant citation numbers, such as [citation:3][citation:5]. Be sure not to cluster all citations at the end; instead, include them in the corresponding parts of the answer.
When responding, please keep the following points in mind:
- Today is {cur_date}.
- Not all content in the search results is closely related to the user's question. You need to evaluate and filter the search results based on the question.
- For listing-type questions (e.g., listing all flight information), try to limit the answer to 10 key points and inform the user that they can refer to the search sources for complete information. Prioritize providing the most complete and relevant items in the list. Avoid mentioning content not provided in the search results unless necessary.
- For creative tasks (e.g., writing an essay), ensure that references are cited within the body of the text, such as [citation:3][citation:5], rather than only at the end of the text. You need to interpret and summarize the user's requirements, choose an appropriate format, fully utilize the search results, extract key information, and generate an answer that is insightful, creative, and professional. Extend the length of your response as much as possible, addressing each point in detail and from multiple perspectives, ensuring the content is rich and thorough.
- If the response is lengthy, structure it well and summarize it in paragraphs. If a point-by-point format is needed, try to limit it to 5 points and merge related content.
- For objective Q&A, if the answer is very brief, you may add one or two related sentences to enrich the content.
- Choose an appropriate and visually appealing format for your response based on the user's requirements and the content of the answer, ensuring strong readability.
- Your answer should synthesize information from multiple relevant webpages and avoid repeatedly citing the same webpage.
- Unless the user requests otherwise, your response should be in the same language as the user's question.

# The user's message is:
{question}'''
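The templates expect `{search_results}` to hold pages wrapped in `[webpage X begin]...[webpage X end]` markers. The sketch below shows one way to build that string and fill a template. Several things here are assumptions: the helper names, numbering pages from 1, the date format, and the text placed inside each marker pair, none of which the source specifies. `short_template` is an abbreviated stand-in that keeps only the three placeholders; the full template would be used in practice.

```python
# Abbreviated stand-in for search_answer_en_template (assumption: shortened
# here only to keep the example small; the placeholders are the same).
short_template = '''# The following contents are the search results related to the user's message:
{search_results}
- Today is {cur_date}.

# The user's message is:
{question}'''


def format_search_results(pages: list[str]) -> str:
    """Wrap each page in [webpage X begin]...[webpage X end], X starting at 1."""
    return "\n".join(
        f"[webpage {i} begin]{text}[webpage {i} end]"
        for i, text in enumerate(pages, start=1)
    )


def build_search_prompt(pages: list[str], question: str, cur_date: str) -> str:
    """Fill the template with formatted results, the date, and the question."""
    return short_template.format(
        search_results=format_search_results(pages),
        cur_date=cur_date,
        question=question,
    )


pages = ["GGUF is a file format used by llama.cpp.", "MoE stands for mixture of experts."]
print(build_search_prompt(pages, "What is GGUF?", "Monday, March 24"))
```

The numeric index in each marker is what the model is asked to echo back in its `[citation:X]` references, so the caller needs to keep the same ordering to resolve citations.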

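📚 Detailed Documentation

Quantized Model Versions

Our DeepSeek-V3-0324 GGUFs include 1-4 bit dynamic versions, which give higher accuracy and better results than standard quantization.

MoE bits	Type	Disk size	Accuracy	Details
1.78-bit (prelim)	IQ1_S	186GB	Ok	`down_proj` in MoE is a mix of 2.06/1.78-bit
1.93-bit (prelim)	IQ1_M	196GB	Fair	`down_proj` in MoE is a mix of 2.06/1.93-bit
2.42-bit	IQ2_XXS	219GB	Recommended	`down_proj` in MoE is all 2.42-bit
2.71-bit	Q2_K_XL	248GB	Recommended	`down_proj` in MoE is a mix of 3.5/2.71-bit
3.5-bit	Q3_K_XL	321GB	Great	`down_proj` in MoE is a mix of 4.5/3.5-bit
4.5-bit	Q4_K_XL	405GB	Best	`down_proj` in MoE is a mix of 5.5/4.5-bit

Note: "prelim" (preliminary) means that, in our testing, these versions generally perform well but sometimes fail to produce the best code, so more work and testing are needed. The 2.71-bit version offers the best performance for its size and produces high-quality code that runs well. The 2.42-bit version also passed all of our tests. For best results, we therefore recommend the 2.42-bit (IQ2_XXS) or 2.71-bit (Q2_K_XL) version. Although not required, at least 180GB of combined VRAM + RAM is recommended.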
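To choose a version against a storage budget, the disk sizes in the table can be compared directly. The sketch below is only an illustration: the function name `largest_quant_within` is hypothetical, and it considers disk size alone, not the VRAM + RAM guidance in the note.

```python
# (type, MoE bits, disk size in GB), copied from the table above.
QUANTS = [
    ("IQ1_S", 1.78, 186),
    ("IQ1_M", 1.93, 196),
    ("IQ2_XXS", 2.42, 219),
    ("Q2_K_XL", 2.71, 248),
    ("Q3_K_XL", 3.5, 321),
    ("Q4_K_XL", 4.5, 405),
]


def largest_quant_within(budget_gb: float):
    """Return the largest quant whose file fits in budget_gb, or None."""
    candidates = [q for q in QUANTS if q[2] <= budget_gb]
    if not candidates:
        return None
    return max(candidates, key=lambda q: q[2])


print(largest_quant_within(250))  # ('Q2_K_XL', 2.71, 248)
print(largest_quant_within(100))  # None
```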

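Free Fine-Tuning

We provide free Google Colab notebooks for fine-tuning a range of models into reasoning models. All notebooks are beginner friendly. Add your dataset, click "Run All", and you will get a 2x faster fine-tuned model that can be exported to GGUF or vLLM format or uploaded to Hugging Face.

Unsloth supports	Performance	Memory use
GRPO with Phi-4 (14B)	2x faster	80% less
Llama-3.2 (3B)	2.4x faster	58% less
Llama-3.2 (11B vision)	2x faster	60% less
Qwen2 VL (7B)	1.8x faster	60% less
Qwen2.5 (7B)	2x faster	60% less
Llama-3.1 (8B)	2.4x faster	58% less
Phi-3.5 (mini)	2x faster	50% less
Gemma 2 (9B)	2.4x faster	58% less
Mistral (7B)	2.2x faster	62% less

Supported Model Features

This model supports features such as function calling, JSON output, and FIM completion. For instructions on how to construct prompts that use these features, refer to the DeepSeek-V2.5 repository.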

📄 License

This repository and the model weights are licensed under the MIT License.

🔧 Technical Details

Citation

@misc{deepseekai2024deepseekv3technicalreport,
      title={DeepSeek-V3 Technical Report}, 
      author={DeepSeek-AI},
      year={2024},
      eprint={2412.19437},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.19437}, 
}