DeepSeek-V3-0324-GGUF-UD open-source model: a dynamically quantized version that runs on multiple frameworks

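Deepseek V3 0324 GGUF UD

Developed by unsloth

This is the dynamically quantized version of DeepSeek-V3-0324 provided by Unsloth. It runs in inference frameworks such as llama.cpp and LMStudio.

Large language model · English · Open-source license: MIT · #DynamicQuantizedInference #MoE (mixture of experts) #HighPerformanceCodeGeneration

Downloads: 6,270

Release date: 4/21/2025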

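Model overview

DeepSeek-V3-0324 is a high-performance language model. This release is optimized with Unsloth's dynamic quantization and is offered at several bit widths for local inference.

Model features

Dynamic quantization

Unsloth Dynamic v2.0 quantizes selectively, which greatly improves accuracy compared with standard quantization at the same bit width.

Multiple bit widths

Quantized versions in the 1-bit to 4-bit range are provided to suit different hardware.

High-performance inference

The optimized model runs efficiently in frameworks such as llama.cpp and LMStudio.

Model capabilities

Text generation

Code generation

Natural language understanding

Use cases

Code generation

Programming assistance

Helps developers generate and optimize code snippets.

The generated code is of high quality and runs well.

Text creation

Content generation

Generates high-quality articles, reports, and other text.

The generated text is fluent and logically coherent.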

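🚀 DeepSeek-V3-0324 Dynamic GGUF

DeepSeek-V3-0324 Dynamic GGUF lets you run the model in inference frameworks such as llama.cpp, LMStudio, and Open WebUI. You can also fine-tune the model to fit different application scenarios.

🚀 Quick start

Local run guide: read our guide for detailed instructions on running DeepSeek-V3-0324 locally.
Advantages of the quantization method: Unsloth Dynamic v2.0 achieves superior accuracy and outperforms other leading quantization methods. Unsloth's dynamic quants are selectively quantized, which greatly improves accuracy over standard bit quantization.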

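✨ Key features

DeepSeek-V3-0324 improves notably on its predecessor, DeepSeek-V3, in several key areas of model capability:

Reasoning capabilities

Benchmark performance improves significantly:
- MMLU-Pro: from 75.9 to 81.2 (+5.3)
- GPQA: from 59.1 to 68.4 (+9.3)
- AIME: from 39.6 to 59.4 (+19.8)
- LiveCodeBench: from 39.2 to 49.2 (+10.0)

Front-end web development

- Improved executability of the generated code
- More aesthetically pleasing web pages and game front-ends

Chinese writing proficiency

Enhanced style and content quality:
- Aligned with the R1 writing style
- Better quality in medium-to-long-form writing
Feature enhancements:
- Improved multi-turn interactive rewriting
- Optimized translation quality and letter writing

Chinese search capabilities

Enhanced handling of report analysis requests, with more detailed outputs.

Function calling improvements

Increased accuracy in function calling, fixing issues from previous V3 versions.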

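📦 Installation guide

The model structure of DeepSeek-V3-0324 is exactly the same as that of DeepSeek-V3. For more information about running this model locally, visit the DeepSeek-V3 repository.

⚠️ Important note

Hugging Face's Transformers does not yet directly support this model.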

💻 Usage examples

System prompt

In the official DeepSeek web/app, we use the same system prompt with a specific date.

該助手為DeepSeek Chat，由深度求索公司創造。
今天是{current date}。

For example:

該助手為DeepSeek Chat，由深度求索公司創造。
今天是3月24日，星期一。
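
The date substitution described above can be sketched in Python. This is a minimal illustration, not the official implementation: the function name `build_system_prompt`, the weekday table, and the renamed placeholder `{current_date}` (the original shows `{current date}`) are all assumptions, and the date layout is inferred from the single example shown.

```python
from datetime import date

# Prompt text copied as it appears in this document; the placeholder is
# renamed from "{current date}" to "{current_date}" (an assumption) so it
# can be used as a keyword argument with str.format.
SYSTEM_PROMPT_TEMPLATE = "該助手為DeepSeek Chat，由深度求索公司創造。\n今天是{current_date}。"

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS_ZH = ["星期一", "星期二", "星期三", "星期四", "星期五", "星期六", "星期日"]


def build_system_prompt(today: date) -> str:
    """Fill the system prompt with a date such as '3月24日，星期一'."""
    current_date = f"{today.month}月{today.day}日，{WEEKDAYS_ZH[today.weekday()]}"
    return SYSTEM_PROMPT_TEMPLATE.format(current_date=current_date)


if __name__ == "__main__":
    print(build_system_prompt(date.today()))
```

Passing the date in as an argument, rather than reading the clock inside the function, keeps the helper easy to test.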

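Temperature

In our web and application environments, the model temperature $T_{model}$ is set to 0.3. Because many users call the API with the default temperature of 1.0, we implemented an API temperature $T_{api}$ mapping mechanism that adjusts an input API temperature of 1.0 to the most suitable model temperature of 0.3:

$$ T_{model} = T_{api} \times 0.3 \quad (0 \leq T_{api} \leq 1) $$

$$ T_{model} = T_{api} - 0.7 \quad (1 < T_{api} \leq 2) $$

Thus, if you call V3 via the API, a temperature of 1.0 corresponds to a model temperature of 0.3.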
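
The piecewise mapping above is small enough to write out directly. The following is a sketch of the two formulas only; the function name `api_to_model_temperature` is hypothetical, and rejecting values outside the stated range of 0 to 2 is an assumption, since the text does not say how such values are handled.

```python
def api_to_model_temperature(t_api: float) -> float:
    """Map an API temperature in [0, 2] to the model temperature.

    T_model = T_api * 0.3   for 0 <= T_api <= 1
    T_model = T_api - 0.7   for 1 <  T_api <= 2
    """
    if not 0.0 <= t_api <= 2.0:
        # Assumption: out-of-range inputs are rejected rather than clamped.
        raise ValueError("t_api must be between 0 and 2")
    if t_api <= 1.0:
        return t_api * 0.3
    return t_api - 0.7


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(t, round(api_to_model_temperature(t), 2))
```

The two branches meet at $T_{api} = 1$, where both give 0.3, so the mapping is continuous. When running the GGUF files locally there is presumably no such mapping layer, so the value passed to the runtime plays the role of $T_{model}$.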

File upload and web search prompts

File upload

Create the prompt from the following template, where {file_name}, {file_content}, and {question} are parameters.

file_template = \
"""[file name]: {file_name}
[file content begin]
{file_content}
[file content end]
{question}"""
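
A short usage sketch for the template above, assuming plain `str.format` is an acceptable way to fill it. The helper name `build_file_prompt` and the sample values are hypothetical.

```python
# Same template as above, repeated so that this example is self-contained.
file_template = (
    "[file name]: {file_name}\n"
    "[file content begin]\n"
    "{file_content}\n"
    "[file content end]\n"
    "{question}"
)


def build_file_prompt(file_name: str, file_content: str, question: str) -> str:
    """Fill the three template parameters and return the final prompt."""
    return file_template.format(
        file_name=file_name,
        file_content=file_content,
        question=question,
    )


if __name__ == "__main__":
    prompt = build_file_prompt(
        file_name="notes.txt",
        file_content="Release checklist: {draft}",
        question="Summarize this file in one sentence.",
    )
    print(prompt)
```

`str.format` only interprets braces in the template itself, not in the substituted values, so file content that contains `{` or `}` is inserted unchanged.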

Web search

For Chinese queries, we use the following prompt:

search_answer_zh_template = \
'''# 以下內容是基於用戶發送的消息的搜索結果:
{search_results}
在我給你的搜索結果中，每個結果都是[webpage X begin]...[webpage X end]格式的，X代表每篇文章的數字索引。請在適當的情況下在句子末尾引用上下文。請按照引用編號[citation:X]的格式在答案中對應部分引用上下文。如果一句話源自多個上下文，請列出所有相關的引用編號，例如[citation:3][citation:5]，切記不要將引用集中在最後返回引用編號，而是在答案對應部分列出。
在回答時，請注意以下幾點：
- 今天是{cur_date}。
- 並非搜索結果的所有內容都與用戶的問題密切相關，你需要結合問題，對搜索結果進行甄別、篩選。
- 對於列舉類的問題（如列舉所有航班信息），儘量將答案控制在10個要點以內，並告訴用戶可以查看搜索來源、獲得完整信息。優先提供信息完整、最相關的列舉項；如非必要，不要主動告訴用戶搜索結果未提供的內容。
- 對於創作類的問題（如寫論文），請務必在正文的段落中引用對應的參考編號，例如[citation:3][citation:5]，不能只在文章末尾引用。你需要解讀並概括用戶的題目要求，選擇合適的格式，充分利用搜索結果並抽取重要信息，生成符合用戶要求、極具思想深度、富有創造力與專業性的答案。你的創作篇幅需要儘可能延長，對於每一個要點的論述要推測用戶的意圖，給出儘可能多角度的回答要點，且務必信息量大、論述詳盡。
- 如果回答很長，請儘量結構化、分段落總結。如果需要分點作答，儘量控制在5個點以內，並合併相關的內容。
- 對於客觀類的問答，如果問題的答案非常簡短，可以適當補充一到兩句相關信息，以豐富內容。
- 你需要根據用戶要求和回答內容選擇合適、美觀的回答格式，確保可讀性強。
- 你的回答應該綜合多個相關網頁來回答，不能重複引用一個網頁。
- 除非用戶要求，否則你回答的語言需要和用戶提問的語言保持一致。

# 用戶消息為：
{question}'''

For English queries, we use the following prompt:

search_answer_en_template = \
'''# The following contents are the search results related to the user's message:
{search_results}
In the search results I provide to you, each result is formatted as [webpage X begin]...[webpage X end], where X represents the numerical index of each article. Please cite the context at the end of the relevant sentence when appropriate. Use the citation format [citation:X] in the corresponding part of your answer. If a sentence is derived from multiple contexts, list all relevant citation numbers, such as [citation:3][citation:5]. Be sure not to cluster all citations at the end; instead, include them in the corresponding parts of the answer.
When responding, please keep the following points in mind:
- Today is {cur_date}.
- Not all content in the search results is closely related to the user's question. You need to evaluate and filter the search results based on the question.
- For listing-type questions (e.g., listing all flight information), try to limit the answer to 10 key points and inform the user that they can refer to the search sources for complete information. Prioritize providing the most complete and relevant items in the list. Avoid mentioning content not provided in the search results unless necessary.
- For creative tasks (e.g., writing an essay), ensure that references are cited within the body of the text, such as [citation:3][citation:5], rather than only at the end of the text. You need to interpret and summarize the user's requirements, choose an appropriate format, fully utilize the search results, extract key information, and generate an answer that is insightful, creative, and professional. Extend the length of your response as much as possible, addressing each point in detail and from multiple perspectives, ensuring the content is rich and thorough.
- If the response is lengthy, structure it well and summarize it in paragraphs. If a point-by-point format is needed, try to limit it to 5 points and merge related content.
- For objective Q&A, if the answer is very brief, you may add one or two related sentences to enrich the content.
- Choose an appropriate and visually appealing format for your response based on the user's requirements and the content of the answer, ensuring strong readability.
- Your answer should synthesize information from multiple relevant webpages and avoid repeatedly citing the same webpage.
- Unless the user requests otherwise, your response should be in the same language as the user's question.

# The user's message is:
{question}'''
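
Both search templates expect `{search_results}` to contain blocks in the form [webpage X begin]...[webpage X end], and they ask the model to cite with [citation:X]. The sketch below shows one way to build that string and to read citations back out of an answer. The helper names, 1-based numbering, and the newline layout inside each block are assumptions; the source only specifies the begin/end markers and the citation format.

```python
import re


def format_search_results(pages: list[str]) -> str:
    """Wrap each page text in [webpage X begin] ... [webpage X end] markers."""
    blocks = []
    for index, text in enumerate(pages, start=1):  # assumption: X starts at 1
        blocks.append(f"[webpage {index} begin]\n{text}\n[webpage {index} end]")
    return "\n".join(blocks)


# Matches citation markers such as [citation:3].
CITATION_RE = re.compile(r"\[citation:(\d+)\]")


def cited_indices(answer: str) -> list[int]:
    """Return the distinct webpage indices cited in an answer, in ascending order."""
    return sorted({int(number) for number in CITATION_RE.findall(answer)})


if __name__ == "__main__":
    results = format_search_results(["First page text.", "Second page text."])
    print(results)
    print(cited_indices("Claim A[citation:2]. Claim B[citation:1][citation:2]."))
```

The resulting string can then be passed as `search_results`, together with `cur_date` and `question`, to `search_answer_en_template.format(...)` or `search_answer_zh_template.format(...)`.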

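📚 Detailed documentation

Running locally

The model supports features such as function calling, JSON output, and FIM completion. For instructions on how to construct prompts for these features, refer to the DeepSeek-V2.5 repository.

Free fine-tuning

We provide a free Google Colab notebook for turning Llama 3.1 (8B) into a reasoning model: link. All notebooks are beginner friendly! Add your dataset, click "Run All", and you will get a 2x faster fine-tuned model that can be exported to GGUF or vLLM, or uploaded to Hugging Face.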

Unsloth supports	Free notebook	Performance	Memory use
GRPO with Phi-4 (14B)	▶️ Start on Colab	2x faster	80% less
Llama-3.2 (3B)	▶️ Start on Colab	2.4x faster	58% less
Llama-3.2 (11B vision)	▶️ Start on Colab	2x faster	60% less
Qwen2 VL (7B)	▶️ Start on Colab	1.8x faster	60% less
Qwen2.5 (7B)	▶️ Start on Colab	2x faster	60% less
Llama-3.1 (8B)	▶️ Start on Colab	2.4x faster	58% less
Phi-3.5 (mini)	▶️ Start on Colab	2x faster	50% less
Gemma 2 (9B)	▶️ Start on Colab	2.4x faster	58% less
Mistral (7B)	▶️ Start on Colab	2.2x faster	62% less

🔧 Technical details

Model versions

This project is based on the deepseek-ai/DeepSeek-V3-0324 base model and is available in the following quantized versions:

MoE bits	Type	Disk size	Link	Details
1.78bit (prelim)	IQ1_S	192GB	Link	down_proj in MoE is a mix of 2.06/1.78bit
1.93bit (prelim)	IQ1_M	200GB	Link	down_proj in MoE is a mix of 2.06/1.93bit
2.42bit	IQ2_XXS	215GB	Link	down_proj in MoE is all 2.42bit
2.71bit	Q2_K_XL	250GB	Link	down_proj in MoE is a mix of 3.5/2.71bit
3.5bit	Q3_K_XL	296GB	Link	down_proj in MoE is a mix of 4.5/3.5bit
4.5bit	Q4_K_XL	384GB	Link	down_proj in MoE is a mix of 5.5/4.5bit

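💡 Usage recommendations

"Prelim" means preliminary. In our testing these versions are generally fine, but they sometimes do not produce the best code, so more work and testing are needed.

2.71bit offers the best performance-to-size trade-off and produces code that is of high quality and runs well. 2.42bit also passed all our tests. For best results, use the 2.42bit (IQ2_XXS) or 2.71bit (Q2_K_XL) version.

Try to have at least 180GB of combined VRAM + RAM, though this is not strictly required.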
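
As a rough aid for choosing among the versions listed in the table, the sketch below encodes the disk sizes and "prelim" flags and picks the largest version that fits a given disk budget. The selection policy (largest file that fits, skipping preliminary versions by default) and the name `largest_fitting_quant` are illustrative assumptions, not an Unsloth recommendation; the guidance in the text still favors IQ2_XXS or Q2_K_XL.

```python
from typing import Optional

# (type, disk size in GB, is_preliminary), taken from the table above.
QUANTS = [
    ("IQ1_S", 192, True),
    ("IQ1_M", 200, True),
    ("IQ2_XXS", 215, False),
    ("Q2_K_XL", 250, False),
    ("Q3_K_XL", 296, False),
    ("Q4_K_XL", 384, False),
]


def largest_fitting_quant(disk_budget_gb: float, allow_prelim: bool = False) -> Optional[str]:
    """Return the largest listed quant type whose files fit in the disk budget.

    Returns None when nothing fits. Preliminary versions are skipped unless
    allow_prelim is True (a hypothetical policy for this example).
    """
    candidates = [
        (name, size)
        for name, size, prelim in QUANTS
        if size <= disk_budget_gb and (allow_prelim or not prelim)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[1])[0]


if __name__ == "__main__":
    for budget in (200, 260, 400):
        print(budget, largest_fitting_quant(budget, allow_prelim=True))
```

Disk size is only one constraint. Inference speed also depends on how much of the model fits in VRAM + RAM, which this helper does not model.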

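Acknowledgements

Thanks to the DeepSeek team for releasing the March update of the DeepSeek V3 model. Thanks also to bartowski for providing imatrix V3 quants.

📄 License

This repository and the model weights are licensed under the MIT License.

Citation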

@misc{deepseekai2024deepseekv3technicalreport,
      title={DeepSeek-V3 Technical Report}, 
      author={DeepSeek-AI},
      year={2024},
      eprint={2412.19437},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.19437}, 
}