Deepseek Coder 1.3B Instruct GPTQ open-source model: multiple quantization parameter options for code generation and research


Deepseek Coder 1.3b Instruct GPTQ

Quantized by TheBloke

A GPTQ-quantized version of Deepseek Coder 1.3B Instruct. It offers multiple quantization parameter options and is suited to code generation and computer-science tasks.

Large language model

Transformers

License: other #coding-assistant #code-generation #low-resource-inference

Downloads: 653

Released: 11/5/2023

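Model overview

This is a 1.3B-parameter instruction-tuned model for programming and computer science questions, quantized with GPTQ to reduce hardware requirements.

Model features

Multiple quantization options

Several combinations of 4-bit and 8-bit GPTQ parameters are provided, so you can choose the version that best fits your hardware.

Programming-focused

Optimized specifically for programming and computer science questions; its default system prompt instructs it to refuse non-technical questions.

Low-resource operation

The quantized versions significantly reduce VRAM requirements, so the model can run on consumer GPUs.

Long context support

Supports a long context of 8192 tokens, suitable for working with complex code.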
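
The VRAM saving follows from simple arithmetic. As a rough sketch, assuming all of the roughly 1.3 billion parameters were quantized and ignoring every other cost, the weights alone would need approximately:

```latex
\text{weight size} \approx N_{\text{params}} \times \frac{b}{8}\ \text{bytes}
\qquad
1.3\times10^{9} \times \tfrac{4}{8} \approx 0.65\ \text{GB}\ (4\text{-bit}),
\qquad
1.3\times10^{9} \times \tfrac{8}{8} \approx 1.3\ \text{GB}\ (8\text{-bit}),
\qquad
1.3\times10^{9} \times \tfrac{16}{8} \approx 2.6\ \text{GB}\ (16\text{-bit, unquantized})
```

The file sizes listed for the branches (0.90 GB for 4-bit, 1.48 to 1.60 GB for 8-bit) are somewhat larger than these estimates. Per-group scale and zero-point data, plus any layers kept at higher precision, are plausible contributors. Actual VRAM use during inference is higher still, because activations and the KV cache also need memory.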

Model capabilities

Code generation

Programming Q&A

Code completion

Technical documentation generation

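Use cases

Software development

Code generation assistant

Generates code snippets from natural-language descriptions

Quickly implements common algorithms and functionality

Programming Q&A

Answers questions about programming languages, frameworks, and algorithms

Provides technical solutions

Education

Programming teaching aid

Helps students understand programming concepts and debug code

Provides instant feedback and explanations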

🚀 Deepseek Coder 1.3B Instruct - GPTQ

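This repo contains GPTQ model files for DeepSeek's Deepseek Coder 1.3B Instruct. Multiple GPTQ parameter permutations are provided, so you can choose the one that best suits your hardware and requirements.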

🚀 Quick start

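Using in text-generation-webui

Make sure you are using the latest version of text-generation-webui. The text-generation-webui one-click installer is strongly recommended unless you are confident you know how to do a manual install.

1. Click the Model tab.
2. Under Download custom model or LoRA, enter TheBloke/deepseek-coder-1.3b-instruct-GPTQ.
   - To download from a specific branch, enter for example TheBloke/deepseek-coder-1.3b-instruct-GPTQ:gptq-4bit-32g-actorder_True.
   - For the list of branches, see the Provided files and GPTQ parameters section below.
3. Click Download.
4. The model will start downloading. Once it has finished, it will say "Done".
5. In the top left, click the refresh icon next to Model.
6. In the Model dropdown, choose the model you just downloaded: deepseek-coder-1.3b-instruct-GPTQ.
7. The model will load automatically and is then ready to use.
8. If you want custom settings, set them, then click Save settings for this model in the top right, followed by Reload the Model.
   - Note that you no longer need to, and should not, set GPTQ parameters manually. They are set automatically from the quantize_config.json file.
9. Once ready, click the Text Generation tab and enter a prompt to get started!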

Downloading from the command line

The huggingface-hub Python library is recommended:

pip3 install huggingface-hub

To download the main branch to a folder called deepseek-coder-1.3b-instruct-GPTQ:

mkdir deepseek-coder-1.3b-instruct-GPTQ
huggingface-cli download TheBloke/deepseek-coder-1.3b-instruct-GPTQ --local-dir deepseek-coder-1.3b-instruct-GPTQ --local-dir-use-symlinks False

To download from a different branch, add the --revision parameter:

mkdir deepseek-coder-1.3b-instruct-GPTQ
huggingface-cli download TheBloke/deepseek-coder-1.3b-instruct-GPTQ --revision gptq-4bit-32g-actorder_True --local-dir deepseek-coder-1.3b-instruct-GPTQ --local-dir-use-symlinks False
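
The two commands above differ only in the --revision flag. The sketch below builds the same argument list in Python, which can be handy when scripting downloads of several branches. `build_download_command` is a hypothetical helper, not part of huggingface-hub. It only assembles the command; pass the result to `subprocess.run(argv, check=True)` to actually download.

```python
from typing import List

REPO_ID = "TheBloke/deepseek-coder-1.3b-instruct-GPTQ"


def build_download_command(local_dir: str, branch: str = "main") -> List[str]:
    """Build the huggingface-cli argv shown above for one branch (hypothetical helper)."""
    argv = ["huggingface-cli", "download", REPO_ID]
    if branch != "main":
        # Any branch other than main is selected with --revision
        argv += ["--revision", branch]
    # Copy real files into local_dir instead of symlinking into the HF cache
    argv += ["--local-dir", local_dir, "--local-dir-use-symlinks", "False"]
    return argv


if __name__ == "__main__":
    cmd = build_download_command("deepseek-coder-1.3b-instruct-GPTQ",
                                 "gptq-4bit-32g-actorder_True")
    print(" ".join(cmd))
```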

Using this GPTQ model from Python code

Install the necessary packages

Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.

pip3 install transformers optimum
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # use cu117 instead if you are on CUDA 11.7

If you have problems installing AutoGPTQ from the pre-built wheels, install it from source instead:

pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
git checkout v0.4.2
pip3 install .
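
To confirm that the installed packages meet the minimum versions listed above, you could run a small check along these lines. This is a sketch. The version parsing is deliberately simplistic: it keeps only the leading numeric release components and ignores pre-release tags. The distribution names transformers, optimum and auto-gptq are assumed to match the pip package names used above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Tuple

# Minimum versions from the requirements above
MINIMUMS = {"transformers": "4.33.0", "optimum": "1.12.0", "auto-gptq": "0.4.2"}


def parse_release(v: str) -> Tuple[int, ...]:
    """'0.4.2+cu118' -> (0, 4, 2); stops at the first non-numeric component."""
    parts = []
    for piece in v.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "4.33" compares equal to "4.33.0"
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed() -> Dict[str, str]:
    report = {}
    for pkg, minimum in MINIMUMS.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[pkg] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for pkg, status in check_installed().items():
        print(f"{pkg}: {status}")
```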

Example code

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/deepseek-coder-1.3b-instruct-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "Tell me about AI"
prompt_template=f'''You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])
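
`tokenizer.decode(output[0])` returns the prompt together with the completion. If you only want the model's answer, one option is to cut the decoded text at the template's `### Response:` marker. The helper below is a hypothetical sketch. The optional `stop` string is an assumption: set it to whatever end-of-turn text you actually observe in the output, since this card does not name one.

```python
from typing import Optional


def extract_response(decoded: str, marker: str = "### Response:",
                     stop: Optional[str] = None) -> str:
    """Return the text after the first response marker, optionally cut at `stop`."""
    _, found, answer = decoded.partition(marker)
    if not found:
        # No marker (e.g. the prompt was not templated): return everything
        return decoded.strip()
    if stop is not None:
        answer = answer.split(stop, 1)[0]
    return answer.strip()
```

For example, use `print(extract_response(tokenizer.decode(output[0])))` in place of the first print call. An alternative that avoids string matching is to decode only the newly generated tokens: `tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)`.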

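✨ Key features

Multiple GPTQ parameter permutations are provided, so you can choose the quantization that best suits your hardware and requirements.
Works in multiple inference servers/webuis, such as text-generation-webui and KoboldAI United.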

📦 Installation guide

Install the dependencies

pip3 install huggingface-hub
pip3 install transformers optimum
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # use cu117 instead if you are on CUDA 11.7

📚 Detailed documentation

Model information

Property	Details
Model creator	DeepSeek
Original model	DeepSeek Coder 1.3B Instruct
Model type	deepseek
License	other
License link	LICENSE
License name	deepseek
Quantized by	TheBloke

Prompt template

You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
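
When calling the model from code, you can fill the template with a small function instead of repeating the f-string. The sketch below reproduces the template text exactly. `build_prompt` and `SYSTEM_PROMPT` are illustrative names, not part of any library.

```python
SYSTEM_PROMPT = (
    "You are an AI programming assistant, utilizing the Deepseek Coder model, "
    "developed by Deepseek Company, and you only answer questions related to computer science. "
    "For politically sensitive questions, security and privacy issues, and other "
    "non-computer science questions, you will refuse to answer."
)


def build_prompt(instruction: str) -> str:
    """Fill the {prompt} slot of the template above (illustrative helper)."""
    return f"{SYSTEM_PROMPT}\n### Instruction:\n{instruction}\n### Response:\n"
```

For example, `build_prompt("Write a function that reverses a linked list")` produces the same string that the Python examples in this card build as `prompt_template`.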

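Known compatible clients / servers

These GPTQ models are known to work in the following inference servers/webuis, among others:

text-generation-webui
KoboldAI United
Hugging Face Text Generation Inference (TGI)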

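Provided files and GPTQ parameters

Multiple quantization parameter sets are provided so you can choose the best one for your hardware and requirements. Each quantization is in a separate branch. Most GPTQ files are made with AutoGPTQ; Mistral models are currently made with Transformers.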

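Explanation of GPTQ parameters

Bits: The bit size of the quantized model.
GS: GPTQ group size. Higher numbers use less VRAM but give lower quantization accuracy. "None" means no grouping at all, the extreme of that trade-off (least VRAM, lowest accuracy).
Act Order: True or False. Also known as desc_act. True gives better quantization accuracy. Some GPTQ clients have had issues with models that combine Act Order with a group size, but this is generally resolved now.
Damp %: A GPTQ parameter that affects how samples are processed for quantization. 0.01 is the default, but 0.1 gives slightly better accuracy.
GPTQ dataset: The calibration dataset used during quantization. Using a dataset closer to the model's training data can improve quantization accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model; refer to the original model repo for details of the training dataset(s).
Sequence Length: The length of the dataset sequences used for quantization. Ideally this is the same as the model's sequence length. For some models with very long sequences (16K+), a lower sequence length may have to be used. A lower sequence length does not limit the sequence length of the quantized model; it only affects quantization accuracy on longer inference sequences.
ExLlama Compatibility: Whether this file can be loaded with ExLlama, which currently supports only Llama and Mistral models in 4-bit.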
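
The branch names in the following table encode the first three of these parameters (bits, group size, Act Order), with a group size of -1 standing for "None". Below is a minimal sketch of decoding them. It assumes the naming pattern seen in this repo holds. `main` does not follow the pattern, so its values are taken from its row in the table. `parse_branch` and `GPTQParams` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class GPTQParams(NamedTuple):
    bits: int
    group_size: Optional[int]  # None = no grouping (written as -1 in branch names)
    act_order: bool


# "main" does not encode its parameters in its name; values from its table row
MAIN_PARAMS = GPTQParams(bits=4, group_size=128, act_order=True)

# Pattern assumed from this repo's branch names, e.g. gptq-8bit--1g-actorder_True
BRANCH_RE = re.compile(r"^gptq-(\d+)bit-(-?\d+)g-actorder_(True|False)$")


def parse_branch(name: str) -> GPTQParams:
    if name == "main":
        return MAIN_PARAMS
    m = BRANCH_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised branch name: {name!r}")
    bits, gs, act = m.groups()
    group_size = None if int(gs) == -1 else int(gs)
    return GPTQParams(int(bits), group_size, act == "True")
```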

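Branch	Bits	GS	Act Order	Damp %	GPTQ Dataset	Seq Len	Size	ExLlama	Description
main	4	128	Yes	0.1	Evol Instruct Code	8192	0.90 GB	Yes	4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy.
gptq-4bit-32g-actorder_True	4	32	Yes	0.1	Evol Instruct Code	8192	0.97 GB	Yes	4-bit, with Act Order and group size 32g. Gives the highest inference quality of the 4-bit options, with maximum VRAM usage.
gptq-8bit--1g-actorder_True	8	None	Yes	0.1	Evol Instruct Code	8192	1.48 GB	No	8-bit, with Act Order. No group size, to lower VRAM requirements.
gptq-8bit-128g-actorder_True	8	128	Yes	0.1	Evol Instruct Code	8192	1.51 GB	No	8-bit, with group size 128g for higher inference quality and Act Order for even higher accuracy.
gptq-8bit-32g-actorder_True	8	32	Yes	0.1	Evol Instruct Code	8192	1.60 GB	No	8-bit, with group size 32g and Act Order for maximum inference quality.
gptq-4bit-64g-actorder_True	4	64	Yes	0.1	Evol Instruct Code	8192	0.92 GB	Yes	4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy.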
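
As a rough way to pick a branch, you can encode the table above as data and filter it by a size budget. This is a heuristic sketch, not an official recommendation. It treats the largest file that fits as the highest-quality option, which is consistent with the descriptions above: more bits and smaller groups mean higher quality and larger files. It budgets on file size only, whereas real VRAM use is higher because of activations and the KV cache. `pick_branch` is a hypothetical helper.

```python
from typing import NamedTuple, Optional


class Branch(NamedTuple):
    name: str
    bits: int
    group_size: Optional[int]
    size_gb: float
    exllama: bool


# Copied from the table above
BRANCHES = [
    Branch("main", 4, 128, 0.90, True),
    Branch("gptq-4bit-32g-actorder_True", 4, 32, 0.97, True),
    Branch("gptq-8bit--1g-actorder_True", 8, None, 1.48, False),
    Branch("gptq-8bit-128g-actorder_True", 8, 128, 1.51, False),
    Branch("gptq-8bit-32g-actorder_True", 8, 32, 1.60, False),
    Branch("gptq-4bit-64g-actorder_True", 4, 64, 0.92, True),
]


def pick_branch(max_size_gb: float, need_exllama: bool = False) -> Optional[str]:
    """Largest branch whose files fit the budget, or None if nothing fits."""
    fits = [b for b in BRANCHES
            if b.size_gb <= max_size_gb and (b.exllama or not need_exllama)]
    if not fits:
        return None
    # Heuristic: bigger file = more bits / smaller groups = higher quality
    return max(fits, key=lambda b: b.size_gb).name
```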

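How to download from branches

In text-generation-webui

To download from the main branch, enter TheBloke/deepseek-coder-1.3b-instruct-GPTQ in the "Download model" box. To download from another branch, add :branchname to the end of the download name, for example TheBloke/deepseek-coder-1.3b-instruct-GPTQ:gptq-4bit-32g-actorder_True.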
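
The `repo:branch` convention maps directly onto the `revision` argument used in the Python examples in this card. Below is a small sketch of that mapping. The helper is hypothetical and is not text-generation-webui's own parser.

```python
from typing import Tuple


def split_download_name(name: str) -> Tuple[str, str]:
    """'owner/repo:branch' -> ('owner/repo', 'branch'); no suffix means 'main'."""
    repo_id, sep, branch = name.partition(":")
    return repo_id, (branch if sep and branch else "main")
```

For example, `repo_id, revision = split_download_name("TheBloke/deepseek-coder-1.3b-instruct-GPTQ:gptq-4bit-32g-actorder_True")` gives values you can pass to `AutoModelForCausalLM.from_pretrained(repo_id, revision=revision, device_map="auto")`.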

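More advanced huggingface-cli download usage

If you remove the --local-dir-use-symlinks False parameter, the files are instead stored in the central Hugging Face cache directory (default location on Linux: ~/.cache/huggingface). Symlinks pointing to their real location in the cache are then added to the specified --local-dir. This allows interrupted downloads to be resumed and lets you quickly clone the repo to multiple places on disk without triggering another download. The downside is that the files are hidden away in the cache folder. This makes it harder to see where your disk space is going, and harder to clean up when you want to remove a downloaded model.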

The cache location can be changed with the HF_HOME environment variable and/or the --cache-dir parameter of huggingface-cli.
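
To find where a cached download ended up, you can reconstruct the path. This is a simplified sketch that honours only the two settings mentioned above. The real huggingface_hub library also consults other variables, such as HF_HUB_CACHE and XDG_CACHE_HOME. The models--<owner>--<name> folder is the layout the hub cache uses for model repos.

```python
import os
from pathlib import Path
from typing import Optional


def hub_cache_dir(cache_dir: Optional[str] = None) -> Path:
    """Simplified: --cache-dir wins, else $HF_HOME/hub, else ~/.cache/huggingface/hub."""
    if cache_dir:
        return Path(cache_dir)
    default_home = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    hf_home = os.path.expanduser(os.environ.get("HF_HOME", default_home))
    return Path(hf_home) / "hub"


def repo_cache_folder(repo_id: str, cache_dir: Optional[str] = None) -> Path:
    # Model repos live in models--<owner>--<name> inside the hub cache
    return hub_cache_dir(cache_dir) / ("models--" + repo_id.replace("/", "--"))
```

For a summary of what the cache holds and how much space each repo takes, recent huggingface-hub versions also provide `huggingface-cli scan-cache`.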

For more documentation on downloading with huggingface-cli, see: HF -> Hub Python Library -> Download files -> Download from the CLI.

To speed up downloads on fast connections (1 Gbit/s or higher), install hf_transfer:

pip3 install hf_transfer

and set the environment variable HF_HUB_ENABLE_HF_TRANSFER to 1:

mkdir deepseek-coder-1.3b-instruct-GPTQ
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/deepseek-coder-1.3b-instruct-GPTQ --local-dir deepseek-coder-1.3b-instruct-GPTQ --local-dir-use-symlinks False

Windows Command Line users: set the environment variable by running set HF_HUB_ENABLE_HF_TRANSFER=1 before the download command.
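
A third option, which behaves the same on Linux, macOS and Windows, is to launch the download from Python with a modified copy of the environment. The sketch below assumes hf_transfer is already installed. If it is not, huggingface-hub is expected to report an error rather than silently fall back. `fast_download_env` and `run_fast_download` are hypothetical helper names.

```python
import os
import subprocess
from typing import Dict, Optional


def fast_download_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with hf_transfer enabled; the input is not modified."""
    env = dict(os.environ if base is None else base)
    env["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    return env


def run_fast_download(local_dir: str) -> None:
    cmd = ["huggingface-cli", "download", "TheBloke/deepseek-coder-1.3b-instruct-GPTQ",
           "--local-dir", local_dir, "--local-dir-use-symlinks", "False"]
    # Same effect as prefixing the shell command with HF_HUB_ENABLE_HF_TRANSFER=1
    subprocess.run(cmd, env=fast_download_env(), check=True)
```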

Using `git` (not recommended)

To clone a specific branch with git, use a command like this:

git clone --single-branch --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/deepseek-coder-1.3b-instruct-GPTQ

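Note that using Git with HF repos is strongly discouraged. It is much slower than using huggingface-hub, and it uses twice as much disk space because it has to store the model files twice: every byte is stored both in the intended target folder and again as a blob in the .git folder.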

Serving this model with Text Generation Inference (TGI)

TGI version 1.1.0 or later is recommended. The official Docker container is: ghcr.io/huggingface/text-generation-inference:1.1.0

Example Docker parameters:

--model-id TheBloke/deepseek-coder-1.3b-instruct-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
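
These limits interact. In TGI, --max-total-tokens caps prompt tokens plus generated tokens. With the values above, a request with a full-length 3696-token prompt can therefore generate at most 4096 - 3696 = 400 new tokens. The sketch below spells out that arithmetic together with two consistency checks. The checks reflect our reading of TGI's launcher rules rather than its actual code; verify them against your TGI version. The 8192 default is the sequence length stated elsewhere in this card. `max_new_tokens_budget` is a hypothetical helper.

```python
def max_new_tokens_budget(max_input_length: int, max_total_tokens: int,
                          max_batch_prefill_tokens: int, model_max_len: int = 8192) -> int:
    """Sanity-check TGI limits and return how many tokens a full-length prompt leaves."""
    if max_input_length >= max_total_tokens:
        raise ValueError("max-input-length must be smaller than max-total-tokens")
    if max_batch_prefill_tokens < max_input_length:
        raise ValueError("max-batch-prefill-tokens must be at least max-input-length")
    if max_total_tokens > model_max_len:
        raise ValueError("max-total-tokens exceeds the assumed model context length")
    # Prompt and completion share the max-total-tokens budget
    return max_total_tokens - max_input_length


print(max_new_tokens_budget(3696, 4096, 4096))  # 400
```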

Example Python code for interacting with TGI (requires huggingface-hub 0.17.0 or later):

pip3 install huggingface-hub

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

prompt = "Tell me about AI"
prompt_template=f'''You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
'''

client = InferenceClient(endpoint_url)
response = client.text_generation(prompt_template,
                                  max_new_tokens=128,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  top_k=40,
                                  repetition_penalty=1.1)

print(f"Model output: {response}")
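
If you would rather not depend on huggingface-hub, you can call TGI's HTTP API directly. Below is a minimal sketch that uses only the standard library. It assumes the server exposes TGI's /generate route at endpoint_url, the placeholder from the example above. The sampling settings mirror that example. The filled-in template, not the bare question, is what should be sent as `prompt`. `build_generate_payload` and `generate` are hypothetical helper names.

```python
import json
import urllib.request


def build_generate_payload(prompt: str, max_new_tokens: int = 128) -> dict:
    """JSON body for TGI's /generate route, mirroring the sampling settings above."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "do_sample": True,
            "temperature": 0.7,
            "top_p": 0.95,
            "top_k": 40,
            "repetition_penalty": 1.1,
        },
    }


def generate(endpoint_url: str, prompt: str) -> str:
    req = urllib.request.Request(
        endpoint_url.rstrip("/") + "/generate",
        data=json.dumps(build_generate_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # TGI answers with {"generated_text": "..."}
        return json.loads(resp.read())["generated_text"]
```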

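🔧 Technical details

The provided files are tested to work with Transformers. For non-Mistral models, AutoGPTQ can also be used directly. ExLlama is compatible with Llama and Mistral models in 4-bit.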

📄 License

This project uses the deepseek license; see LICENSE for details.

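Other information

Discord

For further support, and for discussions about these models and AI in general, join TheBloke AI's Discord server.

Thanks, and how to contribute

Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org!

Many people have asked whether they can contribute. I enjoy providing models and helping people, and would love to be able to spend more time doing it, as well as expanding into new projects such as fine-tuning/training.

If you are able and willing to contribute, it will be most gratefully received. It will help me keep providing more models and start work on new AI projects. Donors get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, and other benefits.

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Aemon Algiz.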

Patreon special mentions: Brandon Frisco, LangChain4j, Spiking Neurons AB, transmissions 11, Joseph William Delisle, Nitin Borwankar, Willem Michiel, Michael Dempsey, vamX, Jeffrey Morgan, zynix, jjj, Omer Bin Jawed, Sean Connelly, jinyuan sun, Jeromy Smith, Shadi, Pawan Osman, Chadd, Elijah Stavena, Illia Dulskyi, Sebastain Graf, Stephen Murray, terasurfer, Edmond Seymore, Celu Ramasamy, Mandus, Alex, biorpg, Ajan Kanaga, Clay Pascal, Raven Klaugh, 阿明, K, ya boyyy, usrbinkat, Alicia Loh, John Villwock, ReadyPlayerEmma, Chris Smitley, Cap'n Zoog, fincy, GodLy, S_X, sidney chen, Cory Kujawski, OG, Mano Prime, AzureBlack, Pieter, Kalila, Spencer Kim, Tom X Nguyen, Stanislav Ovsiannikov, Michael Levine, Andrey, Trailburnt, Vadim, Enrico Ros, Talal Aujan, Brandon Phillips, Jack West, Eugene Pentland, Michael Davis, Will Dee, webtim, Jonathan Leane, Alps Aficionado, Rooh Singh, Tiffany J. Kim, theTransient, Luke @flexchar, Elle, Caitlyn Gatomon, Ari Malik, subjectnull, Johann-Peter Hartmann, Trenton Dambrowitz, Imad Khwaja, Asp the Wyvern, Emad Mostaque, Rainer Wilmers, Alexandros Triantafyllidis, Nicholas, Pedro Madruga, SuperWojo, Harry Royden McLaughlin, James Bentley, Olakabola, David Ziegler, Ai Maven, Jeff Scr