Deepseek Coder 1.3B Instruct GPTQ open-source model: multiple quantisation parameter options for code generation and research

Deepseek Coder 1.3b Instruct GPTQ

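Quantised by TheBloke

A GPTQ-quantised version of Deepseek Coder 1.3B Instruct, offered with several quantisation parameter sets and suited to code generation and computer science tasks.

Large language model

Transformers

License: other #coding-assistant #code-generation #low-resource-inference

Downloads: 653

Released: 11/5/2023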

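Model overview

An instruction-tuned model with 1.3B parameters, aimed at programming and computer science questions and quantised with GPTQ to lower hardware requirements.

Model features

Multiple quantisation options

Several 4-bit and 8-bit GPTQ parameter combinations are provided, so you can choose the version that best suits your hardware

Programming-focused

Tuned for programming and computer science questions; the system prompt instructs it to refuse non-computer-science questions

Low-resource operation

Quantisation cuts VRAM requirements substantially (the provided files range from 0.90 GB to 1.60 GB), so the model can run on consumer GPUs

Long-context support

Supports a long context of 8192 tokens, suitable for working with complex code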

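Model capabilities

Code generation

Answering programming questions

Code completion

Technical documentation generation

Use cases

Software development

Code generation assistant

Generates code snippets from natural-language descriptions

Quickly implements common algorithms and functions

Programming Q&A

Answers questions about programming languages, frameworks, and algorithms

Proposes technical solutions for the problem described

Education

Programming teaching aid

Helps students understand programming concepts and debug code

Provides immediate feedback and explanations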

🚀 Deepseek Coder 1.3B Instruct - GPTQ

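This repository contains GPTQ model files for DeepSeek's Deepseek Coder 1.3B Instruct. Multiple GPTQ parameter permutations are provided, so you can pick the one that best fits your hardware and requirements.

🚀 Quick start

Using the model in text-generation-webui

Please make sure you are using the latest version of text-generation-webui. Using the text-generation-webui one-click installers is strongly recommended unless you are sure you know how to do a manual install.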

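1. Click the Model tab.
2. Under Download custom model or LoRA, enter TheBloke/deepseek-coder-1.3b-instruct-GPTQ.
   - To download from a specific branch, enter for example TheBloke/deepseek-coder-1.3b-instruct-GPTQ:gptq-4bit-32g-actorder_True.
   - See the Provided files and GPTQ parameters section below for the list of branches.
3. Click Download.
4. The model will start downloading. Once it has finished it will say "Done".
5. In the top left, click the refresh icon next to Model.
6. In the Model dropdown, choose the model you just downloaded: deepseek-coder-1.3b-instruct-GPTQ.
7. The model will load automatically and is then ready for use.
8. If you want any custom settings, set them, then click Save settings for this model followed by Reload the Model in the top right.
   - Note that you no longer need to, and should not, set GPTQ parameters manually. They are set automatically from the file quantize_config.json.
9. Once you are ready, click the Text Generation tab and enter a prompt to get started.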

Downloading from the command line

The huggingface-hub Python library is recommended:

pip3 install huggingface-hub

To download the main branch to a folder called deepseek-coder-1.3b-instruct-GPTQ:

mkdir deepseek-coder-1.3b-instruct-GPTQ
huggingface-cli download TheBloke/deepseek-coder-1.3b-instruct-GPTQ --local-dir deepseek-coder-1.3b-instruct-GPTQ --local-dir-use-symlinks False

To download from a different branch, add the --revision parameter:

mkdir deepseek-coder-1.3b-instruct-GPTQ
huggingface-cli download TheBloke/deepseek-coder-1.3b-instruct-GPTQ --revision gptq-4bit-32g-actorder_True --local-dir deepseek-coder-1.3b-instruct-GPTQ --local-dir-use-symlinks False
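
The same download can probably be scripted from Python instead of the CLI. The sketch below is an illustration rather than part of the original instructions: it assumes the `snapshot_download` function of the huggingface-hub library, and the helper names `download_args` and `download_branch` are made up for this example.

```python
def download_args(repo_id: str, branch: str = "main") -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    return {
        "repo_id": repo_id,
        "revision": branch,                   # plays the role of --revision
        "local_dir": repo_id.split("/")[-1],  # plays the role of --local-dir
    }


def download_branch(repo_id: str, branch: str = "main") -> str:
    """Download one branch and return the path of the local folder."""
    # Imported lazily so download_args works without the library installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_args(repo_id, branch))


# Example call (needs network access and huggingface-hub installed):
# download_branch("TheBloke/deepseek-coder-1.3b-instruct-GPTQ",
#                 "gptq-4bit-32g-actorder_True")
```

Keeping the argument construction separate from the network call makes it easy to check which folder and branch will be used before anything is downloaded.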

Using this GPTQ model from Python code

Install the necessary packages

Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.

pip3 install transformers optimum
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7

If you have problems installing AutoGPTQ using the pre-built wheels, install it from source instead:

pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
git checkout v0.4.2
pip3 install .
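
After installing, it can be useful to confirm that the environment meets the minimum versions listed above. The following is a minimal sketch using only the standard library. The distribution names in `MINIMUMS` are assumed to match the pip package names, and the version parsing is deliberately crude (it compares only the first three numeric components), so treat the result as a hint rather than a guarantee.

```python
import re
from importlib import metadata

# Minimum versions quoted in the requirements above.
MINIMUMS = {"transformers": "4.33.0", "optimum": "1.12.0", "auto-gptq": "0.4.2"}


def version_tuple(version: str) -> tuple:
    """Turn '4.33.0' or '0.4.2+cu118' into a three-element tuple of ints."""
    core = version.split("+")[0]  # drop local build tags such as +cu118
    numbers = [int(n) for n in re.findall(r"\d+", core)[:3]]
    numbers += [0] * (3 - len(numbers))  # pad '4.33' to (4, 33, 0)
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment() -> dict:
    """Map each required package to True/False; missing packages are False."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = False
    return report


print(check_environment())
```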

Example code

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/deepseek-coder-1.3b-instruct-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "Tell me about AI"
prompt_template=f'''You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])
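
Both the decoded `generate` output and the pipeline's `generated_text` normally contain the full prompt followed by the model's answer. If only the answer is wanted, one option is to cut the text at the `### Response:` marker from the prompt template. The helper below is a small sketch; the name `extract_response` is hypothetical and not part of any library.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(generated_text: str) -> str:
    """Return the text after the first '### Response:' marker.

    If the marker is absent, the whole text is returned (stripped).
    """
    _, found, tail = generated_text.partition(RESPONSE_MARKER)
    return tail.strip() if found else generated_text.strip()


sample = "System text\n### Instruction:\nSay hi\n### Response:\nprint('hi')\n"
print(extract_response(sample))
```

This assumes the user's instruction does not itself contain the marker text; if it might, slicing the output token IDs by the prompt length is the more robust approach.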

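✨ Key features

Multiple GPTQ parameter permutations are provided, so you can choose the quantisation parameters that best fit your hardware and requirements.
Works with several inference servers and web UIs, such as text-generation-webui and KoboldAI United.

📦 Installation guide

Install the dependencies

pip3 install huggingface-hub
pip3 install transformers optimum
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7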

📚 Detailed documentation

Model information

Property	Details
Model creator	DeepSeek
Original model	DeepSeek Coder 1.3B Instruct
Model type	deepseek
License	other
License link	LICENSE
License name	deepseek
Quantised by	TheBloke

Repositories available

Prompt template

You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
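
When the template is used from code, it can help to wrap it in a small function so that every request is formatted identically. The sketch below reuses the template text shown above; the names `SYSTEM_PROMPT` and `build_prompt` are illustrative only.

```python
SYSTEM_PROMPT = (
    "You are an AI programming assistant, utilizing the Deepseek Coder model, "
    "developed by Deepseek Company, and you only answer questions related to "
    "computer science. For politically sensitive questions, security and "
    "privacy issues, and other non-computer science questions, you will "
    "refuse to answer."
)


def build_prompt(instruction: str) -> str:
    """Fill the user's instruction into the Deepseek Coder prompt template."""
    return f"{SYSTEM_PROMPT}\n### Instruction:\n{instruction}\n### Response:\n"


print(build_prompt("Write a Python function that reverses a string."))
```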

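Known compatible clients / servers

These GPTQ models are known to work in several inference servers and web UIs, including text-generation-webui and KoboldAI United.

Provided files and GPTQ parameters

Multiple quantisation parameters are provided so that you can choose the best one for your hardware and requirements. Each separate quantisation is in a different branch. Most GPTQ files are made with AutoGPTQ; Mistral models are currently made with Transformers.

Explanation of GPTQ parameters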

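Bits: The bit size of the quantised model.
GS: GPTQ group size. Higher numbers use less VRAM but have lower quantisation accuracy. "None" (no grouping) gives the lowest VRAM use.
Act Order: True or False. Also known as desc_act. True results in better quantisation accuracy. Some GPTQ clients have had issues with models that use Act Order together with a group size, but this is generally resolved now.
Damp %: A GPTQ parameter that affects how samples are processed for quantisation. 0.01 is the default, but 0.1 results in slightly better accuracy.
GPTQ dataset: The calibration dataset used during quantisation. Using a dataset closer to the model's training data can improve quantisation accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model; refer to the original model repository for details of the training datasets.
Sequence length: The length of the dataset sequences used for quantisation. Ideally this is the same as the model sequence length. For some very long sequence models (16K+), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantised model. It only affects the quantisation accuracy on longer inference sequences.
ExLlama compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama and Mistral models in 4-bit.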
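
To make the group size trade-off more concrete, here is a rough back-of-the-envelope sketch. It assumes that each group of weights stores one 16-bit scale and one zero-point at the quantisation bit width. That layout is an assumption made for illustration: real GPTQ files carry additional tensors (for example the index tensor used with Act Order) and keep some layers unquantised, so these figures do not predict actual file sizes. The case with no group size is not covered.

```python
def metadata_bits_per_weight(bits: int, group_size: int) -> float:
    """Approximate per-weight overhead from per-group quantisation metadata."""
    scale_bits = 16   # assumption: one fp16 scale per group
    zero_bits = bits  # assumption: one zero-point per group at the same width
    return (scale_bits + zero_bits) / group_size


def effective_bits(bits: int, group_size: int) -> float:
    """Weight bits plus the amortised metadata overhead."""
    return bits + metadata_bits_per_weight(bits, group_size)


for gs in (32, 64, 128):
    print(f"4-bit, group size {gs}: ~{effective_bits(4, gs):.3f} bits per weight")
```

Under these assumptions a smaller group size costs more bits per weight, which is consistent with the direction described above: smaller groups give better accuracy at the price of more memory.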

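Branch	Bits	GS	Act Order	Damp %	GPTQ dataset	Sequence length	Size	ExLlama	Description
main	4	128	Yes	0.1	Evol Instruct Code	8192	0.90 GB	Yes	4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy.
gptq-4bit-32g-actorder_True	4	32	Yes	0.1	Evol Instruct Code	8192	0.97 GB	Yes	4-bit, with Act Order and group size 32g. Gives the highest possible inference quality, with maximum VRAM usage.
gptq-8bit--1g-actorder_True	8	None	Yes	0.1	Evol Instruct Code	8192	1.48 GB	No	8-bit, with Act Order. No group size, to lower VRAM requirements.
gptq-8bit-128g-actorder_True	8	128	Yes	0.1	Evol Instruct Code	8192	1.51 GB	No	8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy.
gptq-8bit-32g-actorder_True	8	32	Yes	0.1	Evol Instruct Code	8192	1.60 GB	No	8-bit, with group size 32g and Act Order for maximum inference quality.
gptq-4bit-64g-actorder_True	4	64	Yes	0.1	Evol Instruct Code	8192	0.92 GB	Yes	4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy.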
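
The branch names in the table appear to encode the quantisation parameters directly. The sketch below parses them back into bits, group size and Act Order. The naming pattern is inferred from the six branches listed here and is not a documented contract; `parse_branch` and `QuantParams` are hypothetical names, and the values for `main` are hard-coded from the table.

```python
import re
from typing import NamedTuple, Optional


class QuantParams(NamedTuple):
    bits: int
    group_size: Optional[int]  # None means no group size (-1 in the branch name)
    act_order: bool


BRANCH_PATTERN = re.compile(r"^gptq-(\d+)bit-(-?\d+)g-actorder_(True|False)$")


def parse_branch(branch: str) -> QuantParams:
    """Recover the GPTQ parameters encoded in a branch name."""
    if branch == "main":
        # main does not encode its parameters; values taken from the table.
        return QuantParams(bits=4, group_size=128, act_order=True)
    match = BRANCH_PATTERN.match(branch)
    if match is None:
        raise ValueError(f"unrecognised branch name: {branch!r}")
    bits, group, act = match.groups()
    group_size = int(group)
    return QuantParams(
        bits=int(bits),
        group_size=None if group_size == -1 else group_size,
        act_order=(act == "True"),
    )


print(parse_branch("gptq-4bit-32g-actorder_True"))
print(parse_branch("gptq-8bit--1g-actorder_True"))
```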

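How to download, including from branches

In text-generation-webui

To download from the main branch, enter TheBloke/deepseek-coder-1.3b-instruct-GPTQ in the "Download model" box. To download from another branch, add :branchname to the end of the download name, for example TheBloke/deepseek-coder-1.3b-instruct-GPTQ:gptq-4bit-32g-actorder_True.

From the command line

The huggingface-hub Python library is recommended: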

pip3 install huggingface-hub

To download the main branch to a folder called deepseek-coder-1.3b-instruct-GPTQ:

mkdir deepseek-coder-1.3b-instruct-GPTQ
huggingface-cli download TheBloke/deepseek-coder-1.3b-instruct-GPTQ --local-dir deepseek-coder-1.3b-instruct-GPTQ --local-dir-use-symlinks False

To download from a different branch, add the --revision parameter:

mkdir deepseek-coder-1.3b-instruct-GPTQ
huggingface-cli download TheBloke/deepseek-coder-1.3b-instruct-GPTQ --revision gptq-4bit-32g-actorder_True --local-dir deepseek-coder-1.3b-instruct-GPTQ --local-dir-use-symlinks False

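More advanced huggingface-cli download usage

If you remove the --local-dir-use-symlinks False parameter, the files are stored in the central Hugging Face cache directory instead (default location on Linux: ~/.cache/huggingface), and symlinks pointing to their real location in the cache are added to the specified --local-dir. This allows interrupted downloads to be resumed, and lets you quickly clone the repo to multiple places on disk without triggering another download. The downside is that the files are hidden away in a cache folder, which makes it harder to know where your disk space is being used and harder to clean up when you want to remove a downloaded model.

The cache location can be changed with the HF_HOME environment variable and/or the --cache-dir parameter of huggingface-cli.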
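
Because cached files are easy to lose track of, a short script can report where the cache lives and how much space it takes. This is a simplified sketch: it honours only HF_HOME and falls back to the Linux default mentioned above, whereas the real library may consult other environment variables as well. The function names are illustrative.

```python
import os
from pathlib import Path


def hf_home(env=None) -> Path:
    """Resolve the Hugging Face home directory from HF_HOME or the default."""
    env = os.environ if env is None else env
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser()
    return Path("~/.cache/huggingface").expanduser()


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for folder, _, files in os.walk(root):
        for name in files:
            path = Path(folder) / name
            if not path.is_symlink():  # symlinks point at blobs counted elsewhere
                total += path.stat().st_size
    return total


home = hf_home()
print(f"{home}: {dir_size_bytes(home) / 1e9:.2f} GB")
```

Skipping symlinks avoids counting the same cached file twice when it is referenced from more than one place.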

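For more documentation on downloading with huggingface-cli, see: HF -> Hub Python Library -> Download files -> Download from the CLI.

To accelerate downloads on fast connections (1Gbit/s or higher), install hf_transfer:

pip3 install hf_transfer

And set the environment variable HF_HUB_ENABLE_HF_TRANSFER to 1: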

mkdir deepseek-coder-1.3b-instruct-GPTQ
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/deepseek-coder-1.3b-instruct-GPTQ --local-dir deepseek-coder-1.3b-instruct-GPTQ --local-dir-use-symlinks False

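Windows command line users: you can set the environment variable by running set HF_HUB_ENABLE_HF_TRANSFER=1 before the download command.

With `git` (not recommended)

To clone a specific branch with git, use a command like this: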

git clone --single-branch --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/deepseek-coder-1.3b-instruct-GPTQ

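Note that using Git with HF repos is strongly discouraged. It is much slower than using huggingface-hub and uses twice as much disk space, because it has to store the model files twice (every byte is stored both in the target folder and again in the .git folder as a blob).

Serving this model from Text Generation Inference (TGI)

TGI version 1.1.0 or later is recommended. The official Docker container is: ghcr.io/huggingface/text-generation-inference:1.1.0

Example Docker parameters: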

--model-id TheBloke/deepseek-coder-1.3b-instruct-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
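
In TGI, --max-total-tokens bounds the prompt and the generated tokens together, so with the example parameters above a prompt of maximum length leaves a limited budget for generation. The arithmetic can be sketched as follows; the function name is illustrative.

```python
def generation_budget(max_input_length: int, max_total_tokens: int) -> int:
    """Tokens left for generation when the prompt uses the full input length."""
    if max_input_length >= max_total_tokens:
        raise ValueError("max_input_length must be smaller than max_total_tokens")
    return max_total_tokens - max_input_length


# Values from the example Docker parameters above.
print(generation_budget(max_input_length=3696, max_total_tokens=4096))
```

A request whose max_new_tokens stays within this budget fits even when the prompt is as long as the server allows; shorter prompts leave correspondingly more room.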

Example Python code for interfacing with TGI (requires huggingface-hub 0.17.0 or later):

pip3 install huggingface-hub

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

prompt = "Tell me about AI"
prompt_template=f'''You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
'''

client = InferenceClient(endpoint_url)
# Send the fully formatted template, not the bare user prompt
response = client.text_generation(prompt_template,
                                  max_new_tokens=128,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  top_k=40,
                                  repetition_penalty=1.1)

print(f"Model output: {response}")

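🔧 Technical details

The files provided are tested to work with Transformers. For non-Mistral models, AutoGPTQ can also be used directly. ExLlama is compatible with Llama and Mistral models in 4-bit.

📄 License

This project uses the deepseek license; see LICENSE for details.

Other information

Discord

For further support, and for discussion of these models and AI in general, join: TheBloke AI's Discord server

Thanks, and how to contribute

Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org!

A lot of people have asked whether they can contribute. I enjoy providing models and helping people, and would love to be able to spend more time doing it, as well as expanding into new projects such as fine tuning/training.

If you are able and willing to contribute, it will be most gratefully received and will help me keep providing more models and start work on new AI projects. Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Aemon Algiz.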

Patreon special mentions: Brandon Frisco, LangChain4j, Spiking Neurons AB, transmissions 11, Joseph William Delisle, Nitin Borwankar, Willem Michiel, Michael Dempsey, vamX, Jeffrey Morgan, zynix, jjj, Omer Bin Jawed, Sean Connelly, jinyuan sun, Jeromy Smith, Shadi, Pawan Osman, Chadd, Elijah Stavena, Illia Dulskyi, Sebastain Graf, Stephen Murray, terasurfer, Edmond Seymore, Celu Ramasamy, Mandus, Alex, biorpg, Ajan Kanaga, Clay Pascal, Raven Klaugh, 阿明, K, ya boyyy, usrbinkat, Alicia Loh, John Villwock, ReadyPlayerEmma, Chris Smitley, Cap'n Zoog, fincy, GodLy, S_X, sidney chen, Cory Kujawski, OG, Mano Prime, AzureBlack, Pieter, Kalila, Spencer Kim, Tom X Nguyen, Stanislav Ovsiannikov, Michael Levine, Andrey, Trailburnt, Vadim, Enrico Ros, Talal Aujan, Brandon Phillips, Jack West, Eugene Pentland, Michael Davis, Will Dee, webtim, Jonathan Leane, Alps Aficionado, Rooh Singh, Tiffany J. Kim, theTransient, Luke @flexchar, Elle, Caitlyn Gatomon, Ari Malik, subjectnull, Johann-Peter Hartmann, Trenton Dambrowitz, Imad Khwaja, Asp the Wyvern, Emad Mostaque, Rainer Wilmers, Alexandros Triantafyllidis, Nicholas, Pedro Madruga, SuperWojo, Harry Royden McLaughlin, James Bentley, Olakabola, David Ziegler, Ai Maven, Jeff Scr