open-llama-3b-v2-wizard-evol-instuct-v2-196k-AWQ open-source model

Open Llama 3b V2 Wizard Evol Instuct V2 196k AWQ

Developed by TheBloke

This model is based on the Open Llama 3B V2 architecture and was trained on the WizardLM_evol_instruct_V2_196k dataset. It is suited to instruction-following tasks.

Large language model

Transformers

Language: English
License: Apache-2.0
Tags: #instruction-tuning #small-and-efficient #english-dialogue

Downloads: 64

Released: 11/29/2023

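Model overview

This is an instruction-following model trained on the Open Llama 3B V2 architecture and tuned specifically on WizardLM's evolved-instruction (Evol-Instruct) dataset.

Model features

Instruction tuning

Trained on WizardLM's evolved-instruction dataset to improve instruction following

Efficient inference

At 3B parameters, it offers relatively fast inference while retaining performance

Open license

Released under Apache 2.0, which permits both commercial and research use

Model capabilities

Text generation

Instruction understanding and execution

Dialogue systems

Question answering

Use cases

Dialogue systems

Intelligent assistant

Build conversational assistants that can understand complex instructions

Education

Teaching support

Generate teaching material and answer student questions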

🚀 Open Llama 3B V2 Wizard Evol Instuct V2 196K - AWQ

This repository provides AWQ-quantized files for L's Open Llama 3B V2 Wizard Evol Instuct V2 196K model, intended for efficient inference.

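🚀 Quick start

Prerequisites

Make sure you are using the latest version of text-generation-webui. Using the one-click installer is strongly recommended unless you are sure you know how to install it manually.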

Download the model

Click the Model tab.
Under Download custom model or LoRA, enter TheBloke/open-llama-3b-v2-wizard-evol-instuct-v2-196k-AWQ.
Click Download. The model will start downloading and will show "Done" when it has finished.
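
The same download can also be scripted outside the web UI. The following is a minimal sketch, assuming the huggingface_hub package is installed. The helper names, the "models" root folder, and the "/" to "_" folder naming are illustrative assumptions, not something this page specifies, so adjust them to your installation.

```python
from pathlib import Path

# Repository ID taken from the download step above.
REPO_ID = "TheBloke/open-llama-3b-v2-wizard-evol-instuct-v2-196k-AWQ"


def default_target_dir(models_root: str = "models") -> Path:
    """Return a local folder for the repo.

    Assumption: a 'models' root and '/' replaced by '_' in the folder name.
    """
    return Path(models_root) / REPO_ID.replace("/", "_")


def download_model(models_root: str = "models") -> str:
    """Download every file in the repo and return the local path."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = default_target_dir(models_root)
    return snapshot_download(repo_id=REPO_ID, local_dir=str(target))
```

Calling download_model() needs network access and roughly 2 GB of free disk space for this repository.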

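Load the model

In the top left, click the refresh icon next to Model.
In the Model dropdown, choose the model you just downloaded: open-llama-3b-v2-wizard-evol-instuct-v2-196k-AWQ.
Select Loader: AutoAWQ.
Click Load. The model will load and is then ready for use.

Custom settings (optional)

If you want custom settings, set them, click Save settings for this model, and then click Reload the Model in the top right.

Start generating

Once you are ready, click the Text Generation tab and enter a prompt to get started!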

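✨ Key features

Advantages of AWQ quantization

AWQ is an efficient, accurate and very fast low-bit weight quantization method that currently supports 4-bit quantization. Compared with GPTQ, it offers faster Transformers-based inference, with quality equivalent to or better than the most commonly used GPTQ settings.

Multi-platform support

The model is supported by several platforms and tools, including:

Text Generation Webui - using Loader: AutoAWQ
vLLM - Llama and Mistral models only
Hugging Face Text Generation Inference (TGI)
Transformers version 4.35.0 and later
AutoAWQ - for use from Python code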

📦 Installation

Using text-generation-webui

Follow the steps in the Quick start section above.

Using Python code

Install the required packages

Requires Transformers 4.35.0 or later.
Requires AutoAWQ 0.1.6 or later.

pip3 install --upgrade "autoawq>=0.1.6" "transformers>=4.35.0"

If you are using PyTorch 2.0.1, the AutoAWQ command above will automatically upgrade you to PyTorch 2.1.0. If you are using CUDA 11.8 and want to stay on PyTorch 2.0.1, run this command instead:

pip3 install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl

If you have problems installing AutoAWQ from the pre-built wheels, install it from source instead:

pip3 uninstall -y autoawq
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip3 install .
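
After installing, it can help to confirm that the installed versions meet the minimums listed above (Transformers 4.35.0, AutoAWQ 0.1.6). The sketch below uses only the standard library. The helper names are illustrative, and the version parser is deliberately simplified: it keeps the leading numeric components and ignores local suffixes such as "+cu118".

```python
from importlib import metadata

# Minimum versions stated in the installation notes above.
MINIMUMS = {"transformers": "4.35.0", "autoawq": "0.1.6"}


def parse_version(text: str) -> tuple:
    """Turn '0.1.6+cu118' into (0, 1, 6); suffixes after the numbers are ignored."""
    public = text.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings after padding them to equal length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> dict:
    """Map each required package to 'ok', 'too old' or 'missing'."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report
```

Printing check_environment() gives a quick summary before attempting to load the model.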

💻 Usage examples

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name_or_path = "TheBloke/open-llama-3b-v2-wizard-evol-instuct-v2-196k-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="cuda:0"
)

# Using the text streamer to stream output one token at a time
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "Tell me about AI"
prompt_template=f'''### HUMAN:
{prompt}

### RESPONSE:
'''

# Convert prompt to tokens
tokens = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

generation_params = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "max_new_tokens": 512,
    "repetition_penalty": 1.1
}

# Generate streamed output, visible one token at a time
generation_output = model.generate(
    tokens,
    streamer=streamer,
    **generation_params
)

# Generation without a streamer, which will include the prompt in the output
generation_output = model.generate(
    tokens,
    **generation_params
)

# Get the tokens from the output, decode them, print them
token_output = generation_output[0]
text_output = tokenizer.decode(token_output)
print("model.generate output: ", text_output)

# Inference is also possible via Transformers' pipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    **generation_params
)

pipe_output = pipe(prompt_template)[0]['generated_text']
print("pipeline output: ", pipe_output)
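
As the comments above note, generation without a streamer returns the prompt together with the reply. One possible way to keep only the reply is sketched below. The helper name extract_response is hypothetical, and the fallback assumes the "### RESPONSE:" marker from the prompt template appears once in the decoded text.

```python
def extract_response(full_text: str, prompt_text: str) -> str:
    """Return only the newly generated part of a decoded generation."""
    marker = "### RESPONSE:"
    if full_text.startswith(prompt_text):
        # Common case: the output begins with the exact prompt.
        reply = full_text[len(prompt_text):]
    elif marker in full_text:
        # Fallback: decoding may have added special tokens before the prompt.
        reply = full_text.split(marker, 1)[1]
    else:
        reply = full_text
    return reply.strip()
```

With the variables from the example above, extract_response(text_output, prompt_template) or extract_response(pipe_output, prompt_template) would give the reply alone.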

Advanced usage

Multi-user inference with vLLM

from vllm import LLM, SamplingParams

prompts = [
    "Tell me about AI",
    "Write a story about llamas",
    "What is 291 - 150?",
    "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
]
# Plain (non-f) string: {prompt} is filled in below by str.format()
prompt_template='''### HUMAN:
{prompt}

### RESPONSE:
'''

prompts = [prompt_template.format(prompt=prompt) for prompt in prompts]

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="TheBloke/open-llama-3b-v2-wizard-evol-instuct-v2-196k-AWQ", quantization="awq", dtype="auto")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Multi-user inference with Hugging Face Text Generation Inference (TGI)

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

prompt = "Tell me about AI"
prompt_template=f'''### HUMAN:
{prompt}

### RESPONSE:
'''

client = InferenceClient(endpoint_url)
# Send the templated prompt rather than the bare user message
response = client.text_generation(prompt_template,
                                  max_new_tokens=128,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  top_k=40,
                                  repetition_penalty=1.1)

print("Model output: ", response)

📚 Detailed documentation

Prompt template

### HUMAN:
{prompt}

### RESPONSE:
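
The template can be wrapped in a small helper so that every caller formats prompts the same way. This is a minimal sketch; the names PROMPT_TEMPLATE and build_prompt are illustrative, and the trailing newline after "### RESPONSE:" follows the usage examples on this page.

```python
# Plain string with a {prompt} placeholder, matching the template above.
PROMPT_TEMPLATE = "### HUMAN:\n{prompt}\n\n### RESPONSE:\n"


def build_prompt(user_message: str) -> str:
    """Insert the user's message into the model's prompt template."""
    return PROMPT_TEMPLATE.format(prompt=user_message)


def build_prompts(user_messages: list) -> list:
    """Format a batch of messages, e.g. for batched inference."""
    return [build_prompt(message) for message in user_messages]
```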

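Provided files and AWQ parameters

For now, only 128g GEMM models are released. Adding group_size 32 models and GEMV kernel models is being actively considered. Models are released as sharded safetensors files.

Branch	Bits	Group size (GS)	AWQ dataset	Sequence length	Size
main	4	64	VMware Open Instruct	2048	2.15 GB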
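
To see how the group size in the table affects storage, the per-weight cost can be estimated with a little arithmetic. The sketch below assumes each group stores one 16-bit scale and one 4-bit zero point alongside its 4-bit weights. That layout is an assumption for illustration, and the result does not reproduce the 2.15 GB file size, which also includes tensors that are not quantized.

```python
def effective_bits_per_weight(bits: int = 4, group_size: int = 64,
                              scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Bits stored per weight once per-group metadata is spread over the group.

    Assumption: one scale and one zero point per group of `group_size` weights.
    """
    return bits + (scale_bits + zero_bits) / group_size


# Group size 64 (the 'main' branch above) versus 128 and 32 for comparison.
for gs in (32, 64, 128):
    print(gs, effective_bits_per_weight(group_size=gs))
```

Under these assumptions, smaller groups cost more bits per weight because the per-group metadata is shared by fewer weights.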

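Compatibility

The provided files have been tested with the following tools and versions:

text-generation-webui using Loader: AutoAWQ
vLLM version 0.2.0 and later
Hugging Face Text Generation Inference (TGI) version 1.1.0 and later
Transformers version 4.35.0 and later
AutoAWQ version 0.1.1 and later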

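🔧 Technical details

The AWQ quantization method

AWQ is a low-bit weight quantization method. By optimizing the quantization parameters, it reduces model size while keeping inference performance and accuracy high. It works well on Transformers-based models and can noticeably speed up inference.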
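
To make "low-bit weight quantization" concrete, here is a minimal sketch of plain group-wise 4-bit round-to-nearest quantization in pure Python. It is not the AWQ algorithm: AWQ additionally uses activation statistics to protect the most important weights before quantizing, which this sketch does not attempt. All function names are illustrative.

```python
def quantize_group(weights: list, bits: int = 4):
    """Asymmetric round-to-nearest quantization of one group of floats."""
    levels = (1 << bits) - 1               # 15 integer steps for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0      # avoid a zero scale for constant groups
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes: list, scale: float, lo: float) -> list:
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


def quantize(weights: list, group_size: int = 64, bits: int = 4) -> list:
    """Quantize a flat weight list in independent groups of `group_size`."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


# Example: 128 weights become two groups, each with its own scale and offset.
weights = [i / 10 - 3 for i in range(128)]
groups = quantize(weights, group_size=64)
restored = [w for g in groups for w in dequantize_group(*g)]
```

Each group keeps its own scale and offset, so the rounding error of any weight is bounded by half of its group's scale.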

Model training

The model was trained from Open Llama 3B V2 on the WizardLM/WizardLM_evol_instruct_V2_196k dataset for 1 epoch.

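📄 License

This project is released under the apache-2.0 license.

Other information

Discord

For further support, or to join discussions about these models and AI in general, join the TheBloke AI Discord server.

Thanks and how to contribute

Thanks to the chirper.ai team and to Clay from gpus.llm-utils.org! If you would like to contribute to the project, you can do so here:

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Donors get priority support, access to a private Discord room, and other benefits.

Original model information

Model creator: L
Original model: Open Llama 3B V2 Wizard Evol Instuct V2 196K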

Evaluation results

Detailed results are available on the Open LLM Leaderboard.

Metric	Value
Average	36.33
ARC (25-shot)	41.81
HellaSwag (10-shot)	73.01
MMLU (5-shot)	26.36
TruthfulQA (0-shot)	38.99
Winogrande (5-shot)	66.69
GSM8K (5-shot)	1.9
DROP (3-shot)	5.57
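
The reported average can be checked against the individual scores. The sketch below assumes the average is the unweighted mean of the seven benchmarks in the table.

```python
# Benchmark scores copied from the table above.
scores = {
    "ARC (25-shot)": 41.81,
    "HellaSwag (10-shot)": 73.01,
    "MMLU (5-shot)": 26.36,
    "TruthfulQA (0-shot)": 38.99,
    "Winogrande (5-shot)": 66.69,
    "GSM8K (5-shot)": 1.9,
    "DROP (3-shot)": 5.57,
}

# Unweighted mean, rounded to two decimals like the table.
average = round(sum(scores.values()) / len(scores), 2)
print(average)
```

The result matches the 36.33 listed in the table.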