open-llama-3b-v2-wizard-evol-instuct-v2-196k-AWQ open-source model


Open Llama 3b V2 Wizard Evol Instuct V2 196k AWQ

Developed by TheBloke

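A model based on the Open Llama 3B V2 architecture, trained on the WizardLM_evol_instruct_V2_196k dataset and suited to instruction-following tasks.

Large Language Model

Transformers

English | Open Source License: Apache-2.0 | #instruction-tuning #compact-and-efficient #English-dialogue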

Downloads 64

Release Time : 11/29/2023

Model Overview

An instruction-following model built on the Open Llama 3B V2 architecture and fine-tuned specifically on WizardLM's evolved instruction dataset.

Model Features

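Instruction optimization

Trained on WizardLM's evolved instruction dataset to improve instruction following.

Efficient inference

At 3B parameters, the model offers relatively fast inference while maintaining reasonable performance.

Open license

Released under the Apache 2.0 license, which permits both commercial and research use.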

Model Capabilities

Text generation

Instruction understanding and execution

Dialogue systems

Question answering systems

Use Cases

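Dialogue systems

Intelligent assistant

Build conversational assistants that can understand complex instructions.

Education

Teaching assistance

Generate teaching materials and answer students' questions.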

🚀 Open Llama 3B V2 Wizard Evol Instuct V2 196K - AWQ

This repo contains AWQ-quantized model files for L's Open Llama 3B V2 Wizard Evol Instuct V2 196K model, for efficient inference.

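🚀 Quick start

Prerequisites

Make sure you are using the latest version of text-generation-webui. Using the one-click installer is strongly recommended unless you are sure you know how to do a manual install.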

Download the model

Click the Model tab.
Under Download custom model or LoRA, enter TheBloke/open-llama-3b-v2-wizard-evol-instuct-v2-196k-AWQ.
Click Download. The model will start downloading; once it finishes, it will show "Done".

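Load the model

In the top left, click the refresh icon next to Model.
In the Model dropdown, choose the model you just downloaded: open-llama-3b-v2-wizard-evol-instuct-v2-196k-AWQ.
Select Loader: AutoAWQ.
Click Load; the model will load and be ready for use.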

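Custom settings (optional)

If you want any custom settings, set them, then click Save settings for this model, followed by Reload the Model in the top right.

Get started

Once it's ready, click the Text Generation tab and enter a prompt to get started!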

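✨ Key features

Advantages of AWQ quantization

AWQ is an efficient, accurate and very fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference, with quality equivalent to or better than the most commonly used GPTQ settings.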
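To make "4-bit weight quantization" concrete, here is a simplified sketch of group-wise quantization: each group of weights shares one float scale and one integer zero point, and every weight is stored as a 4-bit integer in [0, 15]. The asymmetric min-max scheme and the helper names are assumptions for illustration; this is not AutoAWQ's code, and AWQ's activation-aware scaling step is left out.

```python
# Simplified sketch of group-wise 4-bit weight quantization (illustration only;
# not AutoAWQ's implementation). Each group of `group_size` weights shares one
# float scale and one integer zero point; each weight becomes an int in [0, 15].


def quantize_group(values, bits=4):
    """Asymmetric min-max quantization of one group of floats."""
    qmax = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    # Integer zero point, clamped to the representable range.
    zero = min(max(round(-lo / scale), 0), qmax)
    q = [min(max(round(v / scale) + zero, 0), qmax) for v in values]
    return q, scale, zero


def dequantize_group(q, scale, zero):
    """Map the stored integers back to approximate float weights."""
    return [(x - zero) * scale for x in q]


def quantize_row(row, group_size=64, bits=4):
    """Split one row of a weight matrix into groups and quantize each one."""
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]


def dequantize_row(groups):
    """Concatenate the dequantized groups back into one row."""
    restored = []
    for q, scale, zero in groups:
        restored.extend(dequantize_group(q, scale, zero))
    return restored


if __name__ == "__main__":
    # Deterministic toy "weights" in roughly [-0.9, 1.1].
    row = [((i * 37) % 101) / 50.0 - 0.9 for i in range(128)]
    groups = quantize_row(row, group_size=64)
    restored = dequantize_row(groups)
    max_err = max(abs(a - b) for a, b in zip(row, restored))
    print(f"{len(groups)} groups, max abs error {max_err:.4f}")
```

Smaller groups mean more scales to store but a tighter fit to the local weight range, which is the trade-off behind the group size (GS) parameter.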

Multi-platform support

The model is supported by the following platforms and tools:

Text Generation Webui - using Loader: AutoAWQ
vLLM - Llama and Mistral models only
Hugging Face Text Generation Inference (TGI)
Transformers version 4.35.0 and later
AutoAWQ - for use from Python code

📦 Installation guide

From text-generation-webui

Follow the steps in the Quick start section above.

From Python code

Install the necessary packages

Requires Transformers 4.35.0 or later.
Requires AutoAWQ 0.1.6 or later.

pip3 install --upgrade "autoawq>=0.1.6" "transformers>=4.35.0"

If you are using PyTorch 2.0.1, the AutoAWQ command above will automatically upgrade you to PyTorch 2.1.0. If you are using CUDA 11.8 and want to keep PyTorch 2.0.1, run this command instead:

pip3 install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl

If you have problems installing AutoAWQ from the prebuilt wheels, install it from source instead:

pip3 uninstall -y autoawq
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip3 install .
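After installing, one quick way to confirm that the versions meet the minimums above is to read them with the standard library's importlib.metadata. The comparison helper below is a hypothetical simplification that looks only at the leading numeric components (so local tags like +cu118 and suffixes like .dev0 are ignored); packaging.version is more rigorous if it is available.

```python
# Check installed versions against the minimums listed above.
# Assumption: comparing leading numeric components is precise enough here;
# local tags ("+cu118") and pre-release suffixes (".dev0", "rc1") are ignored.
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUMS = {"transformers": "4.35.0", "autoawq": "0.1.6"}


def numeric_parts(ver):
    """Turn '0.1.6+cu118' into (0, 1, 6), stopping at the first non-numeric part."""
    parts = []
    for piece in ver.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if `installed` >= `minimum`, padding missing components with 0."""
    a, b = numeric_parts(installed), numeric_parts(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


if __name__ == "__main__":
    for package, minimum in MINIMUMS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            print(f"{package}: not installed (need >= {minimum})")
            continue
        ok = meets_minimum(installed, minimum)
        print(f"{package} {installed}: {'OK' if ok else 'too old, need >= ' + minimum}")
```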

💻 Usage examples

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name_or_path = "TheBloke/open-llama-3b-v2-wizard-evol-instuct-v2-196k-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="cuda:0"
)

# Using the text streamer to stream output one token at a time
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "Tell me about AI"
prompt_template=f'''### HUMAN:
{prompt}

### RESPONSE:
'''

# Convert prompt to tokens
tokens = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

generation_params = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "max_new_tokens": 512,
    "repetition_penalty": 1.1
}

# Generate streamed output, visible one token at a time
generation_output = model.generate(
    tokens,
    streamer=streamer,
    **generation_params
)

# Generation without a streamer, which will include the prompt in the output
generation_output = model.generate(
    tokens,
    **generation_params
)

# Get the tokens from the output, decode them, print them
token_output = generation_output[0]
text_output = tokenizer.decode(token_output)
print("model.generate output: ", text_output)

# Inference is also possible via Transformers' pipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    **generation_params
)

pipe_output = pipe(prompt_template)[0]['generated_text']
print("pipeline output: ", pipe_output)
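The generation_params used above (temperature, top_k, top_p, repetition_penalty) all reshape the next-token distribution before a token is sampled. The pure-Python sketch below shows the idea on a single list of logits. It is a simplified illustration, not the Transformers implementation, whose logits processors may apply these steps in a different order and handle more edge cases.

```python
# Simplified illustration of the sampling settings in generation_params.
# Not the Transformers implementation; ordering and edge cases may differ.
import math
import random


def apply_repetition_penalty(logits, previous_ids, penalty=1.1):
    """Penalize already-generated tokens (CTRL-style):
    positive logits are divided by the penalty, negative ones multiplied."""
    out = list(logits)
    for i in set(previous_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_probs(logits, temperature=0.7, top_k=40, top_p=0.95):
    """Return {token_id: probability} after temperature, top-k and top-p."""
    probs = softmax([x / temperature for x in logits])
    # top-k: keep only the k most likely tokens
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_total = sum(probs[i] for i in ranked)
    # top-p: keep the smallest prefix whose renormalized mass reaches top_p
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_total
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample_next(logits, previous_ids, rng, temperature=0.7, top_k=40,
                top_p=0.95, repetition_penalty=1.1):
    """Pick the next token id from the filtered distribution."""
    penalized = apply_repetition_penalty(logits, previous_ids, repetition_penalty)
    dist = filter_probs(penalized, temperature, top_k, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids])[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0, -3.0]
    print(filter_probs(logits))
    print(sample_next(logits, previous_ids=[0], rng=random.Random(0)))
```

Lower temperature sharpens the distribution, while top_k and top_p both cut off the unlikely tail; repetition_penalty is what discourages the model from looping on tokens it has already produced.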

Advanced usage

Multi-user inference with vLLM

from vllm import LLM, SamplingParams

prompts = [
    "Tell me about AI",
    "Write a story about llamas",
    "What is 291 - 150?",
    "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
]
# A plain string (not an f-string), so that .format() below fills in {prompt}
prompt_template='''### HUMAN:
{prompt}

### RESPONSE:
'''

prompts = [prompt_template.format(prompt=prompt) for prompt in prompts]

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="TheBloke/open-llama-3b-v2-wizard-evol-instuct-v2-196k-AWQ", quantization="awq", dtype="auto")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Multi-user inference with Hugging Face Text Generation Inference (TGI)

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

prompt = "Tell me about AI"
prompt_template=f'''### HUMAN:
{prompt}

### RESPONSE:
'''

client = InferenceClient(endpoint_url)
response = client.text_generation(prompt_template,
                                  max_new_tokens=128,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  top_k=40,
                                  repetition_penalty=1.1)

print(f"Model output: {response}")
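If you would rather not depend on huggingface_hub, TGI also exposes a plain HTTP API. The sketch below POSTs to the server's /generate route using only the Python standard library. It assumes the endpoint needs no authentication (add an Authorization header if yours does) and that the prompt has already been formatted with the template above; the helper names are hypothetical.

```python
# Minimal stdlib-only sketch of calling TGI's /generate route.
# Assumption: the endpoint requires no authentication.
import json
import urllib.request


def build_payload(formatted_prompt, max_new_tokens=128, **parameters):
    """Request body for /generate: the prompt plus generation parameters."""
    return {
        "inputs": formatted_prompt,
        "parameters": {"max_new_tokens": max_new_tokens, **parameters},
    }


def tgi_generate(endpoint_url, payload, timeout=60):
    """POST the payload to <endpoint_url>/generate and return the generated text."""
    request = urllib.request.Request(
        endpoint_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["generated_text"]
```

For example, tgi_generate(endpoint_url, build_payload(prompt_template, do_sample=True, temperature=0.7, top_p=0.95, top_k=40, repetition_penalty=1.1)) should return the same kind of text as client.text_generation does.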

📚 Detailed documentation

Repositories available

Prompt template

### HUMAN:
{prompt}

### RESPONSE:
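Because decoding the raw model.generate() output with tokenizer.decode() includes the prompt, it helps to build the prompt and strip it back out consistently. The two helpers below are hypothetical (not part of any library) and assume the model's answer ends either at the end of the text or where it starts a new ### HUMAN: turn on its own.

```python
# Hypothetical helpers for the "### HUMAN: / ### RESPONSE:" prompt template.
HUMAN_MARKER = "### HUMAN:"
RESPONSE_MARKER = "### RESPONSE:"


def build_prompt(prompt):
    """Fill the template exactly as in the usage examples."""
    return f"{HUMAN_MARKER}\n{prompt}\n\n{RESPONSE_MARKER}\n"


def extract_response(generated_text):
    """Return only the answer from decoded output that may include the prompt."""
    # Keep what follows the first RESPONSE marker (the one in the prompt);
    # text without a marker is assumed to be the answer already.
    answer = generated_text.split(RESPONSE_MARKER, 1)[-1]
    # Stop if the model starts a new HUMAN turn by itself.
    answer = answer.split(HUMAN_MARKER, 1)[0]
    return answer.strip()
```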

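Provided files and AWQ parameters

For this model, only 4-bit GEMM files with a group size of 64 are released (see the table below); adding group_size 32 models and GEMV kernel models is being actively considered. Models are released as sharded safetensors files.

Branch	Bits	Group size (GS)	AWQ dataset	Sequence length	Size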
main	4	64	VMware Open Instruct	2048	2.15 GB
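To get a feel for what the Bits and GS columns mean for file size, here is a back-of-the-envelope estimate. It assumes each group stores one fp16 scale and one 4-bit zero point alongside the 4-bit weights, and it ignores any layers kept unquantized (such as embeddings), so actual files, like the 2.15 GB listed above, can be noticeably larger. The 3e9 weight count is a nominal figure for a "3B" model, not the exact parameter count.

```python
# Back-of-the-envelope size estimate for group-wise 4-bit weights.
# Assumptions: one fp16 scale (16 bits) and one 4-bit zero point per group;
# unquantized layers and file metadata are ignored.


def effective_bits_per_weight(bits=4, group_size=64, scale_bits=16, zero_bits=4):
    """Bits per weight once the per-group scale and zero point are amortized."""
    return bits + (scale_bits + zero_bits) / group_size


def estimated_size_gb(num_weights, bits=4, group_size=64):
    """Estimated size in GB (1e9 bytes) of the quantized weights alone."""
    return num_weights * effective_bits_per_weight(bits, group_size) / 8 / 1e9


if __name__ == "__main__":
    n = 3e9  # nominal "3B" weights (illustrative, not the exact count)
    print(f"fp16: {n * 16 / 8 / 1e9:.2f} GB")
    print(f"4-bit, GS 64:  {effective_bits_per_weight(4, 64)} bits/weight, "
          f"{estimated_size_gb(n, 4, 64):.2f} GB")
    print(f"4-bit, GS 128: {effective_bits_per_weight(4, 128)} bits/weight, "
          f"{estimated_size_gb(n, 4, 128):.2f} GB")
```

Under these assumptions, GS 64 costs about 4.31 bits per weight versus about 4.16 for GS 128, in exchange for finer-grained scales.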

Compatibility

The provided files have been tested to work with:

text-generation-webui using Loader: AutoAWQ
vLLM version 0.2.0 and later
Hugging Face Text Generation Inference (TGI) version 1.1.0 and later
Transformers version 4.35.0 and later
AutoAWQ version 0.1.1 and later

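🔧 Technical details

AWQ quantization method

AWQ (Activation-aware Weight Quantization) is a low-bit weight quantization method. It uses activation statistics to identify the weight channels that matter most and scales them before quantizing, which shrinks the model while largely preserving accuracy. It performs well on Transformers-based models and can substantially speed up inference.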
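The core idea can be sketched on a single linear layer y = Wx: input channels that see large activations are scaled up before quantization, and the inputs are scaled down by the same factor so the product is unchanged before rounding. The per-channel scale is s_j = a_j^alpha, where a_j is the mean activation magnitude of channel j, and alpha is picked by a small grid search that minimizes output error. The toy version below follows that recipe in pure Python with made-up data and a simple per-row 4-bit quantizer; it omits details of the real AutoAWQ code (group-wise quantization, scale normalization, clipping search) and is only an illustration.

```python
# Toy sketch of AWQ-style activation-aware scale search (illustration only).
# For y = W x: multiply input channel j of W by s_j = a_j ** alpha, where a_j is
# the mean |x_j| over calibration inputs, quantize, divide x_j by s_j, and keep
# the alpha in [0, 1] that gives the smallest output error.


def fake_quantize_row(row, bits=4):
    """Quantize one row to `bits` bits (asymmetric min-max), then dequantize it."""
    qmax = 2 ** bits - 1
    lo, hi = min(row), max(row)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = min(max(round(-lo / scale), 0), qmax)
    return [(min(max(round(v / scale) + zero, 0), qmax) - zero) * scale for v in row]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def output_error(W, X, alpha, act_mag):
    """Mean squared error between W x and the quantized, rescaled layer."""
    s = [max(a, 1e-8) ** alpha for a in act_mag]
    Wq = [fake_quantize_row([w * sj for w, sj in zip(row, s)]) for row in W]
    err = 0.0
    for x in X:
        ref = matvec(W, x)
        approx = matvec(Wq, [xi / sj for xi, sj in zip(x, s)])
        err += sum((r - a) ** 2 for r, a in zip(ref, approx))
    return err / len(X)


def search_alpha(W, X, grid=20):
    """Grid-search alpha in {0, 1/grid, ..., 1}; return (best alpha, all errors)."""
    n = len(X[0])
    act_mag = [sum(abs(x[j]) for x in X) / len(X) for j in range(n)]
    errors = {i / grid: output_error(W, X, i / grid, act_mag) for i in range(grid + 1)}
    best = min(errors, key=errors.get)
    return best, errors


if __name__ == "__main__":
    # Made-up 4x8 weights and calibration inputs where channel 2 is "salient".
    W = [[((i * 7 + j * 3) % 11 - 5) / 10.0 for j in range(8)] for i in range(4)]
    X = [[(10.0 if j == 2 else 0.5) * (1 if (k + j) % 2 else -1) for j in range(8)]
         for k in range(6)]
    best, errors = search_alpha(W, X)
    print(f"best alpha = {best}, error {errors[best]:.5f} vs {errors[0.0]:.5f} at alpha=0")
```

Since alpha = 0 means "no scaling" and is always in the grid, the search can never do worse than plain quantization on the calibration data.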

Model training

The original model is Open Llama 3B V2 fine-tuned for 1 epoch on the WizardLM/WizardLM_evol_instruct_V2_196k dataset.

📄 License

This project uses the apache-2.0 license.

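Other information

Discord

For further support, and for discussions about these models and AI in general, join TheBloke AI's Discord server.

Acknowledgements and contributions

Thanks to the chirper.ai team, and to Clay from gpus.llm-utils.org! If you'd like to contribute to the project, you can support it via:

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Donors get priority support and perks such as access to a private Discord room.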

Original model information

Model creator: L
Original model: Open Llama 3B V2 Wizard Evol Instuct V2 196K

Evaluation results

Detailed results of the Open LLM Leaderboard evaluation can be found here.

Metric	Value
平均值	36.33
ARC (25-shot)	41.81
HellaSwag (10-shot)	73.01
MMLU (5-shot)	26.36
TruthfulQA (0-shot)	38.99
Winogrande (5-shot)	66.69
GSM8K (5-shot)	1.9
DROP (3-shot)	5.57
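The Average row appears to be the plain arithmetic mean of the seven benchmark scores; the short check below recomputes it from the table.

```python
# Recompute the "Average" row from the seven benchmark scores in the table above.
scores = {
    "ARC (25-shot)": 41.81,
    "HellaSwag (10-shot)": 73.01,
    "MMLU (5-shot)": 26.36,
    "TruthfulQA (0-shot)": 38.99,
    "Winogrande (5-shot)": 66.69,
    "GSM8K (5-shot)": 1.9,
    "DROP (3-shot)": 5.57,
}
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")  # prints "Average: 36.33"
```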