Mythalion-Kimiko-v2-AWQ open-source model: efficient, accurate, fast inference

Mythalion Kimiko V2 AWQ

Quantized by TheBloke

Mythalion Kimiko v2 - AWQ is the AWQ-quantized version of the Mythalion Kimiko v2 model created by nRuaif. It is designed for efficient, accurate, and fast inference.

Large language model

Transformers

License: other #4-bit quantization #efficient inference #multi-framework compatibility

Downloads: 403

Released: 12/14/2023

Model overview

This model is the AWQ-quantized version of Mythalion Kimiko v2. It uses 4-bit quantization and is intended for efficient inference.

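Model features

Efficient inference

Uses AWQ 4-bit quantization, which offers faster Transformers-based inference than GPTQ.

Multiple quantized versions

AWQ, GPTQ, and GGUF quantizations of the model are available to suit different inference scenarios.

Broad compatibility

Works with several inference tools and frameworks, including Text Generation Webui, vLLM, TGI, and Transformers.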

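Model capabilities

Text generation

Efficient inference

Use cases

Text generation

AI-related question answering

Answering questions about artificial intelligence

Story writing

Generating stories on a given topic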

🚀 Mythalion Kimiko v2 - AWQ

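Mythalion Kimiko v2 - AWQ is the AWQ-quantized version of nRuaif's Mythalion Kimiko v2 model, designed for efficient, accurate, and fast inference.

🚀 Quick start

This project provides AWQ model files for nRuaif's Mythalion Kimiko v2. The files were quantized using hardware kindly provided by Massed Compute.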

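About AWQ

AWQ is an efficient, accurate, and very fast low-bit weight quantization method that currently supports 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference, with quality equivalent to or better than the most commonly used GPTQ settings.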
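To make "low-bit weight quantization" concrete, here is a minimal sketch of plain group-wise 4-bit round-to-nearest quantization in pure Python. It is not the AWQ algorithm itself, which additionally uses activation statistics to protect the most important weights. The function names, the sample weights, and the tiny group size of 4 are assumptions chosen for illustration.

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of floats."""
    qmax = (1 << bits) - 1                      # 15 for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


def quantize(weights, group_size=4, bits=4):
    """Quantize a flat weight list, one (codes, scale, offset) triple per group."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    restored = []
    for codes, scale, lo in groups:
        restored.extend(dequantize_group(codes, scale, lo))
    return restored


# Illustrative weights only; real models use much larger groups (e.g. 128).
weights = [0.12, -0.53, 0.98, 0.07, -1.20, 0.44, 0.31, -0.09]
groups = quantize(weights, group_size=4)
restored = dequantize(groups)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {max_error:.4f}")
```

Each group stores 4-bit codes plus one scale and one offset, so the reconstruction error within a group is bounded by half of that group's scale.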

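AWQ models are currently supported only on Linux and Windows, with NVIDIA GPUs. macOS users should use GGUF models instead.

AWQ is supported by:

Text Generation Webui - using Loader: AutoAWQ
vLLM - version 0.2.2 or later for support for all model types
Hugging Face Text Generation Inference (TGI)
Transformers version 4.35.0 and later, from any code or client that supports Transformers
AutoAWQ - for use from Python code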

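✨ Key features

Multiple quantized versions: AWQ, GPTQ, and GGUF quantizations of the model are available to suit different inference scenarios.
Broad compatibility: works with several inference tools and frameworks, including Text Generation Webui, vLLM, TGI, and Transformers.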

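📦 Installation guide

How to download and use this model in text-generation-webui

Make sure you are using the latest version of text-generation-webui. Using the text-generation-webui one-click installers is strongly recommended unless you are sure you know how to do a manual install.

1. Click the Model tab.
2. Under Download custom model or LoRA, enter TheBloke/Mythalion-Kimiko-v2-AWQ.
3. Click Download.
4. The model will start downloading. Once it has finished, it will say "Done".
5. In the top left, click the refresh icon next to Model.
6. In the Model dropdown, choose the model you just downloaded: Mythalion-Kimiko-v2-AWQ.
7. Select Loader: AutoAWQ.
8. Click Load. The model will load and is then ready for use.
9. If you want any custom settings, set them, then click Save settings for this model followed by Reload the Model in the top right.
10. Once you are ready, click the Text Generation tab and enter a prompt to get started!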
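The download step can also be scripted. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper names, the `models` root folder, and the `owner_name` folder layout are assumptions made for illustration, not something the model card specifies.

```python
from pathlib import Path

REPO_ID = "TheBloke/Mythalion-Kimiko-v2-AWQ"


def local_model_dir(models_root, repo_id=REPO_ID):
    """Folder to download into; the "owner_name" layout is an assumption."""
    return Path(models_root) / repo_id.replace("/", "_")


def download_model(models_root="models", repo_id=REPO_ID, revision="main"):
    """Fetch every file of the repository into a local folder."""
    # Imported lazily so local_model_dir works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(models_root, repo_id)
    snapshot_download(repo_id=repo_id, revision=revision, local_dir=str(target))
    return target


print(local_model_dir("models"))
```

Calling `download_model()` performs the actual download (several gigabytes) and returns the folder the files were written to.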

Inference from Python code using Transformers

Install the necessary packages

Requires: Transformers 4.35.0 or later.
Requires: AutoAWQ 0.1.6 or later.

pip3 install --upgrade "autoawq>=0.1.6" "transformers>=4.35.0"

Note that if you are using PyTorch 2.0.1, the AutoAWQ command above will automatically upgrade you to PyTorch 2.1.0.

If you are using CUDA 11.8 and want to stay on PyTorch 2.0.1, run this command instead:

pip3 install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl
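The wheel filename in the command above encodes the build it targets (CUDA 11.8, CPython 3.10, Linux x86_64), so it only installs on a matching interpreter. As a rough sketch, the tags can be parsed and compared with the running Python as follows; the helper names are hypothetical, and the platform check is deliberately simplified (it does not verify the CPU architecture).

```python
import sys

WHEEL = "autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl"


def parse_wheel_name(filename):
    """Split a wheel filename into its five dash-separated fields."""
    stem = filename[:-len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_interpreter(tags, version_info=None, platform=None):
    """Simplified check: CPython major.minor and OS family must match."""
    version_info = version_info or sys.version_info
    platform = platform or sys.platform
    expected_python = f"cp{version_info[0]}{version_info[1]}"
    return tags["python"] == expected_python and tags["platform"].startswith(platform)


tags = parse_wheel_name(WHEEL)
print(tags)
print("usable with this interpreter:", matches_interpreter(tags))
```

If the check prints `False`, installing from source is the fallback.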

If you have problems installing AutoAWQ from the pre-built wheels, install it from source instead:

pip3 uninstall -y autoawq
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip3 install .

💻 Usage examples

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name_or_path = "TheBloke/Mythalion-Kimiko-v2-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="cuda:0"
)

# Using the text streamer to stream output one token at a time
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "Tell me about AI"
prompt_template=f'''{prompt}
'''

# Convert prompt to tokens
tokens = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

generation_params = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "max_new_tokens": 512,
    "repetition_penalty": 1.1
}

# Generate streamed output, visible one token at a time
generation_output = model.generate(
    tokens,
    streamer=streamer,
    **generation_params
)

# Generation without a streamer, which will include the prompt in the output
generation_output = model.generate(
    tokens,
    **generation_params
)

# Get the tokens from the output, decode them, print them
token_output = generation_output[0]
text_output = tokenizer.decode(token_output)
print("model.generate output: ", text_output)

# Inference is also possible via Transformers' pipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    **generation_params
)

pipe_output = pipe(prompt_template)[0]['generated_text']
print("pipeline output: ", pipe_output)
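As the comment in the example notes, the non-streamed `model.generate` call returns the prompt tokens followed by the newly generated ones. A minimal sketch of dropping the prompt before decoding is shown below; it uses plain lists with made-up token IDs so it runs without a GPU, and the helper name is hypothetical.

```python
def new_tokens_only(output_ids, prompt_length):
    """Return only the generated continuation of a prompt + continuation sequence."""
    return output_ids[prompt_length:]


# Made-up token IDs standing in for real tokenizer output.
prompt_ids = [1, 24948, 592]
output_ids = prompt_ids + [319, 29902, 2]

print(new_tokens_only(output_ids, len(prompt_ids)))
```

With the tensors from the example above, the same slice would be `generation_output[0][tokens.shape[1]:]`, which can then be passed to `tokenizer.decode`.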

Advanced usage

Multi-user inference serving with vLLM

from vllm import LLM, SamplingParams

prompts = [
    "Tell me about AI",
    "Write a story about llamas",
    "What is 291 - 150?",
    "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
]
# Plain (non f-string) template so that .format() below fills in each prompt
prompt_template = '''{prompt}
'''

prompts = [prompt_template.format(prompt=prompt) for prompt in prompts]

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="TheBloke/Mythalion-Kimiko-v2-AWQ", quantization="awq", dtype="auto")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Multi-user inference serving with Hugging Face Text Generation Inference (TGI)

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

prompt = "Tell me about AI"
prompt_template=f'''{prompt}
'''

client = InferenceClient(endpoint_url)
# Send the templated prompt rather than the raw prompt
response = client.text_generation(prompt_template,
                                  max_new_tokens=128,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  top_k=40,
                                  repetition_penalty=1.1)

print("Model output: ", response)
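For interactive use it can be nicer to print tokens as they arrive. `InferenceClient.text_generation` accepts `stream=True` and then yields text pieces instead of returning one string. The sketch below wraps that pattern in a hypothetical helper; `FakeClient` is an offline stand-in with canned output so the example runs without an endpoint, and a real `InferenceClient(endpoint_url)` would be passed in its place.

```python
def stream_completion(client, prompt, **params):
    """Print streamed text pieces as they arrive and return the full text."""
    pieces = []
    for piece in client.text_generation(prompt, stream=True, **params):
        print(piece, end="", flush=True)
        pieces.append(piece)
    print()
    return "".join(pieces)


class FakeClient:
    """Offline stand-in for huggingface_hub.InferenceClient (illustration only)."""

    def text_generation(self, prompt, stream=False, **params):
        # Canned pieces; a real endpoint would generate these from the prompt.
        yield from ["AI ", "is ", "a ", "field."]


text = stream_completion(FakeClient(), "Tell me about AI\n", max_new_tokens=128)
```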

📚 Detailed documentation

Repositories available

Prompt template

{prompt}
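The template is just the user prompt followed by a newline, with no system prompt or role markers. A minimal sketch of applying it to a batch of prompts with `str.format` follows; the constant and function names are assumptions.

```python
# The template shown above: the prompt followed by a single newline.
PROMPT_TEMPLATE = "{prompt}\n"


def build_prompts(user_prompts, template=PROMPT_TEMPLATE):
    """Fill the template once per user prompt."""
    return [template.format(prompt=p) for p in user_prompts]


prompts = build_prompts(["Tell me about AI", "Write a story about llamas"])
print(prompts)
```

Keeping the template a plain string (not an f-string) matters: an f-string is evaluated immediately, so a later `.format()` call would have nothing left to fill in.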

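Provided files and AWQ parameters

For now, only 128g GEMM models are released. Adding group size 32 models and GEMV kernel models is being actively considered.

Models are released as sharded safetensors files.

Branch	Bits	Group size	AWQ dataset	Sequence length	Size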
main	4	128	VMware Open Instruct	4096	7.25 GB
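As a rough sanity check on the table, the storage cost of group-wise 4-bit quantization can be estimated with simple arithmetic. The sketch below assumes one 16-bit scale and one 4-bit zero point per group of 128 weights, and a hypothetical parameter count of 13 billion; neither figure is stated in this model card.

```python
def effective_bits_per_weight(bits=4, group_size=128, scale_bits=16, zero_bits=4):
    """Bits per weight including per-group scale and zero-point overhead (assumed layout)."""
    return bits + (scale_bits + zero_bits) / group_size


def estimated_size_gb(num_params, bits=4, group_size=128):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return num_params * effective_bits_per_weight(bits, group_size) / 8 / 1e9


print(effective_bits_per_weight())          # 4.15625
print(round(estimated_size_gb(13e9), 2))    # 6.75, for an assumed 13e9 parameters
```

Under these assumptions the estimate falls somewhat below the listed 7.25 GB. Layers that are typically left unquantized, such as embeddings, would plausibly account for part of the difference.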

Compatibility

The provided files have been tested and work with:

text-generation-webui, using Loader: AutoAWQ.
vLLM version 0.2.0 and later.
Hugging Face Text Generation Inference (TGI) version 1.1.0 and later.
Transformers version 4.35.0 and later.
AutoAWQ version 0.1.1 and later.
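The Python-side minimums above can be checked against the current environment with the standard library. This is a sketch under assumptions: the pip distribution names `vllm`, `transformers`, and `autoawq` are assumed, the version parser is simplified (it compares only the first three numeric fields), and TGI is left out because it runs as a server rather than as an importable package.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the compatibility list above.
MINIMUMS = {"vllm": "0.2.0", "transformers": "4.35.0", "autoawq": "0.1.1"}


def as_tuple(version_string):
    """Turn '4.35.0' or '0.1.6+cu118' into a tuple of up to three ints."""
    public = version_string.split("+")[0]          # drop local tags like +cu118
    return tuple(int(n) for n in re.findall(r"\d+", public)[:3])


def check(minimums=MINIMUMS):
    """Return a status string per package: version and whether it is new enough."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = "not installed"
            continue
        ok = as_tuple(installed) >= as_tuple(minimum)
        report[package] = f"{installed} ({'ok' if ok else 'needs >= ' + minimum})"
    return report


for package, status in check().items():
    print(f"{package}: {status}")
```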

📄 License

This project is released under a license listed as "other".

🔗 Related links

Model creator: nRuaif
Original model: Mythalion Kimiko v2
Discord server: TheBloke AI's Discord server
Patreon page: https://patreon.com/TheBlokeAI
Ko-Fi page: https://ko-fi.com/TheBlokeAI

🙏 Thanks and how to contribute

Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org!

Many people have asked whether they can contribute. I enjoy providing models and helping people, and would love to spend even more time doing it, as well as expanding into new projects such as fine-tuning and training.

If you are able and willing to contribute, it will be most gratefully received and will help me keep providing more models and start work on new AI projects.

Donors will get priority support on any AI/LLM/model questions and requests, access to a private Discord room, and other benefits.

Special thanks to: Aemon Algiz.

Patreon special mentions: Michael Levine, 阿明, Trailburnt, Nikolai Manek, John Detwiler, Randy H, Will Dee, Sebastain Graf, NimbleBox.ai, Eugene Pentland, Emad Mostaque, Ai Maven, Jim Angel, Jeff Scroggin, Michael Davis, Manuel Alberto Morcote, Stephen Murray, Robert, Justin Joy, Luke @flexchar, Brandon Frisco, Elijah Stavena, S_X, Dan Guido, Undi ., Komninos Chatzipapas, Shadi, theTransient, Lone Striker, Raven Klaugh, jjj, Cap'n Zoog, Michel-Marie MAUDET (LINAGORA), Matthew Berman, David, Fen Risland, Omer Bin Jawed, Luke Pendergrass, Kalila, OG, Erik Bjäreholt, Rooh Singh, Joseph William Delisle, Dan Lewis, TL, John Villwock, AzureBlack, Brad, Pedro Madruga, Caitlyn Gatomon, K, jinyuan sun, Mano Prime, Alex, Jeffrey Morgan, Alicia Loh, Illia Dulskyi, Chadd, transmissions 11, fincy, Rainer Wilmers, ReadyPlayerEmma, knownsqashed, Mandus, biorpg, Deo Leter, Brandon Phillips, SuperWojo, Sean Connelly, Iucharbius, Jack West, Harry Royden McLaughlin, Nicholas, terasurfer, Vitor Caleffi, Duane Dunston, Johann-Peter Hartmann, David Ziegler, Olakabola, Ken Nordquist, Trenton Dambrowitz, Tom X Nguyen, Vadim, Ajan Kanaga, Leonard Tan, Clay Pascal, Alexandros Triantafyllidis, JM33133, Xule, vamX, ya boyyy, subjectnull, Talal Aujan, Alps Aficionado, wassieverse, Ari Malik, James Bentley, Woland, Spencer Kim, Michael Dempsey, Fred von Graf, Elle, zynix, William Richards, Stanislav Ovsiannikov, Edmond Seymore, Jonathan Leane, Martin Kemka, usrbinkat, Enrico Ros

Thank you to all my generous patrons and donors! And thank you again to a16z for their generous grant.