DiscoLM_German_7b_v1: Open-Source German Language Model

Discolm German 7b V1 AWQ

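Quantized and published by TheBloke (original model by Disco Research)

DiscoLM German 7B v1 is a 7B-parameter German language model based on the Mistral architecture. It supports German and English and is released under the Apache-2.0 license.

Large language model

Transformers

Multilingual · License: Apache-2.0 · #German-dialogue-optimized #Multilingual-mixed-training #ChatML-format-support

Downloads: 81

Released: 1/18/2024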

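Model Overview

This is a German-focused language model based on the Mistral architecture and fine-tuned for German text generation and understanding tasks.

Model Features

German optimization

Fine-tuned specifically on German to improve German text handling.

Multilingual support

Supports English in addition to German, with some cross-lingual capability.

Efficient inference

Uses AWQ quantization to speed up inference while largely preserving output quality.

Model Capabilities

German text generation

English text generation

Dialogue systems

Text understanding

Use Cases

Content creation

German article writing

Helps users generate German articles, blog posts, and similar content

Generates fluent, context-appropriate German text

Customer service

German customer-service chatbot

Automated customer-service systems for German-speaking markets

Can understand and answer customer inquiries in German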

🚀 DiscoLM German 7B v1 - AWQ

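DiscoLM German 7B v1 - AWQ is the AWQ-quantized version of the DiscoLM German 7B v1 model. AWQ is an efficient, accurate, and fast low-bit weight quantization method. This model offers efficient inference in supported environments and works with multiple inference tools and platforms.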

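🚀 Quick Start

Environment setup

Make sure you are using the latest version of text-generation-webui. Using the one-click installers is strongly recommended unless you are sure you know how to do a manual install.

Download and usage steps

1. Click the Model tab.
2. Under Download custom model or LoRA, enter TheBloke/DiscoLM_German_7b_v1-AWQ.
3. Click Download.
4. The model will start downloading. Once it has finished, it will say "Done".
5. In the top left, click the refresh icon next to Model.
6. In the Model dropdown, choose the model you just downloaded: DiscoLM_German_7b_v1-AWQ.
7. Select Loader: AutoAWQ.
8. Click Load, and the model will load and be ready for use.
9. If you want any custom settings, set them, then click Save settings for this model followed by Reload the Model in the top right.
10. Once you are ready, click the Text Generation tab and enter a prompt to get started!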

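✨ Key Features

Multilingual support: Supports German and English. The model is optimized for German use cases while staying reasonably fluent in English, and it is also proficient at translation tasks.
Efficient quantization: Uses AWQ with 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality.
Broad compatibility: Works with multiple inference tools and platforms, including Text Generation Webui, vLLM, Hugging Face Text Generation Inference (TGI), Transformers, and AutoAWQ.
Multiple prompt formats: Supports the ChatML prompt format, provides a special retrieval format to improve steerability and reduce hallucinations, and supports structured output/function calling (experimental).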

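📦 Installation Guide

Downloading from text-generation-webui

Follow the steps in the Quick Start section above.

Inference from Python code using Transformers

Install the necessary packages

Requires Transformers 4.35.0 or later.
Requires AutoAWQ 0.1.6 or later.

pip3 install --upgrade "autoawq>=0.1.6" "transformers>=4.35.0"

Note that if you are using PyTorch 2.0.1, the AutoAWQ command above will automatically upgrade you to PyTorch 2.1.0.

If you are using CUDA 11.8 and want to stay on PyTorch 2.0.1, run this command instead:

pip3 install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl

If you have problems installing AutoAWQ from the pre-built wheels, install it from source: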

pip3 uninstall -y autoawq
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip3 install .
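
Before loading the model, it can help to confirm that the installed packages meet the minimum versions listed above. The following is a minimal sketch using only the standard library. The helper names (`version_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and the version parsing is deliberately simplistic: it keeps only the leading numeric components.

```python
from importlib import metadata

# Minimum versions taken from the install notes above.
MINIMUM_VERSIONS = {"transformers": "4.35.0", "autoawq": "0.1.6"}


def version_tuple(version):
    """Turn '0.1.6+cu118' into (0, 1, 6); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Map each package to True/False, or None if it is not installed."""
    report = {}
    for package, minimum in MINIMUM_VERSIONS.items():
        try:
            report[package] = meets_minimum(metadata.version(package), minimum)
        except metadata.PackageNotFoundError:
            report[package] = None
    return report


if __name__ == "__main__":
    print(check_environment())
```

A `None` entry means the package is missing, and `False` means it is installed but older than required.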

💻 Usage Examples

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name_or_path = "TheBloke/DiscoLM_German_7b_v1-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="cuda:0"
)

# Using the text streamer to stream output one token at a time
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

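# system_message must be defined before the f-string below is evaluated
system_message = "Du bist ein hilfreicher Assistent."  # example system prompt
prompt = "Tell me about AI"
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''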

# Convert prompt to tokens
tokens = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

generation_params = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "max_new_tokens": 512,
    "repetition_penalty": 1.1
}

# Generate streamed output, visible one token at a time
generation_output = model.generate(
    tokens,
    streamer=streamer,
    **generation_params
)

# Generation without a streamer, which will include the prompt in the output
generation_output = model.generate(
    tokens,
    **generation_params
)

# Get the tokens from the output, decode them, print them
token_output = generation_output[0]
text_output = tokenizer.decode(token_output)
print("model.generate output: ", text_output)

# Inference is also possible via Transformers' pipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    **generation_params
)

pipe_output = pipe(prompt_template)[0]['generated_text']
print("pipeline output: ", pipe_output)

Advanced usage

Multi-user inference with vLLM

from vllm import LLM, SamplingParams

prompts = [
    "Tell me about AI",
    "Write a story about llamas",
    "What is 291 - 150?",
    "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
]
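system_message = "Du bist ein hilfreicher Assistent."  # example system prompt
# Plain (non-f) string, so str.format() can fill the placeholders for each prompt
prompt_template = '''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

prompts = [prompt_template.format(system_message=system_message, prompt=prompt) for prompt in prompts]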

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="TheBloke/DiscoLM_German_7b_v1-AWQ", quantization="awq", dtype="auto")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Multi-user inference with Hugging Face Text Generation Inference (TGI)

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

system_message = "Du bist ein hilfreicher Assistent."  # example system prompt
prompt = "Tell me about AI"
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

client = InferenceClient(endpoint_url)
# Send the full ChatML-formatted prompt, not the bare user message
response = client.text_generation(prompt_template,
                                  max_new_tokens=128,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  top_k=40,
                                  repetition_penalty=1.1)

print("Model output: ", response)

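📚 Detailed Documentation

Model information

Property	Details
Model creator	Disco Research
Model name	DiscoLM German 7B v1
Model type	mistral
Base model	DiscoResearch/DiscoLM_German_7b_v1
Quantized by	TheBloke
License	apache-2.0
Supported languages	German, English
Prompt template	ChatML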

Prompt templates

ChatML prompt template

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

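This prompt is available as a chat template, which means you can format messages using the tokenizer.apply_chat_template() method:

messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Wer bist du?"}
]
# With return_tensors="pt" this returns a tensor of input IDs, so pass it positionally
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
model.generate(gen_input)

When tokenizing messages for generation, set add_generation_prompt=True when calling apply_chat_template(). This appends <|im_start|>assistant\n to your prompt so that the model continues with an assistant response.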
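
To make the effect of the template concrete, here is a minimal pure-Python sketch of the string that ChatML formatting produces. The function `render_chatml` is a hypothetical stand-in for illustration only. The tokenizer's real chat template may differ in details such as special-token handling, so use `tokenizer.apply_chat_template()` for actual inference.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render [{'role': ..., 'content': ...}, ...] into ChatML text."""
    text = ""
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        text += "<|im_start|>assistant\n"
    return text


messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Wer bist du?"},
]
print(render_chatml(messages, add_generation_prompt=True))
```

Without the generation prompt the text ends after the user turn, which leaves the model free to continue in any role.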

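Retrieval format

You can use a special retrieval format to improve steerability and reduce hallucinations in RAG applications. Other, more standard formats should also work, so this is purely optional.

Example: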

### System:

Du bist ein hilfreicher Assistent. Für die folgende Aufgabe stehen dir zwischen den Tags BEGININPUT und ENDINPUT mehrere Quellen zur Verfügung. Metadaten zu den einzelnen Quellen wie Autor, URL o.ä. sind zwischen BEGINCONTEXT und ENDCONTEXT zu finden, danach folgt der Text der Quelle. Die eigentliche Aufgabe oder Frage ist zwischen BEGININSTRUCTION und ENDINSTRUCTION zu finden. Beantworte diese ausschließlich mit Informationen aus den gegebenen Quellen und gebe die Information zur genutzten Quelle  unter "Quelle:" an. Sollten die Quellen keine relevanten Informationen enthalten, antworte: "Mit den gegebenen Informationen ist diese Frage nicht zu beantworten."

### User Prompt:

BEGININPUT
BEGINCONTEXT
url: https://this.is.fake.news
time: 2089-09-01
ENDCONTEXT
Buxtehude ist die größte Stadt Deutschlands mit 96.56 Millionen Einwohnern.
ENDINPUT

BEGININSTRUCTION
Was ist die größte deutsche Stadt?
ENDINSTRUCTION

### Model Answer:

Die größte deutsche Stadt ist Buxtehude.

Quelle:
  url: https://this.is.fake.news
  time: 2089-09-01
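
The user turn in the example above can be assembled programmatically. The sketch below is one possible way to do it. The function name `build_retrieval_prompt`, the shape of the `sources` argument, and the blank line between multiple sources are assumptions made for illustration; only the BEGIN/END tags come from the example.

```python
def build_retrieval_prompt(sources, instruction):
    """Build the user turn of the retrieval format.

    sources: list of {"metadata": {key: value, ...}, "text": "..."} dicts
    instruction: the question or task to answer from the sources
    """
    blocks = []
    for source in sources:
        metadata_lines = [f"{key}: {value}" for key, value in source["metadata"].items()]
        blocks.append("\n".join(
            ["BEGININPUT", "BEGINCONTEXT", *metadata_lines, "ENDCONTEXT", source["text"], "ENDINPUT"]
        ))
    blocks.append("\n".join(["BEGININSTRUCTION", instruction, "ENDINSTRUCTION"]))
    # Blocks are separated by a blank line, as in the example above.
    return "\n\n".join(blocks)


user_prompt = build_retrieval_prompt(
    [{
        "metadata": {"url": "https://this.is.fake.news", "time": "2089-09-01"},
        "text": "Buxtehude ist die größte Stadt Deutschlands mit 96.56 Millionen Einwohnern.",
    }],
    "Was ist die größte deutsche Stadt?",
)
print(user_prompt)
```

The resulting string goes into the user message, with the German system prompt from the example as the system message.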

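Function calling

The model also supports structured output/function calling, although this is a very experimental feature and results may vary. It will be improved in the future.

The model prefixes function calls with <functioncall>, and you can provide results in the response with <functionresponse> for multi-turn applications.

Example: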

### System:

Du bist ein hilfreicher Assistent. Extrahiere alle Personen aus den Eingaben des Users.

Du hast Zugriff auf folgende Funktionen:

{'name': 'PersonList',
 'description': 'Extrahiere die Namen aller im Text vorkommenden Personen',
 'parameters': {'$defs': {'Person': {'description': 'Details über eine person',
    'properties': {'name': {'title': 'Name', 'type': 'string'},
     'job': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'title': 'Job'},
     'age': {'anyOf': [{'type': 'integer'}, {'type': 'null'}],
      'title': 'Age'}},
    'required': ['name', 'job', 'age'],
    'title': 'Person',
    'type': 'object'}},
  'properties': {'person_list': {'items': {'$ref': '#/$defs/Person'},
    'title': 'Person List',
    'type': 'array'}},
  'required': ['person_list'],
  'type': 'object'}}

### User Prompt:

Björn (25) und Jan sind die Gründer von ellamind.

### Model Answer:

<functioncall> {"name": "PersonList", "arguments": '{"person_list": ["{"name": "Björn", "job": "founder", "age": 25}, {"name": "Jan", "job": "founder", "age": null}]}'}
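
Because the feature is experimental, output after the `<functioncall>` prefix is not guaranteed to be valid JSON (the sample answer above, for instance, would not parse as strict JSON). A defensive parser is therefore a reasonable precaution. The following is a minimal sketch; `extract_function_call` is a hypothetical helper, and treating the payload as strict JSON is an assumption.

```python
import json

FUNCTION_CALL_TAG = "<functioncall>"


def extract_function_call(model_output):
    """Return (name, arguments) if the output holds a parseable call, else None."""
    if FUNCTION_CALL_TAG not in model_output:
        return None
    payload = model_output.split(FUNCTION_CALL_TAG, 1)[1].strip()
    try:
        call = json.loads(payload)
        arguments = call["arguments"]
        # Arguments may arrive as a JSON-encoded string; decode them if so.
        if isinstance(arguments, str):
            arguments = json.loads(arguments)
        return call["name"], arguments
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed output: let the caller retry or fall back to plain text.
        return None


# Hypothetical well-formed output, for illustration only.
sample = '<functioncall> {"name": "PersonList", "arguments": {"person_list": [{"name": "Jan", "job": null, "age": null}]}}'
print(extract_function_call(sample))
```

Returning `None` rather than raising keeps the calling code simple: it can retry the generation or treat the reply as ordinary text.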

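Provided files and AWQ parameters

Currently only 128g GEMM models are released. Adding group size 32 models and GEMV kernel models is being actively considered.

Models are released as sharded safetensors files.

Branch	Bits	Group size	AWQ dataset	Sequence length	Size
main	4	128	German Quad	4096	4.15 GB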
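
As a rough plausibility check on the size column, the sketch below estimates the footprint of a group-quantized checkpoint from the bit width and group size. Everything beyond those two table values is an assumption: the per-group overhead (one fp16 scale plus one packed zero point), the idea that some weights such as embeddings stay in fp16, and the parameter split used in the example call are illustrative guesses, not figures from the model card.

```python
def estimate_awq_size_bytes(quantized_params, fp16_params, bits=4, group_size=128):
    """Rough size in bytes of a group-quantized checkpoint."""
    packed_weights = quantized_params * bits / 8
    groups = quantized_params / group_size
    # Assumed per-group overhead: one fp16 scale plus one packed zero point.
    group_overhead = groups * (2 + bits / 8)
    unquantized = fp16_params * 2  # weights assumed to stay in fp16
    return packed_weights + group_overhead + unquantized


# Hypothetical split for a 7B-class model: about 6.98B quantized linear
# weights and about 0.26B embedding/output weights left in fp16.
size = estimate_awq_size_bytes(6.98e9, 0.26e9)
print(f"{size / 1e9:.2f} GB")
```

Under these assumptions the estimate lands in the same range as the size listed in the table, which suggests that most of the file is the packed 4-bit weights.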

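Compatibility

The provided files have been tested with the following tools and platforms:

text-generation-webui, using Loader: AutoAWQ.
vLLM version 0.2.0 and later.
Hugging Face Text Generation Inference (TGI) version 1.1.0 and later.
Transformers version 4.35.0 and later.
AutoAWQ version 0.1.1 and later.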

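🔧 Technical Details

AWQ quantization method

AWQ is an efficient, accurate, and fast low-bit weight quantization method that currently supports 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality.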
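
To illustrate what "4-bit, group size 128" means in practice, the sketch below shows plain group-wise round-to-nearest quantization: each group of weights gets its own scale and offset, and every weight is stored as an integer code from 0 to 15. This is not the AWQ algorithm itself, which additionally uses activation statistics to choose scaling before quantizing. It only shows the low-bit grouped storage idea, and all function names are hypothetical.

```python
def quantize_group(weights, bits=4):
    """Round-to-nearest quantization of one group; returns (codes, scale, offset)."""
    levels = (1 << bits) - 1              # 15 for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0     # constant group: avoid a zero scale
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def quantize(weights, group_size=128, bits=4):
    """Split a flat weight list into groups and quantize each one separately."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate weights from (codes, scale, offset) groups."""
    restored = []
    for codes, scale, lo in groups:
        restored.extend(code * scale + lo for code in codes)
    return restored


weights = [((i * 37) % 101 - 50) / 25.0 for i in range(300)]
groups = quantize(weights)
restored = dequantize(groups)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(f"groups: {len(groups)}, worst absolute error: {worst:.4f}")
```

Smaller groups track local weight ranges more closely at the cost of storing more scales, which is the trade-off behind the group size 32 variant mentioned under the provided files.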

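Model training

DiscoLM German 7B v1 is a Mistral-based large language model and the successor to the EM German model family. It was trained on a large dataset of German and English instructions, first in an SFT fine-tuning phase and then with additional DPO reinforcement learning.

Evaluation results

Preliminary results on the German version of MT Bench show that DiscoLM German 7B is not far behind GPT-3.5-turbo on many tasks and even significantly outperforms it in the reasoning category. However, current benchmarks do not fully capture the model's capabilities, particularly the language quality perceived by native speakers.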

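📄 License

This model is released under the apache-2.0 license.

⚠️ Important Notes

This model can produce factually incorrect and offensive output and should not be relied on for factually accurate information.
The model was trained on a variety of public datasets. Although great effort has gone into cleaning the pretraining data, it may still generate biased or otherwise offensive output, and users are responsible for implementing a safety/moderation layer. Use with caution.

💡 Usage Tips

When using text-generation-webui, the one-click installers are recommended to avoid problems that manual installation can cause.
When using vLLM, make sure you are on vLLM version 0.2 or later, and pass the --quantization awq parameter when running it as a server.
When using Hugging Face Text Generation Inference (TGI), use TGI version 1.1.0 or later.

Discord

For further support, and to take part in discussions about these models and AI in general, join TheBloke AI's Discord server.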

Thanks, and how to contribute

Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org!

Many people have asked whether they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects such as fine-tuning/training.

If you are able and willing to contribute, it will be most gratefully received and will help me keep providing more models and start work on new AI projects.

Donors get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, and other benefits.

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Aemon Algiz.

Patreon special mentions: Michael Levine, 阿明, Trailburnt, Nikolai Manek, John Detwiler, Randy H, Will Dee, Sebastain Graf, NimbleBox.ai, Eugene Pentland, Emad Mostaque, Ai Maven, Jim Angel, Jeff Scroggin, Michael Davis, Manuel Alberto Morcote, Stephen Murray, Robert, Justin Joy, Luke @flexchar, Brandon Frisco, Elijah Stavena, S_X, Dan Guido, Undi ., Komninos Chatzipapas, Shadi, theTransient, Lone Striker, Raven Klaugh, jjj, Cap'n Zoog, Michel-Marie MAUDET (LINAGORA), Matthew Berman, David, Fen Risland, Omer Bin Jawed, Luke Pendergrass, Kalila, OG, Erik Bjäreholt, Rooh Singh, Joseph William Delisle, Dan Lewis, TL, John Villwock, AzureBlack, Brad, Pedro Madruga, Caitlyn Gatomon, K, jinyuan sun, Mano Prime, Alex, Jeffrey Morgan, Alicia Loh, Illia Dulskyi, Chadd, transmissions 11, fincy, Rainer Wilmers, ReadyPlayerEmma, knownsqashed, Mandus, biorpg, Deo Leter, Brandon Phillips, SuperWojo, Sean Connelly, Iucharbius, Jack West, Harry Royden McLaughlin, Nicholas, terasurfer, Vitor Caleffi, Duane Dunston, Johann-Peter Hartmann, David Ziegler, Olakabola, Ken Nordquist, Trenton Dambrowitz, Tom X Nguyen, Vadim, Ajan Kanaga, Leonard Tan, Clay Pascal, Alexandros Triantafyllidis, JM33133, Xule, vamX, ya boyyy, subjectnull, Talal Aujan, Alps Aficionado, wassieverse, Ari Malik, James Bentley, Woland, Spencer Kim, Michael Dempsey, Fred von Graf, Elle, zynix, William Richards, Stanislav Ovsiannikov, Edmond Seymore, Jonathan Leane, Martin Kemka, usrbinkat, Enrico Ros

Thank you to all my generous patrons and donors! And thank you again to a16z for their generous grant.