Meditron 7B - AWQ: Open-Source Medical Large Language Model for Medical Knowledge Encoding and Clinical Decision Support

Meditron 7B AWQ

Quantized by TheBloke

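Meditron 7B is a medical-domain large language model developed by the EPFL LLM Team. It is based on Llama-2-7B with continued pretraining and is specialized for encoding medical knowledge and supporting clinical decision-making.

Large Language Model

Transformers

English #medical-question-answering #efficient-inference #4-bit-quantization

Downloads: 38.22k

Release date: 11/30/2023

Model Overview

This is a 7B-parameter large language model optimized for the medical domain. AWQ quantization makes it suitable for efficient inference. The model has potential applications in areas such as medical exam question answering and diagnostic support.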

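Model Features

Efficient quantization

Uses AWQ 4-bit quantization, which reduces memory use and speeds up inference while aiming to preserve model quality

Medical-domain optimization

Continued pretraining on medical data strengthens the model's ability to encode medical knowledge

Broad compatibility

Supports multiple inference frameworks, including vLLM, Hugging Face TGI, and Transformers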

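Model Capabilities

Medical question-answer generation

Medical information retrieval

Diagnostic support

Health consultation

Answering medical exam questions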

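Use Cases

Healthcare

Medical exam question answering

Answers questions related to medical exams

Achieved 27.3% accuracy on the health category of the TruthfulQA evaluation

Diagnostic support

Provides suggestions for differential diagnosis

Health information retrieval

Looks up information on disease symptoms, causes, and treatments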

🚀 Meditron 7B - AWQ

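This project provides an AWQ-quantized version of the Meditron 7B model from the EPFL LLM Team. AWQ is a fast, accurate low-bit weight quantization method optimized for GPU inference.

🚀 Quick Start

This section covers basic information about the Meditron 7B - AWQ model and how to use it in various environments.

✨ Key Features

AWQ quantization: uses a fast, accurate low-bit quantization method optimized for GPU inference.
Multiple inference environments: works with the main inference stacks, including Text Generation Webui, vLLM, Hugging Face Text Generation Inference (TGI), and Transformers.
Multiple model formats: available in AWQ, GPTQ, and GGUF formats.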

📦 Installation

Installing in text-generation-webui

Make sure you are using the latest version of text-generation-webui, then download and load the model as follows.

1. Click the Model tab.
2. Under Download custom model or LoRA, enter TheBloke/meditron-7B-AWQ.
3. Click Download.
4. The model starts downloading. When it finishes, "Done" is displayed.
5. In the top left, click the refresh icon next to Model.
6. In the Model dropdown, choose the model you just downloaded: meditron-7B-AWQ.
7. Select Loader: AutoAWQ.
8. Click Load. The model loads and is then ready to use.
9. If you need custom settings, set them, click Save settings for this model in the top right, and then click Reload the Model.
10. When you are ready, click the Text Generation tab, enter a prompt, and start generating.

Installing for use from Python code

Install the required packages:

pip3 install --upgrade "autoawq>=0.1.6" "transformers>=4.35.0"

If you are using PyTorch 2.0.1, the AutoAWQ command above automatically upgrades it to PyTorch 2.1.0. If you are using CUDA 11.8 and want to stay on PyTorch 2.0.1, run this command instead:

pip3 install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl

If you cannot install AutoAWQ from the pre-built wheels, install it from source:

pip3 uninstall -y autoawq
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip3 install .

💻 Usage Examples

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name_or_path = "TheBloke/meditron-7B-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="cuda:0"
)

# Using the text streamer to stream output one token at a time
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

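prompt = "Tell me about AI"
# Placeholder system message (not part of the original example); adjust for your use case
system_message = "You are a helpful medical assistant."
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''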

# Convert prompt to tokens
tokens = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

generation_params = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "max_new_tokens": 512,
    "repetition_penalty": 1.1
}

# Generate streamed output, visible one token at a time
generation_output = model.generate(
    tokens,
    streamer=streamer,
    **generation_params
)

# Generation without a streamer, which will include the prompt in the output
generation_output = model.generate(
    tokens,
    **generation_params
)

# Get the tokens from the output, decode them, print them
token_output = generation_output[0]
text_output = tokenizer.decode(token_output)
print("model.generate output: ", text_output)

# Inference is also possible via Transformers' pipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    **generation_params
)

pipe_output = pipe(prompt_template)[0]['generated_text']
print("pipeline output: ", pipe_output)
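As the comment in the example above notes, calling model.generate without a streamer returns the prompt tokens followed by the newly generated ones. The following is a minimal sketch of one way to keep only the completion. It assumes the output sequence begins with the exact prompt tokens; strip_prompt_tokens is a hypothetical helper name, and plain lists with made-up token ids stand in for tensors.

```python
def strip_prompt_tokens(output_ids, prompt_length):
    """Return only the token ids generated after the prompt.

    Works on anything sliceable (a list here, a 1-D tensor in practice).
    Assumes output_ids begins with the prompt_length prompt tokens.
    """
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Toy example with made-up token ids: 3 prompt tokens, 2 generated tokens.
prompt_ids = [1, 15043, 3186]
output_ids = prompt_ids + [29991, 2]
new_ids = strip_prompt_tokens(output_ids, len(prompt_ids))
print(new_ids)  # [29991, 2]
```

With the variables from the example above, this would correspond to strip_prompt_tokens(generation_output[0], tokens.shape[1]), followed by tokenizer.decode(..., skip_special_tokens=True).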

Advanced usage

Multi-user inference server with vLLM

For vLLM installation and usage, refer to the vLLM documentation.

Make sure you are using vLLM version 0.2 or later.
When using vLLM as a server, pass the --quantization awq parameter:

python3 -m vllm.entrypoints.api_server --model TheBloke/meditron-7B-AWQ --quantization awq --dtype auto
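Once the server is running, clients talk to it over HTTP. The sketch below shows what such a request might look like using only the standard library. It assumes the defaults of the vLLM demo API server as I understand them: port 8000, a POST /generate endpoint that takes a JSON body with "prompt" plus sampling parameters, and a JSON response containing a "text" list. Check these against your vLLM version. build_generate_request and query_server are hypothetical names.

```python
import json
import urllib.request

# Assumed default address of the vLLM demo API server; adjust to your deployment.
VLLM_URL = "http://localhost:8000/generate"


def build_generate_request(prompt, url=VLLM_URL, max_tokens=256,
                           temperature=0.8, top_p=0.95):
    """Build (but do not send) a POST request for the assumed /generate endpoint."""
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_server(prompt, **kwargs):
    """Send the request and return the generated texts (needs a running server)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["text"]


# Building the request works offline; query_server() requires the server above.
request = build_generate_request("Tell me about AI")
print(request.full_url, request.get_method())
```

In practice the prompt passed in would be the fully formatted prompt template rather than the bare question.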

When using vLLM from Python code, set quantization="awq":

from vllm import LLM, SamplingParams

prompts = [
    "Tell me about AI",
    "Write a story about llamas",
    "What is 291 - 150?",
    "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
]
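# Placeholder system message (not part of the original example); adjust for your use case
system_message = "You are a helpful medical assistant."
# Plain (non-f) string so that .format() can fill in the fields for each prompt
prompt_template = '''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

prompts = [prompt_template.format(system_message=system_message, prompt=prompt) for prompt in prompts]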

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="TheBloke/meditron-7B-AWQ", quantization="awq", dtype="auto")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Multi-user inference server with Hugging Face Text Generation Inference (TGI)

Use TGI version 1.1.0 or later. The official Docker container is ghcr.io/huggingface/text-generation-inference:1.1.0.

Example Docker parameters:

--model-id TheBloke/meditron-7B-AWQ --port 3000 --quantize awq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
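These parameters cap both the prompt length and the total sequence length: with --max-input-length 3696 and --max-total-tokens 4096, a maximum-length prompt leaves 4096 - 3696 = 400 tokens for generation. The sketch below shows a client-side check of that budget. fits_token_budget is a hypothetical helper, and the server's own validation may differ in detail.

```python
MAX_INPUT_LENGTH = 3696   # from --max-input-length
MAX_TOTAL_TOKENS = 4096   # from --max-total-tokens


def fits_token_budget(prompt_tokens, max_new_tokens,
                      max_input_length=MAX_INPUT_LENGTH,
                      max_total_tokens=MAX_TOTAL_TOKENS):
    """Return True if the prompt and the requested completion fit both limits."""
    if prompt_tokens > max_input_length:
        return False
    return prompt_tokens + max_new_tokens <= max_total_tokens


print(MAX_TOTAL_TOKENS - MAX_INPUT_LENGTH)   # 400 tokens left after a maximal prompt
print(fits_token_budget(3696, 400))          # True
print(fits_token_budget(3696, 512))          # False
print(fits_token_budget(1000, 512))          # True
```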

Example Python code for interacting with TGI (requires huggingface-hub 0.17.0 or later):

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

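prompt = "Tell me about AI"
# Placeholder system message (not part of the original example); adjust for your use case
system_message = "You are a helpful medical assistant."
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

client = InferenceClient(endpoint_url)
# Send the fully formatted template, not just the bare prompt
response = client.text_generation(prompt_template,
                                  max_new_tokens=128,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  top_k=40,
                                  repetition_penalty=1.1)

print("Model output: ", response)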
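For interactive use it can be nicer to show text as it is produced. The sketch below wraps the streaming mode of InferenceClient.text_generation (stream=True, which yields pieces of text) in a small helper. stream_completion is a hypothetical name, and FakeClient is a stand-in with canned output so the call shape can be shown without a running endpoint.

```python
def stream_completion(client, prompt, **generation_kwargs):
    """Print text pieces as they arrive and return the full completion.

    `client` is expected to behave like huggingface_hub.InferenceClient:
    text_generation(prompt, stream=True, ...) yields text pieces one by one.
    """
    pieces = []
    for piece in client.text_generation(prompt, stream=True, **generation_kwargs):
        print(piece, end="", flush=True)
        pieces.append(piece)
    print()
    return "".join(pieces)


class FakeClient:
    """Stand-in used only to illustrate the call without a live TGI endpoint."""

    def text_generation(self, prompt, stream=False, **kwargs):
        yield from ["AI ", "is ", "a ", "field."]


text = stream_completion(FakeClient(), "Tell me about AI", max_new_tokens=128)
```

Against a real server, the first argument would be the InferenceClient instance and the second the formatted prompt template.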

📚 Documentation

Prompt template

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
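The template above can be filled in with a small helper, which avoids repeating the string in every script. This is a minimal sketch: build_prompt and CHATML_TEMPLATE are hypothetical names, and the system message used in the example is only a placeholder.

```python
# The template from this section, with a trailing newline after "assistant"
# so that generation starts on a fresh line.
CHATML_TEMPLATE = (
    "<|im_start|>system\n"
    "{system_message}<|im_end|>\n"
    "<|im_start|>user\n"
    "{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def build_prompt(prompt, system_message):
    """Insert the system message and user prompt into the template."""
    return CHATML_TEMPLATE.format(system_message=system_message, prompt=prompt)


# Placeholder system message; choose one that suits your application.
example = build_prompt(
    prompt="What are common symptoms of anemia?",
    system_message="You are a helpful medical assistant.",
)
print(example)
```

Using a plain string with str.format, rather than an f-string, keeps the template reusable across many prompts.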

Provided files and AWQ parameters

Currently, only 128g GEMM models are released. Adding group_size 32 models and GEMV kernel models is being actively considered. Models are released as sharded safetensors files.

Branch	Bits	Group size (GS)	AWQ dataset	Sequence length	Size
main	4	128	Medical Meadow WikiDoc	4096	3.89 GB
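The file size in the table can be sanity-checked with a back-of-the-envelope calculation. The sketch below rests on several assumptions of mine that do not come from the model card: roughly 6.48 billion weights sit in quantized linear layers, roughly 0.26 billion parameters (embeddings and output head) stay in 16-bit precision, and each group of 128 weights stores one 16-bit scale and one 4-bit zero point. estimate_awq_size_gb is a hypothetical helper.

```python
def estimate_awq_size_gb(quantized_params, fp16_params, bits=4, group_size=128):
    """Rough on-disk size in GB (1e9 bytes) of a group-wise quantized model."""
    # Assumed overhead per group: one fp16 scale (16 bits) and one zero point (`bits` bits).
    bits_per_weight = bits + (16 + bits) / group_size
    quantized_bytes = quantized_params * bits_per_weight / 8
    fp16_bytes = fp16_params * 2  # 2 bytes per unquantized parameter
    return (quantized_bytes + fp16_bytes) / 1e9


# Parameter counts are rough assumptions for a Llama-2-7B-shaped model.
size_gb = estimate_awq_size_gb(quantized_params=6.48e9, fp16_params=0.26e9)
print(f"estimated size: {size_gb:.2f} GB")
```

Under these assumptions the estimate lands close to the 3.89 GB listed above, which suggests the size is dominated by the 4-bit weights plus a small per-group overhead.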

Compatibility

The provided files have been tested with the following:

text-generation-webui (using Loader: AutoAWQ)
vLLM version 0.2.0 or later
Hugging Face Text Generation Inference (TGI) version 1.1.0 or later
Transformers version 4.35.0 or later
AutoAWQ version 0.1.1 or later
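The minimum versions listed above can be checked against the local Python environment. The following is a minimal sketch using only the standard library. It assumes the pip distribution names vllm, transformers, and autoawq, and uses a simplified version parser that ignores pre-release and local tags. TGI is left out because it runs as a separate server rather than a Python package. All helper names are hypothetical.

```python
from importlib import metadata

# Minimum versions from the compatibility list (Python packages only).
MINIMUM_VERSIONS = {
    "vllm": "0.2.0",
    "transformers": "4.35.0",
    "autoawq": "0.1.1",
}


def parse_version(version):
    """Turn '4.35.0' or '0.1.6+cu118' into a tuple of ints (simplified)."""
    release = version.split("+")[0]  # drop local tags such as "+cu118"
    parts = []
    for part in release.split("."):
        digits = ""
        for ch in part:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at segments like "dev20230101"
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUM_VERSIONS):
    """Return a {package: status} report for the current environment."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "not installed"
            continue
        ok = meets_minimum(installed, minimum)
        report[package] = f"{installed} ({'ok' if ok else 'need >= ' + minimum})"
    return report


for package, status in check_environment().items():
    print(f"{package}: {status}")
```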

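🔧 Technical Details

Meditron 7B is a 7-billion-parameter model built on Llama-2-7B through continued pretraining on a medical-domain corpus. AWQ quantization stores the weights in 4 bits, which reduces memory use and speeds up GPU inference while aiming to keep accuracy close to that of the original model.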
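To give a feel for what 4-bit, group-wise weight storage means, here is a toy sketch in pure Python. It is plain round-to-nearest quantization with one scale and one offset per group of 128 weights, not AWQ itself: AWQ additionally uses activation statistics to protect the most important weight channels, which is not shown here. Storing the group minimum as a float offset instead of an integer zero point is a simplification, and all function names are hypothetical.

```python
import random


def quantize_group(weights, bits=4):
    """Round-to-nearest quantization of one group to integers in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1  # 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels
    if scale == 0.0:
        scale = 1.0  # constant group: every code is 0 and w_min restores it exactly
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes, scale, w_min):
    """Map integer codes back to approximate float weights."""
    return [code * scale + w_min for code in codes]


def quantize_weights(weights, group_size=128, bits=4):
    """Quantize a flat list group by group and return the reconstruction."""
    restored = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale, w_min = quantize_group(group, bits)
        restored.extend(dequantize_group(codes, scale, w_min))
    return restored


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(512)]  # synthetic weights
restored = quantize_weights(weights)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {max_error:.6f}")
```

Because each group has its own scale, the rounding error within a group is bounded by half of that group's step size, which is why smaller groups trade a little extra storage for lower error.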

📄 License

This model is provided under the LLAMA 2 COMMUNITY LICENSE AGREEMENT. The code is under the APACHE 2.0 LICENSE.

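Additional Information

Available repositories

Discord

For further support, and for discussions about these models and AI in general, join TheBloke AI's Discord server.

How to contribute

Many people have asked whether they can contribute. I enjoy providing models and helping people, and I would love to spend even more time doing it, as well as expanding into new projects such as fine-tuning and training. If you are able and willing to contribute, it will be gratefully received and will help me keep providing more models and start work on new AI projects. Donors receive priority support on all AI/LLM/model questions and requests, access to a private Discord room, and other benefits.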

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Aemon Algiz.

Patreon special mentions: Brandon Frisco, LangChain4j, Spiking Neurons AB, transmissions 11, Joseph William Delisle, Nitin Borwankar, Willem Michiel, Michael Dempsey, vamX, Jeffrey Morgan, zynix, jjj, Omer Bin Jawed, Sean Connelly, jinyuan sun, Jeromy Smith, Shadi, Pawan Osman, Chadd, Elijah Stavena, Illia Dulskyi, Sebastain Graf, Stephen Murray, terasurfer, Edmond Seymore, Celu Ramasamy, Mandus, Alex, biorpg, Ajan Kanaga, Clay Pascal, Raven Klaugh, 阿明, K, ya boyyy, usrbinkat, Alicia Loh, John Villwock, ReadyPlayerEmma, Chris Smitley, Cap'n Zoog, fincy, GodLy, S_X, sidney chen, Cory Kujawski, OG, Mano Prime, AzureBlack, Pieter, Kalila, Spencer Kim, Tom X Nguyen, Stanislav Ovsiannikov, Michael Levine, Andrey, Trailburnt, Vadim, Enrico Ros, Talal Aujan, Brandon Phillips, Jack West, Eugene Pentland, Michael Davis, Will Dee, webtim, Jonathan Leane, Alps Aficionado, Rooh Singh, Tiffany J. Kim, theTransient, Luke @flexchar, Elle, Caitlyn Gatomon, Ari Malik, subjectnull, Johann-Peter Hartmann, Trenton Dambrowitz, Imad Khwaja, Asp the Wyvern, Emad Mostaque, Rainer Wilmers, Alexandros Triantafyllidis, Nicholas, Pedro Madruga, SuperWojo, Harry Royden McLaughlin, James Bentley, Olakabola, David Ziegler, Ai Maven, Jeff Scroggin, Nikolai Manek, Deo Leter, Matthew Berman, Fen Risland, Ken Nordquist, Manuel Alberto Morcote, Luke Pendergrass, TL, Fred von Graf, Randy H, Dan Guido, NimbleBox.ai, Vitor Caleffi, Gabriel Tamborski, knownsqashed, Lone Striker, Erik Bjäreholt, John Detwiler, Leonard Tan, Iucharbius

Thank you to all my generous patrons and donors. Thanks again also to a16z for their generous grant.