Open-source multimodal chatbot "llava-v1.5-13B-AWQ" - interactive image-based conversation

Llava V1.5 13B AWQ

Published by TheBloke

LLaVA is an open-source multimodal chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.

Image-to-Text

Transformers

#Multimodal dialogue #Instruction following #Academic VQA

Downloads: 141

Released: 10/15/2023

Model overview

LLaVA is an auto-regressive language model based on the transformer architecture. It can understand images and generate text about their content.

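Model features

Multimodal understanding

Processes image and text inputs together and understands the relationship between them

Instruction following

Carries out tasks by following complex multimodal instructions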

Open source

The model is released as open source and can be used for research and commercial purposes, subject to its license terms

Model capabilities

Visual question answering

Image captioning

Multimodal dialogue

Instruction following

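Use cases

Research

Multimodal model research

Used to study the behavior and capabilities of large multimodal models

Education

Visually assisted learning

Helps students understand complex concepts through images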

🚀 Llava v1.5 13B - AWQ

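This model is an AWQ-quantized version of Haotian Liu's Llava v1.5 13B. AWQ is a fast, accurate low-bit quantization method, and the quantized model can be served with inference servers such as vLLM and TGI.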

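🚀 Quick start

This section describes the basic steps for using this model.

✨ Key features

AWQ quantization: uses a fast, accurate low-bit quantization method.
Multiple server support: supports inference with servers such as vLLM and TGI.
Trained on diverse datasets: the model was trained on a variety of datasets.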

📦 Installation

Using vLLM

python3 -m vllm.entrypoints.api_server --model TheBloke/llava-v1.5-13B-AWQ --quantization awq --dtype half
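
Once the server started by the command above is running, it can be queried over HTTP. The following is a minimal sketch, assuming the server listens on `localhost:8000` and accepts a JSON body at `/generate` consisting of the prompt plus sampling parameters. The helper names `build_payload` and `query_server` are hypothetical, and the sampling values simply mirror those used elsewhere in this card.

```python
import json
import urllib.request

# Assumption: host and port where the vLLM API server was started.
SERVER_URL = "http://localhost:8000/generate"


def build_payload(prompt, max_tokens=128, temperature=0.8, top_p=0.95):
    """Build the JSON body: the prompt plus sampling parameters."""
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }


def query_server(prompt, url=SERVER_URL):
    """POST the prompt to the server and return the decoded JSON response."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `query_server("The capital of France is")` would return the server's JSON response; without one, the call raises a connection error.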

Using vLLM from Python code

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="TheBloke/llava-v1.5-13B-AWQ", quantization="awq", dtype="half")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Using TGI (launch parameters)

--model-id TheBloke/llava-v1.5-13B-AWQ --port 3000 --quantize awq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
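
The parameters above bound how many tokens a single request may consume: the prompt must fit within `--max-input-length`, and the prompt plus the requested new tokens must fit within `--max-total-tokens`. A minimal sketch of that arithmetic follows; the helper name `generation_budget` is hypothetical.

```python
def generation_budget(max_input_length, max_total_tokens, prompt_tokens):
    """Return how many new tokens a prompt of the given size may request."""
    if prompt_tokens > max_input_length:
        raise ValueError("prompt exceeds --max-input-length")
    return max_total_tokens - prompt_tokens


# Values taken from the launch parameters above.
MAX_INPUT_LENGTH = 3696
MAX_TOTAL_TOKENS = 4096

# A prompt that fills the whole input window still leaves room to generate.
print(generation_budget(MAX_INPUT_LENGTH, MAX_TOTAL_TOKENS, 3696))  # 400
# A short 96-token prompt can request far more.
print(generation_budget(MAX_INPUT_LENGTH, MAX_TOTAL_TOKENS, 96))  # 4000
```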

Interacting with TGI from Python code

from huggingface_hub import InferenceClient

endpoint_url = "https://your-endpoint-url-here"

prompt = "Tell me about AI"
prompt_template=f'''{prompt}

'''

client = InferenceClient(endpoint_url)
# Send the formatted prompt to the TGI endpoint.
response = client.text_generation(prompt_template,
                                  max_new_tokens=128,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  top_k=40,
                                  repetition_penalty=1.1)

print(f"Model output: {response}")

Using the AWQ model from Python code

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "TheBloke/llava-v1.5-13B-AWQ"

# Load model
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True,
                                          trust_remote_code=False, safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False)

prompt = "Tell me about AI"
prompt_template=f'''{prompt}

'''

print("\n\n*** Generate:")

tokens = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    tokens,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=512
)

print("Output: ", tokenizer.decode(generation_output[0]))

📚 Documentation

Prompt template

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: <image>{prompt}
ASSISTANT:
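
The template above can be filled in programmatically. This is a minimal sketch; the function name `build_prompt` is hypothetical, and how the image itself is attached depends on the inference stack and is not shown.

```python
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions."
)


def build_prompt(user_message):
    """Insert the user's message into the template after the <image> tag."""
    return f"{SYSTEM_PROMPT}\nUSER: <image>{user_message}\nASSISTANT:"


print(build_prompt("What is shown in this picture?"))
```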

Provided files and AWQ parameters

Branch	Bits	GS	AWQ Dataset	Seq Len	Size
main	4	128	wikitext	4096	7.25 GB
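
The listed size can be sanity-checked with a rough storage estimate. The sketch below assumes a nominal 13 billion quantized weights, and that each group of 128 weights stores one 16-bit scale and one 4-bit zero point alongside the 4-bit weights. Both are assumptions for illustration, not figures from the model card, and the function name is hypothetical. Tensors that are not quantized are ignored, which may partly explain why the estimate falls below the listed 7.25 GB.

```python
def estimated_size_gb(num_weights, bits, group_size, scale_bits=16, zero_bits=4):
    """Estimate storage for group-wise quantized weights, in decimal GB."""
    # Each weight carries its own bits plus a share of the per-group metadata.
    bits_per_weight = bits + (scale_bits + zero_bits) / group_size
    return num_weights * bits_per_weight / 8 / 1e9


# Bits = 4 and GS = 128 come from the table; 13e9 weights is an assumption.
print(round(estimated_size_gb(13e9, bits=4, group_size=128), 2))  # 6.75
```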

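🔧 Technical details

About AWQ

AWQ is an efficient, accurate, and fast low-bit weight quantization method that currently supports 4-bit quantization. Compared with GPTQ, it offers faster Transformers-based inference.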
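
To make "low-bit quantization" concrete, here is a minimal sketch of group-wise 4-bit round-to-nearest quantization in plain Python. It illustrates only the general storage idea (small integer codes plus a per-group scale and zero point). It does not implement AWQ's activation-aware scaling, and all function names are hypothetical.

```python
def quantize_group(weights, bits=4):
    """Map a group of floats to integer codes in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    # Fall back to a scale of 1.0 for a constant group, where hi == lo.
    scale = (hi - lo) / levels or 1.0
    zero = round(-lo / scale)  # integer zero point
    codes = [min(levels, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Recover approximate floats from codes, scale, and zero point."""
    return [(c - zero) * scale for c in codes]


def quantize(weights, group_size=128, bits=4):
    """Quantize a flat list of weights one group at a time."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


# Small demonstration: each restored value lies within half a step (scale / 2).
weights = [-0.8, -0.1, 0.0, 0.3, 0.7]
codes, scale, zero = quantize_group(weights)
print(codes)
print(dequantize_group(codes, scale, zero))
```

Smaller groups give each scale fewer weights to cover, which tends to reduce rounding error at the cost of storing more scales and zero points.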

Compatibility

The provided files have been tested for compatibility with the inference servers covered above: vLLM and TGI.

📄 License

Discord

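For further support, and to join discussions about these models and AI in general, join TheBloke AI's Discord server.

Thanks, and how to contribute

Thanks to everyone who has contributed to the development of this model. If you are able and willing to contribute, you can do so in the following ways.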

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

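Contributors receive priority support on AI/LLM/model questions and requests, access to a private Discord room, and other benefits.

Original model card: Haotian Liu's Llava v1.5 13B

Model details

Attribute	Details
Model type	LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture.
Model date	LLaVA-v1.5-13B was trained in September 2023.
Paper or resources for more information	https://llava-vl.github.io/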