Llama-3-8B-Instruct-64k open-source large language model - long-context dialogue support for long-text processing

Llama 3 8B Instruct 64k

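Developed by MaziyarPanahi

An 8B-parameter large language model built on winglian/Llama-3-8b-64k-PoSE. The context length is extended to 64k with PoSE, and the model is further tuned with DPO fine-tuning.

Large language model

Transformers

English #64k long-text processing #instruction fine-tuning #DPO preference optimization

Downloads: 91

Release date: 4/25/2024

Model overview

This is an 8B-parameter large language model based on the Meta Llama-3 architecture. Its context length is extended to 64k with PoSE and it has been fine-tuned with DPO, which makes it suitable for long-text generation and dialogue tasks.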

Model features

64k long-context support

Uses PoSE to extend the context length from 8k to 64k, which suits long documents and complex multi-turn dialogue.
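To make the idea behind PoSE (positional skip-wise training) more concrete, the following is a minimal sketch of how skip-wise position IDs could be built: a short training window is split into chunks, and each later chunk is shifted by a random offset so that relative distances up to the target length appear during training. The function name `pose_position_ids`, the equal-sized two-chunk split, and the sampling scheme are illustrative assumptions, not the training code used for this model.

```python
import random

def pose_position_ids(train_len, target_len, num_chunks=2, seed=0):
    """Return position IDs for `train_len` tokens spread over [0, target_len).

    Simplified illustration: the window is cut into equal-sized contiguous
    chunks, and a non-decreasing random skip is added to every chunk after
    the first one.
    """
    rng = random.Random(seed)
    # Number of positions that can be skipped without leaving the target range.
    budget = target_len - train_len
    # Chunk boundaries inside the training window.
    bounds = [i * train_len // num_chunks for i in range(num_chunks + 1)]
    # Cumulative skips, sorted so that position IDs stay strictly increasing.
    skips = [0] + sorted(rng.randint(0, budget) for _ in range(num_chunks - 1))
    ids = []
    for c in range(num_chunks):
        for i in range(bounds[c], bounds[c + 1]):
            ids.append(i + skips[c])
    return ids

# An 8k training window mapped into a 64k position range (64k taken as 65536).
ids = pose_position_ids(train_len=8192, target_len=65536)
print(len(ids), ids[0], ids[-1])
```

The model still only processes 8192 tokens per sample in this sketch; only the position IDs it sees are stretched over the longer range.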

DPO fine-tuning

DPO fine-tuning on the Intel/orca_dpo_pairs dataset is used to improve the quality of the model's responses.
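For readers unfamiliar with DPO, here is a minimal sketch of the standard DPO loss for a single preference pair (a chosen and a rejected response), computed from summed log-probabilities under the policy and a frozen reference model. The function name `dpo_loss` and the default `beta=0.1` are assumptions for illustration; the hyperparameters actually used for this model are not stated in the source.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed log-probabilities."""
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) equals softplus(-margin); computed in a stable form.
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)

# The loss drops below log(2) once the policy prefers the chosen response
# more strongly than the reference model does.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

In practice a training library computes this over batches with tensors; the scalar version above only shows the shape of the objective.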

Efficient inference

Supports flash_attention_2 and bfloat16 inference to improve inference efficiency.
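One way to choose the attention backend at load time is sketched below: request `flash_attention_2` only when the `flash_attn` package is importable, and otherwise fall back to PyTorch's scaled-dot-product attention (`sdpa`). The helper name `pick_attn_implementation` is hypothetical, and an importable package is only a necessary condition; FlashAttention also needs a supported GPU.

```python
import importlib.util

def pick_attn_implementation():
    """Return "flash_attention_2" if flash_attn is importable, else "sdpa"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"

# Keyword arguments intended for AutoModelForCausalLM.from_pretrained.
load_kwargs = {
    "device_map": "auto",
    "attn_implementation": pick_attn_implementation(),
}
print(load_kwargs)
```

These arguments could then be passed as `AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, **load_kwargs)`.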

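Model capabilities

Long-text generation

Dialogue systems

Instruction following

Use cases

Dialogue systems

Role-play chatbots

Can be used to build chatbots with a specific persona, such as a pirate chatbot.

Generates consistent dialogue that stays in character.

Long-document processing

Long-document summarization

Uses the 64k context length to process long documents and generate summaries.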
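When summarizing long documents, it helps to check that the prompt plus the requested output still fits in the context window, and to split the input when it does not. The sketch below assumes that "64k" means 65536 tokens and that the document has already been tokenized into a list of IDs; the helper names `fits_in_context` and `split_token_ids` are hypothetical.

```python
def fits_in_context(num_prompt_tokens, max_new_tokens, context_len=65536):
    """True if the prompt plus the generation budget fits in the context window."""
    return num_prompt_tokens + max_new_tokens <= context_len

def split_token_ids(token_ids, max_chunk_tokens):
    """Split a list of token IDs into consecutive chunks of bounded size."""
    return [
        token_ids[i:i + max_chunk_tokens]
        for i in range(0, len(token_ids), max_chunk_tokens)
    ]

# A 60,000-token document with a 4,096-token summary budget fits in one pass.
print(fits_in_context(60_000, 4_096))
# A 64,000-token document with an 8,192-token budget does not.
print(fits_in_context(64_000, 8_192))
```

If the document does not fit, each chunk returned by `split_token_ids` can be summarized separately and the partial summaries combined in a final pass.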

🚀 MaziyarPanahi/Llama-3-8B-Instruct-64k

This model is based on winglian/Llama-3-8b-64k-PoSE, the latest model from @winglian.

This model uses PoSE to extend Llama's context length from 8k to 64k at rope_theta: 500000.0. PoSE was applied during continued pretraining on 300M tokens from the RedPajama V1 dataset, using a subset of texts between 6k and 8k tokens long. After continued pretraining, rope_theta was set to 2M to potentially extend the context beyond 64k. A rank-stabilized LoRA of rank 256 was trained.
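To illustrate why raising rope_theta can help with longer contexts, the sketch below computes the wavelength (in token positions) of each rotary frequency pair for the two values mentioned above. It uses the standard RoPE frequency formula and assumes a head dimension of 128 (4096 hidden size over 32 heads for Llama-3-8B); the function name `rope_wavelengths` is hypothetical.

```python
import math

def rope_wavelengths(rope_theta, head_dim=128):
    """Wavelength, in token positions, of each rotary frequency pair.

    The i-th pair rotates with frequency rope_theta ** (-2 * i / head_dim),
    so one full rotation takes 2 * pi * rope_theta ** (2 * i / head_dim) positions.
    """
    return [
        2 * math.pi * rope_theta ** (2 * i / head_dim)
        for i in range(head_dim // 2)
    ]

for theta in (500_000.0, 2_000_000.0):
    longest = rope_wavelengths(theta)[-1]
    print(f"rope_theta={theta:.0f}: longest wavelength ~ {longest:,.0f} positions")
```

A larger rope_theta stretches the slowest-rotating dimensions, so positions far apart remain distinguishable over a longer range.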

🚀 Quick start

This model can be used with the Hugging Face transformers library by specifying MaziyarPanahi/Llama-3-8B-Instruct-64k as the model name.

💻 Usage example

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from transformers import pipeline
import torch

model_id = "MaziyarPanahi/Llama-3-8B-Instruct-64k"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    # attn_implementation="flash_attention_2"
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

streamer = TextStreamer(tokenizer)

# Name the pipeline object `generator` so it does not shadow the imported pipeline() function.
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
    streamer=streamer
)

# Then you can use the pipeline to generate text.

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the chat messages into a single prompt string using the model's chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Token IDs that end generation.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|im_end|>")
]

outputs = generator(
    prompt,
    max_new_tokens=8192,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
# Print only the newly generated text, without the prompt.
print(outputs[0]["generated_text"][len(prompt):])

📦 Model information

Attribute	Details
Base model	winglian/Llama-3-8b-64k-PoSE
Library name	transformers
Tags	axolotl, finetune, dpo, facebook, meta, pytorch, llama, llama-3, 64k, pose
Language	en
Pipeline tag	text-generation
License	llama3
License name	llama3
License link	LICENSE
Inference	false
Model creator	MaziyarPanahi
Model name	Llama-3-8B-Instruct-64k
Quantized by	MaziyarPanahi
Datasets	Intel/orca_dpo_pairs