Llama3 ChatQA 2 8B | AIbase Model Library


Llama3 ChatQA 2 8B

Developed by NVIDIA

A 128K long-context large language model built on the Llama-3 base model, focused on improving RAG and long-context understanding

Large Language Model

PyTorch

English #128K long-context understanding #Retrieval-augmented generation (RAG) #Multi-turn dialogue optimization

Downloads: 437

Release date: 8/28/2024

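Model Overview

Aims to close the gap in long-context understanding and retrieval-augmented generation (RAG) between open-source large language models and proprietary models, with support for contexts of up to 128K tokens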

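Model Features

128K Long Context

Extends the context window from 8K to 128K tokens through continued pretraining, followed by a three-stage instruction-tuning process

Enhanced RAG Capability

Specifically optimized for retrieval-augmented generation, with performance approaching GPT-4-Turbo

Multi-Stage Training

Uses an improved version of the training recipe from the ChatQA-1.5 paper

Two Versions

Available in two parameter sizes, 8B and 70B, to suit different needs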

Model Capabilities

Long-context understanding

Retrieval-augmented generation

Instruction following

Document QA

Multi-turn dialogue

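Use Cases

Financial Analysis

Financial report data analysis

Extracts key metrics and trend analyses from long financial reports

Can identify key financial metrics such as changes in net income

Knowledge QA

Long-document QA

Answers specialized questions based on very long technical documents or research papers

Performs well on long-document QA beyond 32K tokens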

🚀 Llama3-ChatQA-2

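Llama3-ChatQA-2 is a family of 128K long-context models. It bridges the gap between open-source large language models (LLMs) and leading proprietary models such as GPT-4-Turbo in long-context understanding and retrieval-augmented generation (RAG) capabilities.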

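🚀 Overview

Llama3-ChatQA-2 was developed using an improved training recipe from the ChatQA-1.5 paper and is built on the Llama-3 base model. Specifically, we continued training the Llama-3 base model to extend its context window from 8K to 128K tokens, and then applied a three-stage instruction-tuning process to improve the model's instruction following, RAG performance, and long-context understanding. Llama3-ChatQA-2 comes in two variants: Llama3-ChatQA-2-8B and Llama3-ChatQA-2-70B. Both models were originally trained with Megatron-LM, and the checkpoints have been converted to Hugging Face format. For more details about ChatQA 2, see the website.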

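✨ Key Features

Model Overview

Property	Details
Model type	Llama3-ChatQA-2 is a family of 128K long-context models that excel at long-context understanding and retrieval-augmented generation (RAG).
Training data	Trained using the Training Data.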

Other Resources

Llama3-ChatQA-2-70B · Evaluation Data · Training Data · Website · Paper

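Benchmark Results Overview

We evaluated ChatQA 2 on a short-context RAG benchmark (ChatRAG, within 4K tokens), long-context tasks from SCROLLS and LongBench (within 32K tokens), and ultra-long-context tasks from InfiniteBench (beyond 100K tokens). The results are shown below.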

[Figure: benchmark results (image not reproduced)]

Note that ChatQA-2 is built on the Llama-3 base model.

Prompt Format

We strongly recommend using the prompt format shown below.

When context is available

System: {System}

{Context}

User: {Question}

Assistant: {Response}

User: {Question}

Assistant:

When context is not available

System: {System}

User: {Question}

Assistant: {Response}

User: {Question}

Assistant:

In both scenarios, the content of the system turn (i.e., {System}) is as follows:

This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.
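The two templates and the system turn above can be rendered by a single function. This is a minimal sketch: `build_prompt` and `SYSTEM_PROMPT` are hypothetical names, and it assumes turns are separated by a blank line (`"\n\n"`), which matches the formatting used in the basic usage code. Unlike that code, it does not prepend the extra "Please give a full and complete answer" instruction to the first user turn.

```python
# Hypothetical helper (not part of any NVIDIA or Hugging Face API) that renders
# the recommended ChatQA-2 prompt format, with or without a context block.
SYSTEM_PROMPT = (
    "This is a chat between a user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions based on the context. The assistant should also indicate when "
    "the answer cannot be found in the context."
)


def build_prompt(turns, context=None, system=SYSTEM_PROMPT):
    """Render a ChatQA-2 style prompt.

    turns: list of (role, text) pairs, role is "user" or "assistant",
           ending with a user turn.
    context: optional document or retrieved text; None selects the
             "context not available" template.
    """
    parts = ["System: " + system]
    if context is not None:
        # The context block sits between the system turn and the dialogue
        parts.append(context)
    for role, text in turns:
        prefix = "User: " if role == "user" else "Assistant: "
        parts.append(prefix + text)
    # Leave an open assistant turn for the model to complete
    parts.append("Assistant:")
    # Assumption: every block is separated by one blank line
    return "\n\n".join(parts)


print(build_prompt([("user", "What is RAG?")]))
```

Passing `context=document` with a multi-turn history yields the "context available" layout; omitting it yields the other one.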

Note that our ChatQA-2 models are optimized for use with context, such as documents or retrieved passages.
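When a document is too long to pass whole, a retrieval step has to select the passages that go into {Context}. The sketch below only shows where that step fits. The word-based chunking and the term-overlap scoring are a toy stand-in chosen for illustration, not the retriever NVIDIA uses (this card does not specify one), and `chunk_words` and `retrieve` are hypothetical names.

```python
import re
from collections import Counter


def chunk_words(text, chunk_size=300, overlap=50):
    """Split text into overlapping chunks of at most chunk_size words."""
    assert 0 <= overlap < chunk_size, "overlap must be smaller than chunk_size"
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


def _terms(text):
    # Lowercased alphanumeric tokens
    return re.findall(r"\w+", text.lower())


def retrieve(question, chunks, top_k=3):
    """Toy lexical retriever: rank chunks by how often question terms occur."""
    q_terms = set(_terms(question))

    def score(chunk):
        counts = Counter(_terms(chunk))
        return sum(counts[t] for t in q_terms)

    # sorted() is stable, so ties keep their original document order
    return sorted(chunks, key=score, reverse=True)[:top_k]


chunks = chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1)
print(chunks)  # ['a b c d', 'd e f g', 'g h i j']
```

The selected chunks would then be joined (for example with blank lines) and used as {Context}. A real deployment would replace `retrieve` with a proper dense or hybrid retriever.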

💻 Usage Examples

Basic Usage

This applies when the entire document fits in the model's context window, so there is no need to run retrieval over the document.
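Whether a document "fits" can be checked before generation by comparing the prompt length plus the generation budget with the context window. This is a minimal sketch: `fits_in_context` is a hypothetical helper, and reading "128K" as exactly 131,072 tokens is an assumption; check the model's configuration for the actual limit.

```python
# Assumption: "128K" is taken to mean 128 * 1024 = 131,072 tokens
CONTEXT_WINDOW = 128 * 1024


def fits_in_context(prompt_tokens, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Return True if the prompt plus the generation budget fits in the window."""
    return prompt_tokens + max_new_tokens <= context_window


print(fits_in_context(100_000, 128))  # True
print(fits_in_context(131_000, 128))  # False
```

In the code below, `prompt_tokens` would correspond to `tokenized_prompt.input_ids.shape[-1]` and `max_new_tokens` to the value passed to `model.generate`; if the check fails, the document needs to be shortened or retrieval applied first.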

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "nvidia/Llama3-ChatQA-2-8B"

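# Load the tokenizer and the model in float16; device_map="auto" (requires accelerate) places layers on the available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")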

messages = [
    {"role": "user", "content": "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?"}
]

document = """NVIDIA (NASDAQ: NVDA) today reported revenue for the fourth quarter ended January 28, 2024, of $22.1 billion, up 22% from the previous quarter and up 265% from a year ago.\nFor the quarter, GAAP earnings per diluted share was $4.93, up 33% from the previous quarter and up 765% from a year ago. Non-GAAP earnings per diluted share was $5.16, up 28% from the previous quarter and up 486% from a year ago.\nQ4 Fiscal 2024 Summary\nGAAP\n| $ in millions, except earnings per share | Q4 FY24 | Q3 FY24 | Q4 FY23 | Q/Q | Y/Y |\n| Revenue | $22,103 | $18,120 | $6,051 | Up 22% | Up 265% |\n| Gross margin | 76.0% | 74.0% | 63.3% | Up 2.0 pts | Up 12.7 pts |\n| Operating expenses | $3,176 | $2,983 | $2,576 | Up 6% | Up 23% |\n| Operating income | $13,615 | $10,417 | $1,257 | Up 31% | Up 983% |\n| Net income | $12,285 | $9,243 | $1,414 | Up 33% | Up 769% |\n| Diluted earnings per share | $4.93 | $3.71 | $0.57 | Up 33% | Up 765% |"""

def get_formatted_input(messages, context):
    system = "System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context."
    instruction = "Please give a full and complete answer for the question."

    # Work on a copy so the caller's messages list is not modified in place
    messages = [dict(item) for item in messages]
    for item in messages:
        if item['role'] == "user":
            ## only apply this instruction for the first user turn
            item['content'] = instruction + " " + item['content']
            break

    # Render turns as "User: ..." / "Assistant: ..." separated by blank lines, ending with an open assistant turn
    conversation = '\n\n'.join(["User: " + item["content"] if item["role"] == "user" else "Assistant: " + item["content"] for item in messages]) + "\n\nAssistant:"
    formatted_input = system + "\n\n" + context + "\n\n" + conversation

    return formatted_input

formatted_input = get_formatted_input(messages, document)
# The BOS token is prepended manually, so disable the tokenizer's own special tokens to avoid a duplicate BOS
tokenized_prompt = tokenizer(tokenizer.bos_token + formatted_input, return_tensors="pt", add_special_tokens=False).to(model.device)

# Stop generation at either the end-of-text token or the Llama-3 end-of-turn token
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(input_ids=tokenized_prompt.input_ids, attention_mask=tokenized_prompt.attention_mask, max_new_tokens=128, eos_token_id=terminators)

# Decode only the newly generated tokens, skipping the prompt
response = outputs[0][tokenized_prompt.input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
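The example question can be checked by hand against the table in the document, which is useful for judging the model's answer. Net income was $12,285M in Q4 FY24 and $1,414M in Q4 FY23; the snippet below (with a hypothetical `percent_change` helper) computes the percentage change.

```python
# Net income in $ millions, taken from the table in the example document
net_income_q4_fy24 = 12_285
net_income_q4_fy23 = 1_414


def percent_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100


change = percent_change(net_income_q4_fy24, net_income_q4_fy23)
print(f"{change:.1f}%")  # 768.8%
```

This gives about 768.8%, which the table rounds to "Up 769%", so a correct answer should report a change of roughly 769%.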

Advanced Usage

python evaluate_cqa_vllm_chatqa2.py --model-folder ${model_path} --eval-dataset ${dataset_name} --start-idx 0 --end-idx ${num_samples} --max-tokens ${max_tokens} --sample-input-file ${dataset_path}

For all detailed configurations, see all_command.sh.

🔧 Technical Details

Correspondence to

Peng Xu (pengx@nvidia.com), Wei Ping (wping@nvidia.com)

Citation

@article{xu2024chatqa,
  title={ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities},
  author={Xu, Peng and Ping, Wei and Wu, Xianchao and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2407.14482},
  year={2024}
}