🚀 Llama3-ChatQA-2 Model
Llama3-ChatQA-2 is a suite of models with 128K long-context capability that narrows the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG). Developed with an improved training recipe, the models provide strong support for applications in these areas.
🚀 Quick Start
Llama3-ChatQA-2 is a family of 128K long-context models that narrows the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG) capabilities. It was developed with the improved training recipe from the ChatQA-1.5 paper and is built on top of the Llama-3 base model. Specifically, we continued training the Llama-3 base model to extend its context window from 8K to 128K tokens, and applied a three-stage instruction-tuning process to strengthen the model's instruction following, RAG performance, and long-context understanding. Llama3-ChatQA-2 comes in two variants: Llama3-ChatQA-2-8B and Llama3-ChatQA-2-70B. Both models were originally trained with Megatron-LM, and we converted the checkpoints to Hugging Face format. For more information about ChatQA 2, check out the website!
✨ Key Features
- Long-context capability: extends the context window from 8K to 128K tokens, significantly improving long-text processing.
- Three-stage instruction tuning: strengthens the model's instruction following, RAG performance, and long-context understanding.
- Two variants: Llama3-ChatQA-2-8B and Llama3-ChatQA-2-70B, to suit different needs.
Benchmark Results Overview
We evaluated ChatQA 2 on the short-context RAG benchmark (ChatRAG, within 4K tokens), on long-context tasks from SCROLLS and LongBench (within 32K tokens), and on ultra-long-context tasks beyond 100K tokens from InfiniteBench.
Note that ChatQA-2 is built on top of the Llama-3 base model.
Prompt Format
We strongly recommend that you use the prompt format we provide, as shown below:
When context is available
System: {System}
{Context}
User: {Question}
Assistant: {Response}
User: {Question}
Assistant:
When context is not available
System: {System}
User: {Question}
Assistant: {Response}
User: {Question}
Assistant:
The content of the system turn (i.e., {System}) for both scenarios is as follows:
This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.
Note that our ChatQA-2 models are optimized for capabilities with context, e.g., answering questions over a document or retrieved context.
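For illustration, here is what a complete single-turn prompt looks like in the "when context is available" format; the context sentence below is a made-up placeholder, and the instruction prefix on the user turn matches the one added by the usage code further down:
System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.

NVIDIA reported revenue of $22.1 billion for the fourth quarter of fiscal 2024.

User: Please give a full and complete answer for the question. What was NVIDIA's revenue in Q4 FY24?

Assistant: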
📦 Installation
The documentation does not provide specific installation steps, so none are shown here.
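That said, the usage example below depends only on transformers and torch. As a minimal, unofficial sketch (an assumption, not an instruction from the original documentation), both can be installed from PyPI:
pip install torch transformers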
💻 Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "nvidia/Llama3-ChatQA-2-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in fp16 and let device_map="auto" place them on available devices.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

messages = [
    {"role": "user", "content": "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?"}
]

document = """NVIDIA (NASDAQ: NVDA) today reported revenue for the fourth quarter ended January 28, 2024, of $22.1 billion, up 22% from the previous quarter and up 265% from a year ago.\nFor the quarter, GAAP earnings per diluted share was $4.93, up 33% from the previous quarter and up 765% from a year ago. Non-GAAP earnings per diluted share was $5.16, up 28% from the previous quarter and up 486% from a year ago.\nQ4 Fiscal 2024 Summary\nGAAP\n| $ in millions, except earnings per share | Q4 FY24 | Q3 FY24 | Q4 FY23 | Q/Q | Y/Y |\n| Revenue | $22,103 | $18,120 | $6,051 | Up 22% | Up 265% |\n| Gross margin | 76.0% | 74.0% | 63.3% | Up 2.0 pts | Up 12.7 pts |\n| Operating expenses | $3,176 | $2,983 | $2,576 | Up 6% | Up 23% |\n| Operating income | $13,615 | $10,417 | $1,257 | Up 31% | Up 983% |\n| Net income | $12,285 | $9,243 | $1,414 | Up 33% | Up 769% |\n| Diluted earnings per share | $4.93 | $3.71 | $0.57 | Up 33% | Up 765% |"""

def get_formatted_input(messages, context):
    system = "System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context."
    instruction = "Please give a full and complete answer for the question."

    for item in messages:
        if item['role'] == "user":
            ## only apply this instruction for the first user turn
            item['content'] = instruction + " " + item['content']
            break

    # Render the turns in the "User: ... / Assistant: ..." format described above.
    conversation = '\n\n'.join(["User: " + item["content"] if item["role"] == "user" else "Assistant: " + item["content"] for item in messages]) + "\n\nAssistant:"
    formatted_input = system + "\n\n" + context + "\n\n" + conversation

    return formatted_input

formatted_input = get_formatted_input(messages, document)
tokenized_prompt = tokenizer(tokenizer.bos_token + formatted_input, return_tensors="pt").to(model.device)

# Stop on either the regular EOS token or Llama-3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(input_ids=tokenized_prompt.input_ids, attention_mask=tokenized_prompt.attention_mask, max_new_tokens=128, eos_token_id=terminators)

# Decode only the newly generated tokens, i.e., everything after the prompt.
response = outputs[0][tokenized_prompt.input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
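In a retrieval-augmented setup, the document string above can simply be replaced by the concatenation of retrieved passages. A minimal sketch, assuming a hypothetical retriever (not part of this model card) has already returned the top-ranked chunks; the chunk contents are placeholders:
# Hypothetical retrieved chunks; in practice these would come from your
# retriever over a document store.
retrieved_chunks = [
    "chunk 1: ...",
    "chunk 2: ...",
]

# Join the chunks into a single context and format exactly as in the basic example.
context = "\n\n".join(retrieved_chunks)
formatted_input = get_formatted_input(messages, context)
# ...then tokenize and call model.generate() as shown above.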
Advanced Usage
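The command below runs the long-context evaluation script (judging by its name, it evaluates with vLLM); the shell variables such as ${model_path}, ${dataset_name}, ${num_samples}, ${max_tokens}, and ${dataset_path} are placeholders to fill in: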
python evaluate_cqa_vllm_chatqa2.py --model-folder ${model_path} --eval-dataset ${dataset_name} --start-idx 0 --end-idx ${num_samples} --max-tokens ${max_tokens} --sample-input-file ${dataset_path}
For all detailed configurations, please refer to all_command.sh.
🔧 Technical Details
The documentation does not provide specific technical details, so none are shown here.
📚 Documentation
Correspondence
Peng Xu (pengx@nvidia.com), Wei Ping (wping@nvidia.com)
Citation
@article{xu2024chatqa,
  title={ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities},
  author={Xu, Peng and Ping, Wei and Wu, Xianchao and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2407.14482},
  year={2024}
}
📄 License
The model is released under a non-commercial license, and use of this model is also governed by the META LLAMA 3 COMMUNITY LICENSE AGREEMENT.