🚀 Llama3-ChatQA-2 Model
Llama3-ChatQA-2 is a suite of models with 128K long-context capability that bridges the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG). Developed with an improved training recipe, it provides strong support for applications in these areas.
🚀 Quick Start
Llama3-ChatQA-2 is a family of 128K long-context models that bridge the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG) capabilities. It was developed using the improved training recipe from the ChatQA-1.5 paper and is built on top of the Llama-3 base model. Specifically, we continued training the Llama-3 base model to extend the context window from 8K to 128K tokens, and applied a three-stage instruction tuning process to enhance the model's instruction-following, RAG performance, and long-context understanding. Llama3-ChatQA-2 comes in two variants: Llama3-ChatQA-2-8B and Llama3-ChatQA-2-70B. Both models were originally trained with Megatron-LM, and we converted the checkpoints to Hugging Face format. For more information about ChatQA 2, check out the website!
✨ Key Features
Model Details
- Long-context capability: extends the context window from 8K to 128K tokens, significantly improving long-text processing.
- Three-stage instruction tuning: enhances the model's instruction-following, RAG performance, and long-context understanding.
- Two variants: available as Llama3-ChatQA-2-8B and Llama3-ChatQA-2-70B to fit different needs.
Benchmark Results Overview
We evaluated ChatQA 2 on short-context RAG benchmarks (ChatRAG, within 4K tokens), long-context tasks from SCROLLS and LongBench (within 32K tokens), and ultra-long-context tasks from InfiniteBench (beyond 100K tokens).
Note that ChatQA-2 is built on top of the Llama-3 base model.
Prompt Format
We strongly recommend that you use the prompt format we provide, as shown below:
When context is available:
System: {System}
{Context}
User: {Question}
Assistant: {Response}
User: {Question}
Assistant:
When context is not available:
System: {System}
User: {Question}
Assistant: {Response}
User: {Question}
Assistant:
The content of the system turn (i.e., {System}) for both scenarios is as follows:
This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.
Note that our ChatQA-2 models are optimized for capabilities with context, e.g., over a document or retrieved context.
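For the no-context case, the rendering is the same format minus the {Context} block. A minimal sketch of such a helper (the function name is illustrative, not part of the released code; the released example below covers the with-context case):

def get_formatted_input_no_context(messages, system):
    # messages: list of {"role": "user"/"assistant", "content": ...} turns
    conversation = '\n\n'.join(
        ["User: " + item["content"] if item["role"] == "user"
         else "Assistant: " + item["content"] for item in messages]
    ) + "\n\nAssistant:"
    # without context, the system turn is followed directly by the conversation
    return "System: " + system + "\n\n" + conversation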
📦 Installation
No specific installation steps are provided in the documentation, so this section is omitted.
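The usage example below only depends on the transformers and torch packages; a typical setup (our assumption, not from the original documentation) is pip install torch transformers.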
💻 Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "nvidia/Llama3-ChatQA-2-8B"

# load the tokenizer and the model in fp16, sharded across available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

messages = [
    {"role": "user", "content": "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?"}
]

document = """NVIDIA (NASDAQ: NVDA) today reported revenue for the fourth quarter ended January 28, 2024, of $22.1 billion, up 22% from the previous quarter and up 265% from a year ago.\nFor the quarter, GAAP earnings per diluted share was $4.93, up 33% from the previous quarter and up 765% from a year ago. Non-GAAP earnings per diluted share was $5.16, up 28% from the previous quarter and up 486% from a year ago.\nQ4 Fiscal 2024 Summary\nGAAP\n| $ in millions, except earnings per share | Q4 FY24 | Q3 FY24 | Q4 FY23 | Q/Q | Y/Y |\n| Revenue | $22,103 | $18,120 | $6,051 | Up 22% | Up 265% |\n| Gross margin | 76.0% | 74.0% | 63.3% | Up 2.0 pts | Up 12.7 pts |\n| Operating expenses | $3,176 | $2,983 | $2,576 | Up 6% | Up 23% |\n| Operating income | $13,615 | $10,417 | $1,257 | Up 31% | Up 983% |\n| Net income | $12,285 | $9,243 | $1,414 | Up 33% | Up 769% |\n| Diluted earnings per share | $4.93 | $3.71 | $0.57 | Up 33% | Up 765% |"""

def get_formatted_input(messages, context):
    system = "System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context."
    instruction = "Please give a full and complete answer for the question."

    for item in messages:
        if item['role'] == "user":
            ## only apply this instruction for the first user turn
            item['content'] = instruction + " " + item['content']
            break

    # render the conversation as alternating "User:" / "Assistant:" turns,
    # ending with an open "Assistant:" turn for the model to complete
    conversation = '\n\n'.join(["User: " + item["content"] if item["role"] == "user" else "Assistant: " + item["content"] for item in messages]) + "\n\nAssistant:"
    formatted_input = system + "\n\n" + context + "\n\n" + conversation

    return formatted_input

formatted_input = get_formatted_input(messages, document)
# prepend the BOS token manually, since the prompt is built by hand rather than via a chat template
tokenized_prompt = tokenizer(tokenizer.bos_token + formatted_input, return_tensors="pt").to(model.device)

# stop generation at either the EOS token or Llama-3's end-of-turn token
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(input_ids=tokenized_prompt.input_ids, attention_mask=tokenized_prompt.attention_mask, max_new_tokens=128, eos_token_id=terminators)

# decode only the newly generated tokens, skipping the prompt
response = outputs[0][tokenized_prompt.input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
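For reference, the document itself lists net income as up 769% year over year ($1,414M in Q4 FY23 to $12,285M in Q4 FY24), so a correct response should report a change of roughly that magnitude.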
Advanced Usage
The following command runs the ChatQA 2 evaluation script with vLLM:
python evaluate_cqa_vllm_chatqa2.py --model-folder ${model_path} --eval-dataset ${dataset_name} --start-idx 0 --end-idx ${num_samples} --max-tokens ${max_tokens} --sample-input-file ${dataset_path}
For all detailed configurations, please refer to all_command.sh.
🔧 Technical Details
No specific technical details are provided in the documentation, so this section is omitted.
📚 Documentation
Correspondence
Peng Xu (pengx@nvidia.com), Wei Ping (wping@nvidia.com)
Citation
@article{xu2024chatqa,
  title={ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities},
  author={Xu, Peng and Ping, Wei and Wu, Xianchao and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2407.14482},
  year={2024}
}
📄 License
The model is released under a non-commercial license, and use of this model is also governed by the Meta Llama 3 Community License Agreement.



