Llama3-ChatQA-1.5-70B: Open-Source Conversational QA Model

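Llama3 ChatQA 1.5 70B

Developed by NVIDIA

Llama3-ChatQA-1.5 is a conversational question answering (QA) and retrieval-augmented generation (RAG) model built on the Llama-3 base model. It performs strongly on conversational QA and RAG tasks.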

Large language model

Transformers

English #Conversational QA optimization #Retrieval-augmented generation #Multi-turn dialogue

Released: 4/28/2024

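Model Overview

The model targets conversational QA and retrieval-augmented generation. Additional conversational QA data was included in training to strengthen its tabular and arithmetic calculation capabilities.

Model Features

Optimized conversational QA

An improved training recipe and additional conversational QA data improve performance on conversational QA.

Enhanced retrieval-augmented generation

Tuned to generate answers grounded in retrieved context, which suits document QA scenarios.

Tabular and arithmetic capabilities

Additional training data improves the handling of tabular data and arithmetic calculation.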

Model Capabilities

Conversational QA

Retrieval-augmented generation

Tabular data processing

Arithmetic calculation

Multi-turn dialogue

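Use Cases

Customer service

Intelligent customer service systems

Build customer service systems that understand complex customer questions and return accurate answers.

On ChatRAG Bench, ChatQA-1.5-70B has the highest average score (58.25) among the models compared.

Document QA

Enterprise knowledge base QA

Build QA systems over enterprise documents so that employees can find information quickly.

On the Doc2Dial document QA benchmark, ChatQA-1.5-70B scores 41.26, the highest among the models compared.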

🚀 Llama3-ChatQA-1.5

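Llama3-ChatQA-1.5 is a model for conversational question answering (QA) and retrieval-augmented generation (RAG). It is developed with an improved training recipe and is intended to give accurate, detailed answers across a range of QA scenarios.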

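🚀 Quick Start

Llama3-ChatQA-1.5 excels at conversational QA and RAG. It is developed using the improved training recipe from the ChatQA paper and is built on top of the Llama-3 base model. We incorporated more conversational QA data to enhance its tabular and arithmetic calculation capabilities. The model has two variants: Llama3-ChatQA-1.5-8B and Llama3-ChatQA-1.5-70B. Both were originally trained with Megatron-LM, and we have converted the checkpoints to Hugging Face format.

For more information about ChatQA, see the project website.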

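✨ Key Features

Strong QA capability: excels at conversational QA and retrieval-augmented generation.
Improved training recipe: developed using the improved training recipe from the ChatQA paper.
Enhanced calculation capability: more conversational QA data improves tabular and arithmetic calculation.
Two model variants: available as Llama3-ChatQA-1.5-8B and Llama3-ChatQA-1.5-70B.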

📊 Benchmark Results

Results on ChatRAG Bench:

Dataset	ChatQA-1.0-7B	Command-R-Plus	Llama3-instruct-70b	GPT-4-0613	GPT-4-Turbo	ChatQA-1.0-70B	ChatQA-1.5-8B	ChatQA-1.5-70B
Doc2Dial	37.88	33.51	37.88	34.16	35.35	38.90	39.33	41.26
QuAC	29.69	34.16	36.96	40.29	40.10	41.82	39.73	38.82
QReCC	46.97	49.77	51.34	52.01	51.46	48.05	49.03	51.40
CoQA	76.61	69.71	76.98	77.42	77.73	78.57	76.46	78.44
DoQA	41.57	40.67	41.24	43.39	41.60	51.94	49.60	50.67
ConvFinQA	51.61	71.21	76.60	81.28	84.16	73.69	78.46	81.88
SQA	61.87	74.07	69.61	79.21	79.98	69.14	73.28	83.82
TopiOCQA	45.45	53.77	49.72	45.09	48.32	50.98	49.96	55.63
HybriDial*	54.51	46.70	48.59	49.81	47.86	56.44	65.76	68.27
INSCIT	30.96	35.76	36.23	36.34	33.75	31.90	30.10	32.31
Average (all)	47.71	50.93	52.52	53.90	54.03	54.14	55.17	58.25
Average (excluding HybriDial)	46.96	51.40	52.95	54.35	54.72	53.89	53.99	57.14

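Note that ChatQA-1.5 is built on the Llama-3 base model, whereas ChatQA-1.0 is built on the Llama-2 base model. The ChatQA-1.5 models use the HybriDial training dataset (marked * in the table), so for a fair comparison we also report the average score excluding HybriDial. The data and evaluation scripts for ChatRAG Bench are publicly released.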
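The two average rows can be reproduced from the per-dataset scores. The following is a minimal sketch for the two ChatQA-1.5 columns only. The helper name `average` and the dictionary layout are illustrative choices, not part of the released evaluation scripts.

```python
# Per-dataset scores copied from the ChatRAG Bench table above.
SCORES = {
    "ChatQA-1.5-8B": {
        "Doc2Dial": 39.33, "QuAC": 39.73, "QReCC": 49.03, "CoQA": 76.46,
        "DoQA": 49.60, "ConvFinQA": 78.46, "SQA": 73.28, "TopiOCQA": 49.96,
        "HybriDial": 65.76, "INSCIT": 30.10,
    },
    "ChatQA-1.5-70B": {
        "Doc2Dial": 41.26, "QuAC": 38.82, "QReCC": 51.40, "CoQA": 78.44,
        "DoQA": 50.67, "ConvFinQA": 81.88, "SQA": 83.82, "TopiOCQA": 55.63,
        "HybriDial": 68.27, "INSCIT": 32.31,
    },
}


def average(scores, exclude=()):
    """Unweighted mean over datasets, skipping any dataset named in `exclude`."""
    kept = [value for name, value in scores.items() if name not in exclude]
    return round(sum(kept) / len(kept), 2)


for model, scores in SCORES.items():
    print(model, average(scores), average(scores, exclude=("HybriDial",)))
```

Both averages are plain unweighted means over the listed datasets, which is why dropping a single dataset seen in training moves the ChatQA-1.5 averages by about one point.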

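💻 Usage Examples

Basic Usage

Use the whole document as context. This suits cases where the whole document fits into the model's input, so no retrieval is needed.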

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "nvidia/Llama3-ChatQA-1.5-70B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

messages = [
    {"role": "user", "content": "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?"}
]

document = """NVIDIA (NASDAQ: NVDA) today reported revenue for the fourth quarter ended January 28, 2024, of $22.1 billion, up 22% from the previous quarter and up 265% from a year ago.\nFor the quarter, GAAP earnings per diluted share was $4.93, up 33% from the previous quarter and up 765% from a year ago. Non-GAAP earnings per diluted share was $5.16, up 28% from the previous quarter and up 486% from a year ago.\nQ4 Fiscal 2024 Summary\nGAAP\n| $ in millions, except earnings per share | Q4 FY24 | Q3 FY24 | Q4 FY23 | Q/Q | Y/Y |\n| Revenue | $22,103 | $18,120 | $6,051 | Up 22% | Up 265% |\n| Gross margin | 76.0% | 74.0% | 63.3% | Up 2.0 pts | Up 12.7 pts |\n| Operating expenses | $3,176 | $2,983 | $2,576 | Up 6% | Up 23% |\n| Operating income | $13,615 | $10,417 | $1,257 | Up 31% | Up 983% |\n| Net income | $12,285 | $9,243 | $1,414 | Up 33% | Up 769% |\n| Diluted earnings per share | $4.93 | $3.71 | $0.57 | Up 33% | Up 765% |"""

def get_formatted_input(messages, context):
    system = "System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context."
    instruction = "Please give a full and complete answer for the question."

    for item in messages:
        if item['role'] == "user":
            ## only apply this instruction for the first user turn
            item['content'] = instruction + " " + item['content']
            break

    conversation = '\n\n'.join(["User: " + item["content"] if item["role"] == "user" else "Assistant: " + item["content"] for item in messages]) + "\n\nAssistant:"
    formatted_input = system + "\n\n" + context + "\n\n" + conversation
    
    return formatted_input

formatted_input = get_formatted_input(messages, document)
tokenized_prompt = tokenizer(tokenizer.bos_token + formatted_input, return_tensors="pt").to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(input_ids=tokenized_prompt.input_ids, attention_mask=tokenized_prompt.attention_mask, max_new_tokens=128, eos_token_id=terminators)

response = outputs[0][tokenized_prompt.input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
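To judge the model's answer to the example question, the expected value can be worked out directly from the "Net income" row of the document. This is only an arithmetic check on the input data, not model output, and the function name `percentage_change` is an illustrative choice.

```python
# Net income in $ millions, read from the "Net income" row of the example document.
net_income_q4_fy24 = 12_285
net_income_q4_fy23 = 1_414


def percentage_change(old, new):
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100


change = percentage_change(net_income_q4_fy23, net_income_q4_fy24)
print(f"{change:.1f}%")  # 768.8%, consistent with "Up 769%" in the Y/Y column
```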

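Advanced Usage

Run retrieval to get the top-n chunks as context. This suits long documents for which retrieval is necessary. Here we use the Dragon-multiturn retriever, which can handle conversational queries. We also provide some documents for users to try.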
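A conversational query is the whole dialogue flattened into one string, one `role: content` line per turn. The sketch below shows that flattening for a multi-turn conversation. It assumes the retriever expects the `user`/`agent` labels named in the comment of the example that follows, so it maps the `assistant` role to `agent`. That mapping is an inference from the comment, and both the helper name and the assistant reply are made up for illustration.

```python
# Assumption: the retriever expects "user"/"agent" labels, while chat messages
# use "user"/"assistant". Unknown roles are passed through unchanged.
ROLE_FOR_RETRIEVER = {"user": "user", "assistant": "agent"}


def format_query_for_retriever(messages):
    """Flatten a conversation into one 'role: content' line per turn."""
    lines = [
        ROLE_FOR_RETRIEVER.get(turn["role"], turn["role"]) + ": " + turn["content"]
        for turn in messages
    ]
    return "\n".join(lines).strip()


messages = [
    {"role": "user", "content": "how to connect the bluetooth in the car?"},
    # Hypothetical assistant reply, for illustration only.
    {"role": "assistant", "content": "Open the phone menu and select pairing."},
    {"role": "user", "content": "does it work with two phones?"},
]

print(format_query_for_retriever(messages))
```

Passing the full history lets the retriever resolve follow-up questions such as "does it work with two phones?", which are ambiguous on their own.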

from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModel
import torch
import json

## load ChatQA-1.5 tokenizer and model
model_id = "nvidia/Llama3-ChatQA-1.5-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

## load retriever tokenizer and model
retriever_tokenizer = AutoTokenizer.from_pretrained('nvidia/dragon-multiturn-query-encoder')
query_encoder = AutoModel.from_pretrained('nvidia/dragon-multiturn-query-encoder')
context_encoder = AutoModel.from_pretrained('nvidia/dragon-multiturn-context-encoder')

## prepare documents; we use the provided Land Rover car manual as an example
with open("docs.json") as f:
    chunk_list = json.load(f)['landrover']

messages = [
    {"role": "user", "content": "how to connect the bluetooth in the car?"}
]

### running retrieval
## convert query into a format as follows:
## user: {user}\nagent: {agent}\nuser: {user}
formatted_query_for_retriever = '\n'.join([turn['role'] + ": " + turn['content'] for turn in messages]).strip()

query_input = retriever_tokenizer(formatted_query_for_retriever, return_tensors='pt')
ctx_input = retriever_tokenizer(chunk_list, padding=True, truncation=True, max_length=512, return_tensors='pt')
query_emb = query_encoder(**query_input).last_hidden_state[:, 0, :]
ctx_emb = context_encoder(**ctx_input).last_hidden_state[:, 0, :]

## Compute similarity scores using dot product and rank the similarity
similarities = query_emb.matmul(ctx_emb.transpose(0, 1)) # (1, num_ctx)
ranked_results = torch.argsort(similarities, dim=-1, descending=True) # (1, num_ctx)

## get top-n chunks (n=5)
retrieved_chunks = [chunk_list[idx] for idx in ranked_results.tolist()[0][:5]]
context = "\n\n".join(retrieved_chunks)

### running text generation
## get_formatted_input is the helper defined in the basic usage example
formatted_input = get_formatted_input(messages, context)
tokenized_prompt = tokenizer(tokenizer.bos_token + formatted_input, return_tensors="pt").to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
outputs = model.generate(input_ids=tokenized_prompt.input_ids, attention_mask=tokenized_prompt.attention_mask, max_new_tokens=128, eos_token_id=terminators)

response = outputs[0][tokenized_prompt.input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
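The ranking step in the example above (dot-product similarity, then top-n selection) can be illustrated without loading any model. The sketch below uses plain Python lists in place of tensors. The two-dimensional embeddings and chunk texts are made-up toy values, and `top_n_chunks` is a hypothetical helper name.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_n_chunks(query_emb, ctx_embs, chunks, n=5):
    """Return the n chunks whose embeddings score highest against the query."""
    scores = [dot(query_emb, emb) for emb in ctx_embs]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in ranked[:n]]


# Toy data: real embeddings come from the query and context encoders.
chunks = ["pairing a phone", "tyre pressure", "bluetooth audio"]
ctx_embs = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.3]]
query_emb = [1.0, 0.0]

print(top_n_chunks(query_emb, ctx_embs, chunks, n=2))
# ['pairing a phone', 'bluetooth audio']
```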

📞 Contact

Zihan Liu (zihanl@nvidia.com)
Wei Ping (wping@nvidia.com)

📚 Citation

@article{liu2024chatqa,
  title={ChatQA: Surpassing GPT-4 on Conversational QA and RAG},
  author={Liu, Zihan and Ping, Wei and Roy, Rajarshi and Xu, Peng and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2401.10225},
  year={2024}}