Llama3-ChatQA-1.5-70B: an open-source conversational QA model

Llama3 ChatQA 1.5 70B

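Developed by NVIDIA

Llama3-ChatQA-1.5 is a conversational question answering (QA) and retrieval-augmented generation (RAG) model built on the Llama-3 base model. It is designed for, and performs strongly on, conversational QA and RAG tasks.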

Large language model

Transformers

English #Conversational QA optimization #Retrieval-augmented generation #Multi-turn dialogue handling

Downloads: 205

Released: 4/28/2024

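Model Overview

The model focuses on conversational QA and retrieval-augmented generation. Additional conversational QA data was added in training to strengthen its handling of tables and arithmetic.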

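Model Features

Optimized conversational QA

An improved training recipe and additional conversational QA data raise performance on conversational QA.

Stronger retrieval-augmented generation

Tuned specifically for generating answers grounded in retrieved context, which suits document QA scenarios.

Table and arithmetic capabilities

Additional related training data improves the model's ability to work with tabular data and perform arithmetic.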

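Model Capabilities

Conversational QA

Retrieval-augmented generation

Tabular data processing

Arithmetic calculation

Multi-turn dialogue handling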

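Use Cases

Customer service

Intelligent customer-service systems

Build customer-service systems that understand complex customer questions and give accurate answers.

On ChatRAG Bench, the 70B model has the highest average score (58.25) among the compared models.

Document QA

Enterprise knowledge-base QA

Build a QA system over enterprise documents so that employees can find information quickly.

Scores 41.26 on Doc2Dial, the highest among the models compared on ChatRAG Bench.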

🚀 Llama3-ChatQA-1.5

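Llama3-ChatQA-1.5 excels at conversational question answering (QA) and retrieval-augmented generation (RAG). It is developed with an improved training recipe, handles a wide range of QA scenarios, and aims to give users accurate, detailed answers.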

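🚀 Quick Start

Llama3-ChatQA-1.5 excels at conversational QA and RAG. It is developed using an improved training recipe from the ChatQA paper and is built on top of the Llama-3 base model. Specifically, we incorporate more conversational QA data to enhance its tabular and arithmetic calculation capability. The model has two variants: Llama3-ChatQA-1.5-8B and Llama3-ChatQA-1.5-70B. Both models were originally trained with Megatron-LM, and we converted the checkpoints to Hugging Face format.

For more information about ChatQA, see the project website.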

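✨ Key Features

Strong QA capability: excels at conversational QA and retrieval-augmented generation tasks.
Improved training recipe: developed using the improved training method from the ChatQA paper.
Stronger calculation: additional conversational QA data improves tabular and arithmetic calculation.
Two model variants: available as Llama3-ChatQA-1.5-8B and Llama3-ChatQA-1.5-70B.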

📦 Other Resources

📊 Benchmark Results

Results on ChatRAG Bench:

	ChatQA-1.0-7B	Command-R-Plus	Llama3-instruct-70b	GPT-4-0613	GPT-4-Turbo	ChatQA-1.0-70B	ChatQA-1.5-8B	ChatQA-1.5-70B
Doc2Dial	37.88	33.51	37.88	34.16	35.35	38.90	39.33	41.26
QuAC	29.69	34.16	36.96	40.29	40.10	41.82	39.73	38.82
QReCC	46.97	49.77	51.34	52.01	51.46	48.05	49.03	51.40
CoQA	76.61	69.71	76.98	77.42	77.73	78.57	76.46	78.44
DoQA	41.57	40.67	41.24	43.39	41.60	51.94	49.60	50.67
ConvFinQA	51.61	71.21	76.6	81.28	84.16	73.69	78.46	81.88
SQA	61.87	74.07	69.61	79.21	79.98	69.14	73.28	83.82
TopiOCQA	45.45	53.77	49.72	45.09	48.32	50.98	49.96	55.63
HybriDial*	54.51	46.7	48.59	49.81	47.86	56.44	65.76	68.27
INSCIT	30.96	35.76	36.23	36.34	33.75	31.90	30.10	32.31
Average (all)	47.71	50.93	52.52	53.90	54.03	54.14	55.17	58.25
Average (excluding HybriDial)	46.96	51.40	52.95	54.35	54.72	53.89	53.99	57.14

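Note that ChatQA-1.5 is built on the Llama-3 base model, whereas ChatQA-1.0 is built on the Llama-2 base model. The ChatQA-1.5 models were trained with the HybriDial training dataset, so for a fair comparison we also report the average score excluding HybriDial. The ChatRAG Bench data and evaluation scripts are available separately.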
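The two average rows can be reproduced from the per-benchmark scores. The following is a minimal sketch that recomputes them for the two ChatQA-1.5 columns; the scores are copied from the table above, and the names `BENCHMARKS`, `SCORES`, and `average` are illustrative only and not part of any released evaluation script.

```python
# ChatRAG Bench scores copied from the table above, in the table's row order.
BENCHMARKS = ["Doc2Dial", "QuAC", "QReCC", "CoQA", "DoQA",
              "ConvFinQA", "SQA", "TopiOCQA", "HybriDial", "INSCIT"]
SCORES = {
    "ChatQA-1.5-8B":  [39.33, 39.73, 49.03, 76.46, 49.60, 78.46, 73.28, 49.96, 65.76, 30.10],
    "ChatQA-1.5-70B": [41.26, 38.82, 51.40, 78.44, 50.67, 81.88, 83.82, 55.63, 68.27, 32.31],
}


def average(scores, exclude=()):
    """Mean over all benchmarks, optionally leaving some out, rounded to 2 decimals."""
    kept = [s for name, s in zip(BENCHMARKS, scores) if name not in exclude]
    return round(sum(kept) / len(kept), 2)


for model, scores in SCORES.items():
    # Prints the "all" average and the average without HybriDial.
    print(model, average(scores), average(scores, exclude=("HybriDial",)))
```

For the 70B model this gives 58.25 over all ten benchmarks and 57.14 once HybriDial is left out, matching the two average rows.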

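💻 Usage Examples

Basic usage

Pass the whole document as context. This suits cases where the document fits entirely into the model's input, so no retrieval is needed.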

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "nvidia/Llama3-ChatQA-1.5-70B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

messages = [
    {"role": "user", "content": "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?"}
]

document = """NVIDIA (NASDAQ: NVDA) today reported revenue for the fourth quarter ended January 28, 2024, of $22.1 billion, up 22% from the previous quarter and up 265% from a year ago.\nFor the quarter, GAAP earnings per diluted share was $4.93, up 33% from the previous quarter and up 765% from a year ago. Non-GAAP earnings per diluted share was $5.16, up 28% from the previous quarter and up 486% from a year ago.\nQ4 Fiscal 2024 Summary\nGAAP\n| $ in millions, except earnings per share | Q4 FY24 | Q3 FY24 | Q4 FY23 | Q/Q | Y/Y |\n| Revenue | $22,103 | $18,120 | $6,051 | Up 22% | Up 265% |\n| Gross margin | 76.0% | 74.0% | 63.3% | Up 2.0 pts | Up 12.7 pts |\n| Operating expenses | $3,176 | $2,983 | $2,576 | Up 6% | Up 23% |\n| Operating income | $13,615 | $10,417 | $1,257 | Up 31% | Up 983% |\n| Net income | $12,285 | $9,243 | $1,414 | Up 33% | Up 769% |\n| Diluted earnings per share | $4.93 | $3.71 | $0.57 | Up 33% | Up 765% |"""

def get_formatted_input(messages, context):
    system = "System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context."
    instruction = "Please give a full and complete answer for the question."

    for item in messages:
        if item['role'] == "user":
            ## only apply this instruction for the first user turn
            item['content'] = instruction + " " + item['content']
            break

    conversation = '\n\n'.join(["User: " + item["content"] if item["role"] == "user" else "Assistant: " + item["content"] for item in messages]) + "\n\nAssistant:"
    formatted_input = system + "\n\n" + context + "\n\n" + conversation
    
    return formatted_input

formatted_input = get_formatted_input(messages, document)
tokenized_prompt = tokenizer(tokenizer.bos_token + formatted_input, return_tensors="pt").to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(input_ids=tokenized_prompt.input_ids, attention_mask=tokenized_prompt.attention_mask, max_new_tokens=128, eos_token_id=terminators)

response = outputs[0][tokenized_prompt.input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
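When checking the model's answer to the example question, it can help to have the expected figure at hand. The sketch below computes it directly from the "Net income" row of the example document; the helper name `percentage_change` is illustrative, and the model's actual wording is not shown here because it depends on generation.

```python
# Values taken from the "Net income" row of the example document ($ in millions).
net_income_q4_fy23 = 1414
net_income_q4_fy24 = 12285


def percentage_change(old, new):
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


change = percentage_change(net_income_q4_fy23, net_income_q4_fy24)
print(f"{change:.1f}%")  # about 768.8%, consistent with the "Up 769%" cell in the table
```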

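Advanced usage

Run retrieval to get the top-n chunks as context. This suits long documents that require retrieval. Here we use the Dragon-multiturn retriever, which can handle conversational queries. We also provide a few documents for users to test with.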
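The ranking step itself is small: score every chunk against the query with a dot product and keep the n highest-scoring chunks. Below is a minimal sketch of that step using hand-written toy vectors in place of real encoder outputs; the chunk texts, the 2-dimensional embeddings, and the helper names `dot` and `top_n` are all made up for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_n(query_emb, ctx_embs, chunks, n=5):
    """Return the n chunks whose embeddings score highest against the query."""
    scores = [dot(query_emb, emb) for emb in ctx_embs]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in ranked[:n]]


# Toy data: hypothetical chunks with hand-picked 2-d "embeddings".
chunks = ["pairing a phone over Bluetooth", "changing a tyre", "Bluetooth audio settings"]
ctx_embs = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.2]]
query_emb = [1.0, 0.0]

print(top_n(query_emb, ctx_embs, chunks, n=2))
# ['pairing a phone over Bluetooth', 'Bluetooth audio settings']
```

With a real retriever the vectors come from the query and context encoders, and the same ranking is done on tensors.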

from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModel
import torch
import json

## load ChatQA-1.5 tokenizer and model
model_id = "nvidia/Llama3-ChatQA-1.5-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

## load retriever tokenizer and model
retriever_tokenizer = AutoTokenizer.from_pretrained('nvidia/dragon-multiturn-query-encoder')
query_encoder = AutoModel.from_pretrained('nvidia/dragon-multiturn-query-encoder')
context_encoder = AutoModel.from_pretrained('nvidia/dragon-multiturn-context-encoder')

## prepare documents; here we use the provided Land Rover car manual as an example
with open("docs.json") as f:
    chunk_list = json.load(f)['landrover']

messages = [
    {"role": "user", "content": "how to connect the bluetooth in the car?"}
]

### running retrieval
## convert query into a format as follows:
## user: {user}\nagent: {agent}\nuser: {user}
formatted_query_for_retriever = '\n'.join([turn['role'] + ": " + turn['content'] for turn in messages]).strip()

query_input = retriever_tokenizer(formatted_query_for_retriever, return_tensors='pt')
ctx_input = retriever_tokenizer(chunk_list, padding=True, truncation=True, max_length=512, return_tensors='pt')
query_emb = query_encoder(**query_input).last_hidden_state[:, 0, :]
ctx_emb = context_encoder(**ctx_input).last_hidden_state[:, 0, :]

## Compute similarity scores using dot product and rank the similarity
similarities = query_emb.matmul(ctx_emb.transpose(0, 1)) # (1, num_ctx)
ranked_results = torch.argsort(similarities, dim=-1, descending=True) # (1, num_ctx)

## get top-n chunks (n=5)
retrieved_chunks = [chunk_list[idx] for idx in ranked_results.tolist()[0][:5]]
context = "\n\n".join(retrieved_chunks)

### running text generation
## get_formatted_input is the helper defined in the basic usage example above
formatted_input = get_formatted_input(messages, context)
tokenized_prompt = tokenizer(tokenizer.bos_token + formatted_input, return_tensors="pt").to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
outputs = model.generate(input_ids=tokenized_prompt.input_ids, attention_mask=tokenized_prompt.attention_mask, max_new_tokens=128, eos_token_id=terminators)

response = outputs[0][tokenized_prompt.input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
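Both examples above use a single user turn. For multi-turn conversations, two details are worth checking without loading any model: `get_formatted_input` rewrites the first user message in place, so calling it twice on the same list prepends the instruction twice, and the retriever comment describes `user:` / `agent:` labels while the chat messages use the role `assistant`. The sketch below is one possible way to handle both. `build_prompt` and `build_retriever_query` are hypothetical helpers, the assistant reply in the sample conversation is invented, and the `assistant` to `agent` mapping is an assumption based on the format noted in the retrieval code comment, not a confirmed requirement.

```python
SYSTEM = ("System: This is a chat between a user and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the user's questions "
          "based on the context. The assistant should also indicate when the answer cannot be "
          "found in the context.")
INSTRUCTION = "Please give a full and complete answer for the question."


def build_prompt(messages, context):
    """Same layout as get_formatted_input, but leaves `messages` untouched."""
    turns = []
    instruction_used = False
    for message in messages:
        content = message["content"]
        if message["role"] == "user":
            if not instruction_used:
                # The instruction is prepended to the first user turn only.
                content = INSTRUCTION + " " + content
                instruction_used = True
            turns.append("User: " + content)
        else:
            turns.append("Assistant: " + content)
    conversation = "\n\n".join(turns) + "\n\nAssistant:"
    return SYSTEM + "\n\n" + context + "\n\n" + conversation


def build_retriever_query(messages):
    # Assumption: "assistant" turns are labelled "agent" for the retriever.
    labels = {"user": "user", "assistant": "agent"}
    return "\n".join(labels[m["role"]] + ": " + m["content"] for m in messages).strip()


# Hypothetical three-turn conversation; the assistant reply is invented.
messages = [
    {"role": "user", "content": "how to connect the bluetooth in the car?"},
    {"role": "assistant", "content": "Open the phone settings screen and select the car."},
    {"role": "user", "content": "can I pair two phones?"},
]
prompt = build_prompt(messages, context="(retrieved chunks go here)")
print(prompt)
print(build_retriever_query(messages))
```

Because `build_prompt` does not modify `messages`, the same list can be passed to the retriever query builder and to the prompt builder in either order.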

📞 Contact

Zihan Liu (zihanl@nvidia.com)
Wei Ping (wping@nvidia.com)

📚 Citation

@article{liu2024chatqa,
  title={ChatQA: Surpassing GPT-4 on Conversational QA and RAG},
  author={Liu, Zihan and Ping, Wei and Roy, Rajarshi and Xu, Peng and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2401.10225},
  year={2024}}