🚀 Llama3-ChatQA-2 Model
Llama3-ChatQA-2 is a suite of models with 128K long-context capability that narrows the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG). Developed with an improved training recipe, the models provide strong support for applications in these areas.
🚀 Quick Start
Llama3-ChatQA-2 is a family of 128K long-context models that narrows the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG) capabilities. It was developed with the improved training recipe from the ChatQA-1.5 paper and is built on top of the Llama-3 base model. Specifically, we continued training the Llama-3 base model to extend its context window from 8K to 128K tokens, and applied a three-stage instruction-tuning process to strengthen the model's instruction following, RAG performance, and long-context understanding. Llama3-ChatQA-2 comes in two variants: Llama3-ChatQA-2-8B and Llama3-ChatQA-2-70B. Both models were originally trained with Megatron-LM, and we converted the checkpoints to Hugging Face format. For more information about ChatQA 2, check out the website!
✨ Key Features
- Long-context capability: extends the context window from 8K to 128K tokens, significantly improving long-text processing.
- Three-stage instruction tuning: strengthens the model's instruction following, RAG performance, and long-context understanding.
- Two variants: Llama3-ChatQA-2-8B and Llama3-ChatQA-2-70B, to suit different needs.
Benchmark Results Overview
We evaluated ChatQA 2 on the short-context RAG benchmark (ChatRAG, within 4K tokens), on long-context tasks from SCROLLS and LongBench (within 32K tokens), and on ultra-long-context tasks beyond 100K tokens from InfiniteBench.
Note that ChatQA-2 is built on top of the Llama-3 base model.
Prompt Format
We strongly recommend that you use the prompt format we provide, as shown below:
When context is available
System: {System}
{Context}
User: {Question}
Assistant: {Response}
User: {Question}
Assistant:
When context is not available
System: {System}
User: {Question}
Assistant: {Response}
User: {Question}
Assistant:
The content of the system turn (i.e., {System}) for both scenarios is as follows:
This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.
Note that our ChatQA-2 models are optimized for capabilities with context, e.g., answering questions over a document or retrieved context.
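For illustration, here is what a complete single-turn prompt looks like in the "when context is available" format; the context sentence below is a made-up placeholder, and the instruction prefix on the user turn matches the one added by the usage code further down:
System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.

NVIDIA reported revenue of $22.1 billion for the fourth quarter of fiscal 2024.

User: Please give a full and complete answer for the question. What was NVIDIA's revenue in Q4 FY24?

Assistant: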
📦 Installation
The documentation does not provide specific installation steps, so none are shown here.
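That said, the usage example below depends only on transformers and torch. As a minimal, unofficial sketch (an assumption, not an instruction from the original documentation), both can be installed from PyPI:
pip install torch transformers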
💻 Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "nvidia/Llama3-ChatQA-2-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in fp16 and let device_map="auto" place them on available devices.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

messages = [
    {"role": "user", "content": "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?"}
]

document = """NVIDIA (NASDAQ: NVDA) today reported revenue for the fourth quarter ended January 28, 2024, of $22.1 billion, up 22% from the previous quarter and up 265% from a year ago.\nFor the quarter, GAAP earnings per diluted share was $4.93, up 33% from the previous quarter and up 765% from a year ago. Non-GAAP earnings per diluted share was $5.16, up 28% from the previous quarter and up 486% from a year ago.\nQ4 Fiscal 2024 Summary\nGAAP\n| $ in millions, except earnings per share | Q4 FY24 | Q3 FY24 | Q4 FY23 | Q/Q | Y/Y |\n| Revenue | $22,103 | $18,120 | $6,051 | Up 22% | Up 265% |\n| Gross margin | 76.0% | 74.0% | 63.3% | Up 2.0 pts | Up 12.7 pts |\n| Operating expenses | $3,176 | $2,983 | $2,576 | Up 6% | Up 23% |\n| Operating income | $13,615 | $10,417 | $1,257 | Up 31% | Up 983% |\n| Net income | $12,285 | $9,243 | $1,414 | Up 33% | Up 769% |\n| Diluted earnings per share | $4.93 | $3.71 | $0.57 | Up 33% | Up 765% |"""

def get_formatted_input(messages, context):
    system = "System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context."
    instruction = "Please give a full and complete answer for the question."

    for item in messages:
        if item['role'] == "user":
            ## only apply this instruction for the first user turn
            item['content'] = instruction + " " + item['content']
            break

    # Render the turns in the "User: ... / Assistant: ..." format described above.
    conversation = '\n\n'.join(["User: " + item["content"] if item["role"] == "user" else "Assistant: " + item["content"] for item in messages]) + "\n\nAssistant:"
    formatted_input = system + "\n\n" + context + "\n\n" + conversation

    return formatted_input

formatted_input = get_formatted_input(messages, document)
tokenized_prompt = tokenizer(tokenizer.bos_token + formatted_input, return_tensors="pt").to(model.device)

# Stop on either the regular EOS token or Llama-3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(input_ids=tokenized_prompt.input_ids, attention_mask=tokenized_prompt.attention_mask, max_new_tokens=128, eos_token_id=terminators)

# Decode only the newly generated tokens, i.e., everything after the prompt.
response = outputs[0][tokenized_prompt.input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
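In a retrieval-augmented setup, the document string above can simply be replaced by the concatenation of retrieved passages. A minimal sketch, assuming a hypothetical retriever (not part of this model card) has already returned the top-ranked chunks; the chunk contents are placeholders:
# Hypothetical retrieved chunks; in practice these would come from your
# retriever over a document store.
retrieved_chunks = [
    "chunk 1: ...",
    "chunk 2: ...",
]

# Join the chunks into a single context and format exactly as in the basic example.
context = "\n\n".join(retrieved_chunks)
formatted_input = get_formatted_input(messages, context)
# ...then tokenize and call model.generate() as shown above.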
Advanced Usage
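The command below runs the long-context evaluation script (judging by its name, it evaluates with vLLM); the shell variables such as ${model_path}, ${dataset_name}, ${num_samples}, ${max_tokens}, and ${dataset_path} are placeholders to fill in: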
python evaluate_cqa_vllm_chatqa2.py --model-folder ${model_path} --eval-dataset ${dataset_name} --start-idx 0 --end-idx ${num_samples} --max-tokens ${max_tokens} --sample-input-file ${dataset_path}
For all detailed configurations, please refer to all_command.sh.
🔧 Technical Details
The documentation does not provide specific technical details, so none are shown here.
📚 Documentation
Correspondence
Peng Xu (pengx@nvidia.com), Wei Ping (wping@nvidia.com)
Citation
@article{xu2024chatqa,
  title={ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities},
  author={Xu, Peng and Ping, Wei and Wu, Xianchao and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2407.14482},
  year={2024}
}
📄 License
The model is released under a non-commercial license, and use of this model is also governed by the META LLAMA 3 COMMUNITY LICENSE AGREEMENT.