phi3-rag-relevance-judge-merge open-source model - judging the relevance of reference text to a question


Phi3 Rag Relevance Judge Merge

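Developed by grounded-ai

A binary classification model that judges whether a reference text is relevant to a question, optimized for RAG systems

Large language model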

Transformers

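#RAG relevance classification #binary text matching #retrieval-augmented generation

Downloads: 21

Release date: 5/30/2024

Model Overview

The model is a merged PEFT adapter model. It evaluates whether a reference text is relevant to a user's question in retrieval-augmented generation (RAG) systems, which helps keep only the useful passages.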

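Model Features

Optimized prompting strategy

Provides a tuned input format so that the model correctly understands the relevance-judgment task

PEFT adapter

Uses parameter-efficient fine-tuning (PEFT) to specialize the model for this task while preserving the base model's capabilities

Balanced performance

Balances precision and recall rather than favoring one metric (both fall between 0.73 and 0.77 in the classification report below)

Model Capabilities

Text relevance judgment

Binary classification

RAG system support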

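Use Cases

Information retrieval

RAG document filtering

Pre-filter the reference documents retrieved for a question in a retrieval-augmented generation system

Improves the quality and relevance of the final answers produced by the RAG system

Content moderation

Question-answer pair validation

Verify that a provided answer text actually answers the given question

Helps build higher-quality question-answering datasets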
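
The document pre-filtering use case above can be sketched as follows. This is a minimal illustration rather than the project's own code: `filter_relevant` and the `judge` callable are hypothetical names, and the keyword-overlap judge is only a stand-in so the example runs without downloading the model.

```python
from typing import Callable, List


def filter_relevant(question: str, documents: List[str],
                    judge: Callable[[str, str], bool]) -> List[str]:
    """Keep only the documents that the judge marks as relevant to the question."""
    return [doc for doc in documents if judge(question, doc)]


def keyword_judge(question: str, document: str) -> bool:
    """Stand-in judge (assumption): relevant if a longer question word occurs in the document.

    In a real pipeline this function would call the relevance model instead.
    """
    words = {w.strip("?.,!").lower() for w in question.split()}
    doc = document.lower()
    return any(len(w) > 3 and w in doc for w in words)


retrieved = [
    "Quantization is a method to reduce the memory footprint",
    "The capital of France is Paris",
]
kept = filter_relevant("What is quantization?", retrieved, keyword_judge)
print(kept)
```

Passing the judge in as a callable keeps the filtering logic independent of how relevance is decided, so the stand-in can later be swapped for a call to the model.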

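🚀 Merged Model Performance

This repository contains the results for our merged RAG relevance PEFT adapter model. The model determines whether a reference text contains the answer to a question, which makes it useful for information retrieval and question-answering tasks.

🚀 Quick Start

This repository presents the merged model's performance, its usage, and a comparison with other models. See the sections below for details.

✨ Key Features

Performance: reports the model's metrics on the binary classification task.
Usage example: provides sample code and a prompting strategy.
Comparison: compares the merged model with other state-of-the-art language models on a RAG evaluation benchmark.

📚 Detailed Documentation

🔍 RAG Relevance Classification Metrics

Our merged model achieved the following performance on the binary classification task: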

              precision    recall  f1-score   support

           0       0.74      0.77      0.75       100
           1       0.76      0.73      0.74       100

    accuracy                           0.75       200
   macro avg       0.75      0.75      0.75       200
weighted avg       0.75      0.75      0.75       200
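
Because each class has a support of 100, the reported recall values imply 77 and 73 correct predictions for classes 0 and 1. The sketch below recomputes the report from the confusion matrix this implies. The matrix is inferred from the rounded figures above, not published by the authors, and `class_metrics` is a hypothetical helper name.

```python
# Confusion matrix inferred from the report (an assumption, not published data).
# Rows are true classes, columns are predicted classes.
confusion = [[77, 23],   # true class 0: 77 predicted as 0, 23 predicted as 1
             [27, 73]]   # true class 1: 27 predicted as 0, 73 predicted as 1


def class_metrics(matrix, cls):
    """Return (precision, recall, f1) for one class of a square confusion matrix."""
    true_pos = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)   # column sum: everything predicted as cls
    actual = sum(matrix[cls])                     # row sum: everything that truly is cls
    precision = true_pos / predicted
    recall = true_pos / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(len(confusion))) / total

for cls in range(2):
    p, r, f1 = class_metrics(confusion, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Rounded to two decimals, these values agree with every per-class figure and the accuracy in the report.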

💻 Usage Example

Basic usage

For best results, we recommend starting with the following prompting strategy (and encourage adjusting it as needed):

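import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


def format_input_classification(query, text):
    """Build the relevance-judgment prompt for one question/reference pair."""
    prompt = f"""
      You are comparing a reference text to a question and trying to determine if the reference text
  contains information relevant to answering the question. Here is the data:
      [BEGIN DATA]
      ************
      [Question]: {query}
      ************
      [Reference text]: {text}
      ************
      [END DATA]
  Compare the Question above to the Reference text. You must determine whether the Reference text
  contains information that can answer the Question. Please focus on whether the very specific
  question can be answered by the information in the Reference text.
  Your response must be single word, either "relevant" or "unrelated",
  and should not contain any text or characters aside from that word.
  "unrelated" means that the reference text does not contain an answer to the Question.
  "relevant" means the reference text contains an answer to the Question."""
    return prompt


# Assumptions: the original snippet does not define base_model, tokenizer or
# attn_implementation. The repository id is inferred from the model name and
# developer, and "eager" is an assumed attention setting; adjust both as needed.
model_id = "grounded-ai/phi3-rag-relevance-judge-merge"
attn_implementation = "eager"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The loading options go to from_pretrained because the model is loaded here
# rather than by pipeline().
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation=attn_implementation,
    torch_dtype=torch.float16,
)

text = format_input_classification("What is quantization?",
  "Quantization is a method to reduce the memory footprint")
messages = [
    {"role": "user", "content": text}
]

pipe = pipeline(
    "text-generation",
    model=base_model,
    tokenizer=tokenizer,
)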
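
The usage example stops once the pipeline is built. The remaining step, generating and reading the one-word answer, might look like the sketch below. It assumes the text-generation pipeline returns a list of dicts whose `generated_text` holds only the new text when `return_full_text=False`. The generation settings are illustrative assumptions, `parse_label` and `judge_relevance` are hypothetical helper names, and `fake_pipe` is a stub so the example runs offline.

```python
from typing import Callable, Optional


def parse_label(generated_text: str) -> Optional[bool]:
    """Map the model's one-word answer to True (relevant), False (unrelated), or None."""
    word = generated_text.strip().strip('".').lower()
    if word.startswith("relevant"):
        return True
    if word.startswith("unrelated"):
        return False
    return None  # unexpected output; the caller decides how to handle it


def judge_relevance(pipe: Callable, prompt: str) -> Optional[bool]:
    """Send one formatted prompt through a text-generation pipeline and parse the label."""
    messages = [{"role": "user", "content": prompt}]
    # Assumed generation settings: a short, deterministic completion.
    output = pipe(messages, max_new_tokens=4, return_full_text=False, do_sample=False)
    return parse_label(output[0]["generated_text"])


def fake_pipe(messages, **generation_kwargs):
    """Stub standing in for the real pipeline (assumption: same output shape)."""
    return [{"generated_text": " relevant"}]


print(judge_relevance(fake_pipe, "...formatted prompt..."))
```

Returning `None` for anything other than the two expected words lets the caller treat malformed output explicitly instead of silently counting it as unrelated.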

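🔍 Comparison with Other Models

We compared the merged model's performance on a RAG evaluation benchmark with several other state-of-the-art language models. The figures for our merged model match the class 0 row of the classification report above.

Model	Precision	Recall	F1
Our merged model	0.74	0.77	0.75
GPT-4	0.70	0.88	0.78
GPT-4 Turbo	0.68	0.91	0.78
Gemini Pro	0.61	1.00	0.76
GPT-3.5	0.42	1.00	0.59
Palm (Text Bison)	0.53	1.00	0.69
[1] Scores are taken from arize/phoenix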
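
F1 is the harmonic mean of precision and recall, so perfect recall cannot offset low precision. As a quick check, this sketch recomputes the F1 column from the precision and recall values in the table; `f1_score` is a hypothetical helper name written here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the comparison table.
reported = {
    "Our merged model": (0.74, 0.77),
    "GPT-4": (0.70, 0.88),
    "GPT-4 Turbo": (0.68, 0.91),
    "Gemini Pro": (0.61, 1.00),
    "GPT-3.5": (0.42, 1.00),
    "Palm (Text Bison)": (0.53, 1.00),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
```

The recomputed values round to the F1 column of the table. GPT-3.5, for example, reaches only 0.59 despite a recall of 1.00.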