phi3-rag-relevance-judge-merge open-source model - accurately judging the relevance of a reference text to a question


Phi3 Rag Relevance Judge Merge

Developed by grounded-ai

A binary classification model that judges whether a reference text is relevant to a question, optimized for RAG systems

Large Language Model

Transformers

#RAG relevance classification #binary text matching #retrieval-augmented generation

Downloads: 21

Release date: 5/30/2024

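Model Overview

This model was produced by merging a PEFT adapter into its base model. It evaluates whether a reference text is relevant to a user's question in retrieval-augmented generation (RAG) systems, helping to separate useful retrieved information from the rest.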

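Model Features

Optimized prompting strategy

Provides an optimized input format that helps the model correctly interpret the relevance-judgment task
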

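PEFT adapter

Uses parameter-efficient fine-tuning (PEFT) to specialize the model for this task while preserving the base model's capabilities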
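The card does not say which PEFT method the adapter uses. Assuming it is a LoRA adapter (the usual case when an adapter is merged), merging folds the low-rank update into the base weight, W' = W + (alpha / r) · B A, so the merged model runs without a separate adapter. The toy sketch below uses plain Python lists and hypothetical sizes (a 3x3 weight, rank r = 1) to show that the merged weight produces the same output as running the base weight and the adapter side by side; on a real model loaded with the `peft` library, the corresponding step is `merge_and_unload()`.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b, scale=1.0):
    """Return a + scale * b, element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def merge_lora(w, lora_a, lora_b, alpha, r):
    """Fold a LoRA update into the base weight: W' = W + (alpha / r) * B @ A."""
    return add(w, matmul(lora_b, lora_a), scale=alpha / r)


# Hypothetical toy sizes: a 3x3 base weight and a rank-1 adapter
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
A = [[0.5, -1.0, 0.0]]          # r x d_in  (1 x 3)
B = [[1.0], [0.0], [2.0]]       # d_out x r (3 x 1)
alpha, r = 2.0, 1
x = [[1.0], [2.0], [3.0]]       # input column vector

# Adapter kept separate: y = W x + (alpha / r) * B (A x)
y_separate = add(matmul(W, x), matmul(B, matmul(A, x)), scale=alpha / r)
# Adapter merged into the weight: y = W' x
y_merged = matmul(merge_lora(W, A, B, alpha, r), x)
print(y_separate, y_merged)
```

Both paths give the same vector, which is why a merged model needs no adapter logic at inference time.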

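Balanced performance

Strikes a good balance between precision and recall (0.74 and 0.77 in the comparison below) instead of heavily favoring one metric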

Capabilities

Text relevance judgment

Binary classification

RAG system support

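Use Cases

Information retrieval

Document filtering for RAG systems

Pre-screens retrieved reference documents in a retrieval-augmented generation system, keeping those relevant to the question

Improves the quality and relevance of the RAG system's final answers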
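In a RAG pipeline the judge sits between the retriever and the generator: each retrieved passage is checked against the question, and only passages labeled "relevant" are passed on. The sketch below assumes a `judge(question, passage)` callable that returns the model's raw text answer (for example, a wrapper around the pipeline in the usage example). `filter_passages` and the keyword-overlap `toy_judge` are hypothetical stand-ins, used only so the snippet runs without loading the model.

```python
from typing import Callable, List


def filter_passages(question: str, passages: List[str],
                    judge: Callable[[str, str], str]) -> List[str]:
    """Keep only passages the judge labels "relevant", preserving retriever order."""
    kept = []
    for passage in passages:
        label = judge(question, passage).strip().lower()
        if label == "relevant":
            kept.append(passage)
    return kept


def toy_judge(question: str, passage: str) -> str:
    """Hypothetical stand-in for the model: "relevant" if any question word
    longer than 3 characters appears in the passage."""
    words = {w.strip("?.,").lower() for w in question.split() if len(w.strip("?.,")) > 3}
    return "relevant" if any(w in passage.lower() for w in words) else "unrelated"


retrieved = [
    "Quantization is a method to reduce the memory footprint",
    "The Eiffel Tower is in Paris",
]
print(filter_passages("What is quantization?", retrieved, toy_judge))
```

Preserving the input order means any ranking the retriever produced still applies to the passages that survive the filter.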

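Content review

Question-answer pair validation

Verifies whether a given answer text actually answers the question

Helps build higher-quality question-answering datasets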
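For dataset cleaning, the candidate answer can be placed in the prompt's [Reference text] slot, and pairs judged "unrelated" set aside for review instead of being silently dropped. The sketch assumes the same kind of `judge(question, text)` callable returning the model's raw answer; `split_qa_pairs` and `stub_judge` are hypothetical names, and the stub only exists so the example runs without the model.

```python
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]


def split_qa_pairs(pairs: List[QAPair],
                   judge: Callable[[str, str], str]) -> Tuple[List[QAPair], List[QAPair]]:
    """Partition (question, answer) pairs into those whose answer is judged
    "relevant" to the question and those that are not."""
    kept, rejected = [], []
    for question, answer in pairs:
        # The answer text goes into the prompt's [Reference text] slot
        verdict = judge(question, answer).strip().lower()
        (kept if verdict == "relevant" else rejected).append((question, answer))
    return kept, rejected


def stub_judge(question: str, answer: str) -> str:
    """Hypothetical stand-in for the model, for demonstration only."""
    return "relevant" if "quantization" in answer.lower() else "unrelated"


pairs = [
    ("What is quantization?", "Quantization reduces the memory footprint of a model."),
    ("What is quantization?", "Paris is the capital of France."),
]
kept, rejected = split_qa_pairs(pairs, stub_judge)
print(len(kept), len(rejected))
```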

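🚀 Merged Model Performance

This repository contains the results of our merged RAG relevance PEFT adapter model. The model determines whether a reference text contains the answer to a question in question-answering systems, which supports both information retrieval and question-answering tasks.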

🚀 Quick Start

This repository mainly presents the merged model's performance, its usage, and a comparison with other models. See the sections below for details.

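✨ Key Features

Performance: reports the model's metrics on the binary classification task.
Usage example: provides example code for the model along with a prompting strategy.
Comparison: benchmarks the merged model against other state-of-the-art language models on a RAG evaluation benchmark.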

📚 Documentation

🔍 RAG Relevance Classification Metrics

Our merged model achieves the following results on the binary classification task:

              precision    recall  f1-score   support

           0       0.74      0.77      0.75       100
           1       0.76      0.73      0.74       100

    accuracy                           0.75       200
   macro avg       0.75      0.75      0.75       200
weighted avg       0.75      0.75      0.75       200
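Because each class has a support of 100, the rounded recalls pin down the underlying counts: 77 of the 100 class-0 examples and 73 of the 100 class-1 examples were classified correctly. The confusion matrix below is therefore inferred from the report, not given on this page. The sketch recomputes per-class precision, recall, F1 and accuracy from it to show how the figures in the report fit together.

```python
# Confusion matrix inferred from the report (rows = true class, columns = predicted class)
confusion = [
    [77, 23],  # true class 0: 77 predicted 0, 23 predicted 1
    [27, 73],  # true class 1: 27 predicted 0, 73 predicted 1
]


def per_class_metrics(cm, cls):
    """Precision, recall and F1 for one class of a square confusion matrix."""
    tp = cm[cls][cls]
    predicted = sum(row[cls] for row in cm)   # column sum: everything predicted as cls
    actual = sum(cm[cls])                     # row sum: the class's support
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


for cls in (0, 1):
    p, r, f = per_class_metrics(confusion, cls)
    print(cls, round(p, 2), round(r, 2), round(f, 2))

accuracy = sum(confusion[i][i] for i in range(2)) / sum(map(sum, confusion))
print("accuracy", accuracy)
```

The rounded outputs reproduce both class rows and the 0.75 accuracy. The macro average is the unweighted mean of the two class rows; since both classes have the same support, the weighted average equals it, which is why the last two lines of the report agree.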

💻 Usage Example

Basic usage

For best results, we recommend starting with the following prompting strategy (and encourage adapting it as needed):

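import torch
from transformers import AutoTokenizer, pipeline

# Assumed Hugging Face repo id, inferred from the model name and developer shown above
base_model = "grounded-ai/phi3-rag-relevance-judge-merge"
# "eager" works everywhere; "flash_attention_2" needs the flash-attn package and a supported GPU
attn_implementation = "eager"


def format_input_classification(query, text):
    # Wrap the question and the reference text in the relevance-judging prompt
    prompt = f"""
      You are comparing a reference text to a question and trying to determine if the reference text
  contains information relevant to answering the question. Here is the data:
      [BEGIN DATA]
      ************
      [Question]: {query}
      ************
      [Reference text]: {text}
      ************
      [END DATA]
  Compare the Question above to the Reference text. You must determine whether the Reference text
  contains information that can answer the Question. Please focus on whether the very specific
  question can be answered by the information in the Reference text.
  Your response must be single word, either "relevant" or "unrelated",
  and should not contain any text or characters aside from that word.
  "unrelated" means that the reference text does not contain an answer to the Question.
  "relevant" means the reference text contains an answer to the Question."""
    return prompt


text = format_input_classification(
    "What is quantization?",
    "Quantization is a method to reduce the memory footprint",
)
messages = [
    {"role": "user", "content": text}
]

# float16 is intended for GPU inference; fall back to float32 on CPU
use_gpu = torch.cuda.is_available()
tokenizer = AutoTokenizer.from_pretrained(base_model)
pipe = pipeline(
    "text-generation",
    model=base_model,
    model_kwargs={"attn_implementation": attn_implementation,
                  "torch_dtype": torch.float16 if use_gpu else torch.float32},
    tokenizer=tokenizer,
    device=0 if use_gpu else -1,
)

# Greedy decoding with a small token budget: the prompt asks for a single label word
output = pipe(messages, max_new_tokens=5, do_sample=False, return_full_text=False)
print(output[0]["generated_text"].strip().lower())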
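The prompt asks for a single word, but generated text can still carry whitespace, capitalization, quotes, or trailing punctuation. A small normalizer keeps downstream code from misreading such output. The sketch below maps "relevant" to 1 and "unrelated" to 0; matching these to the 0/1 classes in the metrics report is an assumption, since the card does not state which class is which, and `parse_relevance` is a hypothetical helper name. Anything else returns None so it can be logged instead of being counted as either class.

```python
from typing import Optional


def parse_relevance(generated: str) -> Optional[int]:
    """Map the judge's raw text to 1 ("relevant"), 0 ("unrelated") or None."""
    words = generated.strip().lower().split()
    if not words:
        return None
    # Keep only the first word, stripped of surrounding punctuation and quotes
    word = words[0].strip(".,:;!\"'")
    return {"relevant": 1, "unrelated": 0}.get(word)


for raw in ["relevant", " Unrelated.\n", '"relevant"', "I think so"]:
    print(repr(raw), parse_relevance(raw))
```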

🔍 Comparison with Other Models

We compared the merged model with several other state-of-the-art language models on a RAG evaluation benchmark[1]:

Model	Precision	Recall	F1
Our merged model	0.74	0.77	0.75
GPT-4	0.70	0.88	0.78
GPT-4 Turbo	0.68	0.91	0.78
Gemini Pro	0.61	1.00	0.76
GPT-3.5	0.42	1.00	0.59
PaLM (Text Bison)	0.53	1.00	0.69
[1] Scores from arize/phoenix.
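F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so each row's F1 can be checked against its other two columns. The sketch below recomputes it for every row of the comparison table; at two decimals all rows agree with the reported values. It also makes the trade-off visible: the models with recall 1.00 pair it with precision between 0.42 and 0.61, while the merged model's precision and recall sit close together.

```python
# (precision, recall, reported F1) copied from the comparison table
rows = {
    "Our merged model": (0.74, 0.77, 0.75),
    "GPT-4": (0.70, 0.88, 0.78),
    "GPT-4 Turbo": (0.68, 0.91, 0.78),
    "Gemini Pro": (0.61, 1.00, 0.76),
    "GPT-3.5": (0.42, 1.00, 0.59),
    "PaLM (Text Bison)": (0.53, 1.00, 0.69),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, (p, r, reported) in rows.items():
    recomputed = f1(p, r)
    # Compare at the table's two-decimal precision
    print(f"{name:<18} recomputed={recomputed:.3f} reported={reported:.2f} "
          f"match={round(recomputed, 2) == reported}")
```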