Open-source phi3-hallucination-judge-merge model: detects hallucinations in language model output

Phi3 Hallucination Judge Merge

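Developed by grounded-ai

This model detects hallucinations in language model output, that is, responses that are coherent but factually incorrect or not grounded in the provided context.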

Large language model

Transformers

Open-source license: MIT #hallucination-detection #binary-classification #PEFT-fine-tuning

Downloads: 63

Release date: 4/25/2025

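Model Overview

A dedicated binary classification model for detecting hallucinations in language model output, fine-tuned to reach an F1 score of 0.81 on the hallucination class.

Model Features

Hallucination detection performance

Reaches an F1 score of 0.81 on the hallucination detection task. In the comparison table this is ahead of Gemini Pro, GPT-3.5, and Palm 2 (Text Bison), level with GPT-4 Turbo, and behind GPT-4, GPT-3.5-turbo-instruct, and Claude V2.

Lightweight adapter

Uses PEFT adapter techniques to fine-tune efficiently without modifying the base model.

Standardized prompt strategy

Provides a standardized input format and prompt strategy, which simplifies integration into existing systems.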

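Model Capabilities

Hallucination detection

Text classification

Language model output evaluation

Use Cases

Language model quality evaluation

Model output verification

Verifies the factual accuracy of content generated by a language model

Correctly identifies 85% of hallucinated outputs

Content moderation

Fact checking

Automatically detects factual errors in generated content

Achieves 87% recall for error detection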

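🚀 PEFT Adapter Model for Hallucination Detection

This repository contains a PEFT adapter model for hallucination evaluation. The model specializes in detecting hallucinations in language model output.

🚀 Quick Start

Use this model to detect hallucinations in language model output. The sections below cover its performance and usage.

✨ Key Features

Hallucination detection metrics

On the binary classification task of detecting hallucinations in language model output, the model achieves the following results.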

              precision    recall  f1-score   support

           0       0.85      0.71      0.77       100
           1       0.75      0.87      0.81       100

    accuracy                           0.79       200
   macro avg       0.80      0.79      0.79       200
weighted avg       0.80      0.79      0.79       200
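
The report above can be cross-checked with a little arithmetic. The sketch below assumes class 1 is the hallucination label (its precision, recall, and F1 match the row quoted for this model in the comparison table) and reconstructs a confusion matrix from the rounded recall and support values. The counts are therefore inferred from the report, not figures published by the authors.

```python
# Reconstruct a confusion matrix from the rounded report values.
# Assumption: class 1 = "hallucination", class 0 = "not a hallucination".
support = {0: 100, 1: 100}
recall = {0: 0.71, 1: 0.87}

tn = round(recall[0] * support[0])  # class-0 examples predicted as 0
fp = support[0] - tn                # class-0 examples predicted as 1
tp = round(recall[1] * support[1])  # class-1 examples predicted as 1
fn = support[1] - tp                # class-1 examples predicted as 0


def f1(precision_value: float, recall_value: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision_value * recall_value / (precision_value + recall_value)


# Precision for a class = correct predictions of it / all predictions of it.
precision = {0: tn / (tn + fn), 1: tp / (tp + fp)}
f1_scores = {c: f1(precision[c], recall[c]) for c in (0, 1)}
accuracy = (tp + tn) / (support[0] + support[1])
macro_f1 = (f1_scores[0] + f1_scores[1]) / 2

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision: {precision[0]:.2f} / {precision[1]:.2f}")
print(f"f1:        {f1_scores[0]:.2f} / {f1_scores[1]:.2f}")
print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.2f}")
```

Under these assumptions the reconstructed counts reproduce every rounded figure in the report, which suggests the table is internally consistent.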

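Comparison with other models

We compared this model's performance on the hallucination detection benchmark against other state-of-the-art language models.

Model	Precision	Recall	F1 score
This merged model	0.75	0.87	0.81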
GPT-4	0.93	0.72	0.82
GPT-4 Turbo	0.97	0.70	0.81
Gemini Pro	0.89	0.53	0.67
GPT-3.5	0.89	0.65	0.75
GPT-3.5-turbo-instruct	0.89	0.80	0.84
Palm 2 (Text Bison)	1.00	0.44	0.61
Claude V2	0.80	0.95	0.87

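As the table shows, the merged model's F1 score of 0.81 exceeds Gemini Pro (0.67), GPT-3.5 (0.75), and Palm 2 (Text Bison) (0.61), matches GPT-4 Turbo (0.81), and trails GPT-4 (0.82), GPT-3.5-turbo-instruct (0.84), and Claude V2 (0.87).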
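
To make the comparison concrete, the following sketch groups the models in the table by whether their F1 score is above, equal to, or below the merged model's. The numbers are copied from the table; the variable names are illustrative only.

```python
# (precision, recall, f1) exactly as listed in the comparison table.
results = {
    "This merged model": (0.75, 0.87, 0.81),
    "GPT-4": (0.93, 0.72, 0.82),
    "GPT-4 Turbo": (0.97, 0.70, 0.81),
    "Gemini Pro": (0.89, 0.53, 0.67),
    "GPT-3.5": (0.89, 0.65, 0.75),
    "GPT-3.5-turbo-instruct": (0.89, 0.80, 0.84),
    "Palm 2 (Text Bison)": (1.00, 0.44, 0.61),
    "Claude V2": (0.80, 0.95, 0.87),
}

OURS = "This merged model"
our_f1 = results[OURS][2]
others = {name: scores for name, scores in results.items() if name != OURS}

# Split the other models by how their F1 compares with the merged model's.
higher = sorted(name for name, (_, _, f) in others.items() if f > our_f1)
equal = sorted(name for name, (_, _, f) in others.items() if f == our_f1)
lower = sorted(name for name, (_, _, f) in others.items() if f < our_f1)

print("Higher F1:", higher)
print("Equal F1: ", equal)
print("Lower F1: ", lower)
```

The table also shows that the merged model trades precision for recall: its recall (0.87) is second only to Claude V2, while its precision (0.75) is the lowest listed.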

📦 Installation

Using this model requires the following framework versions.

PEFT 0.11.1
Transformers 4.41.2
PyTorch 2.3.0+cu121
Datasets 2.19.2
Tokenizers 0.19.1
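
A quick way to compare a local environment against this list is sketched below. It assumes the PyPI distribution names `peft`, `transformers`, `torch`, `datasets`, and `tokenizers`, and it treats any version other than the listed one as a mismatch. That is a conservative assumption, since the source does not say whether other versions also work.

```python
from importlib import metadata

# Versions listed above; keys are the assumed PyPI distribution names.
REQUIRED = {
    "peft": "0.11.1",
    "transformers": "4.41.2",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(required, installed):
    """Return {package: (listed, installed)} for every version that differs."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in required.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    mismatches = find_mismatches(REQUIRED, installed_versions(REQUIRED))
    for name, (wanted, got) in mismatches.items():
        print(f"{name}: listed {wanted}, installed {got or 'not installed'}")
```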

💻 Usage Examples

Basic usage

For best results, we recommend starting with the following prompt strategy (adjust as needed).

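import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Assumption: Hugging Face repo id inferred from the model name and developer on this page.
model_id = "grounded-ai/phi3-hallucination-judge-merge"
# Assumption: the source does not define attn_implementation; "eager" is a portable choice.
attn_implementation = "eager"

tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation=attn_implementation,
    torch_dtype=torch.float16,
)


def format_input(reference, query, response):
    """Build the judge prompt from the reference text, user query, and model response."""
    prompt = f"""Your job is to evaluate whether a machine learning model has hallucinated or not.
    A hallucination occurs when the response is coherent but factually incorrect or nonsensical
    outputs that are not grounded in the provided context.
    You are given the following information:
    ####INFO####
    [Knowledge]: {reference}
    [User Input]: {query}
    [Model Response]: {response}
    ####END INFO####
    Based on the information provided is the model output a hallucination? Respond with only "yes" or "no"
    """
    return prompt


# Assumption: the source call omitted `reference`; the context sentence is reused for it here.
text = format_input(
    reference='Walrus are the largest mammal',
    query='Based on the following <context>Walrus are the largest mammal</context> answer the question <query> What is the best PC?</query>',
    response='The best PC is the mac',
)

messages = [
    {"role": "user", "content": text}
]

# The model is already loaded with its dtype and attention settings, so it is passed in directly.
pipe = pipeline(
    "text-generation",
    model=base_model,
    tokenizer=tokenizer,
)
generation_args = {
    "max_new_tokens": 2,        # the answer is a single word: "yes" or "no"
    "return_full_text": False,  # return only the generated answer, not the prompt
    "temperature": 0.01,        # near-greedy sampling
    "do_sample": True,
}

output = pipe(messages, **generation_args)
print(f'Hallucination: {output[0]["generated_text"].strip().lower()}')
# Output reported in the source: Hallucination: yes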
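
Because the judge answers in free text, downstream code usually wants a boolean. The helper below is a minimal sketch of that step. `parse_verdict` is a hypothetical name that is not part of the model or any library, and returning `None` for anything other than "yes" or "no" is an assumption about how unexpected output should be handled.

```python
import re


def parse_verdict(generated_text: str):
    """Map the judge's raw text to True (hallucination), False, or None.

    Only the first alphabetic word is inspected, so surrounding whitespace,
    quotes, and punctuation are ignored.
    """
    words = re.findall(r"[a-z]+", generated_text.lower())
    first = words[0] if words else ""
    if first == "yes":
        return True
    if first == "no":
        return False
    return None  # unexpected output; the caller decides how to handle it


print(parse_verdict(" Yes"))   # True
print(parse_verdict("no."))    # False
print(parse_verdict("maybe"))  # None
```

Matching on the whole first word rather than a prefix avoids treating answers such as "not sure" as "no".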

📚 Documentation

Training data

The following dataset was used to train this model.

@misc{HaluEval,
  author = {Junyi Li and Xiaoxue Cheng and Wayne Xin Zhao and Jian-Yun Nie and Ji-Rong Wen},
  title = {HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models},
  year = {2023},
  journal = {arXiv preprint arXiv:2305.11747},
  url = {https://arxiv.org/abs/2305.11747}
}