Open-source phi3-hallucination-judge-merge model - Effectively detect hallucination problems in language model outputs

Phi3 Hallucination Judge Merge

Developed by grounded-ai

This model is designed to detect hallucination phenomena in language model outputs, i.e., responses that are coherent but factually incorrect or out of context.

Large Language Model

Transformers

Open Source License: MIT #Hallucination Detection #Binary Classification Task #PEFT Fine-tuning

Release Time: 4/25/2025

Model Overview

A specialized binary classification model, fine-tuned to judge whether a language model response is a hallucination given the reference knowledge and the user input.

Model Features

High-performance Hallucination Detection

Reaches an F1 score of 0.81 on the hallucination class, ahead of GPT-3.5 (0.75), Gemini Pro (0.67), and Palm 2 (0.61), and on par with GPT-4 Turbo (0.81) in the reported comparison.

Lightweight Adapter

Utilizes PEFT adapter technology for efficient fine-tuning without modifying the base model.

Standardized Prompt Strategy

Provides standardized input formats and prompt strategies for easy integration into existing systems.

Model Capabilities

Hallucination Detection

Text Classification

Language Model Output Evaluation

Use Cases

Language Model Quality Assessment

Model Output Verification

Verify the factual accuracy of language model-generated content

Identifies 87% of hallucinated outputs in the reported evaluation (recall of 0.87 on the hallucination class)

Content Moderation

Fact-checking

Automatically detect factual errors in generated content

Achieves a recall of 0.87 on the hallucination class in the reported evaluation

🚀 Merged Model for Hallucination Detection

This repository houses our PEFT adapter model for hallucination evaluation. It aims to effectively detect hallucinations in language model outputs, providing reliable performance on relevant binary classification tasks.

✨ Features

Hallucination Detection: Capable of accurately identifying hallucinations in language model outputs through a binary classification task.
Effective Prompting Strategy: Comes with a recommended prompting strategy to achieve optimal results.
Performance Comparison: Includes a comparison with other state-of-the-art language models on the hallucination detection benchmark.

💻 Usage Examples

Basic Usage

For best results, we recommend starting with the following prompting strategy (and encourage tweaks as you see fit):

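import torch
from transformers import AutoTokenizer, pipeline

# Assumption: the Hub repo id is inferred from the model name and developer
# listed above; adjust it if the weights live elsewhere.
base_model = "grounded-ai/phi3-hallucination-judge-merge"
tokenizer = AutoTokenizer.from_pretrained(base_model)
# Assumption: "eager" attention is the portable default; use
# "flash_attention_2" if it is installed and supported by your GPU.
attn_implementation = "eager"

def format_input(reference, query, response):
    # Build the judge prompt from the reference knowledge, the user input,
    # and the model response under evaluation.
    prompt = f"""Your job is to evaluate whether a machine learning model has hallucinated or not.
    A hallucination occurs when the response is coherent but factually incorrect or nonsensical
    outputs that are not grounded in the provided context.
    You are given the following information:
    ####INFO####
    [Knowledge]: {reference}
    [User Input]: {query}
    [Model Response]: {response}
    ####END INFO####
    Based on the information provided is the model output a hallucination? Respond with only "yes" or "no"
    """
    return prompt  # the original returned the builtin `input` by mistake

# The original call omitted `reference`; the context is passed explicitly here.
text = format_input(
    reference='Walrus are the largest mammal',
    query='What is the best PC?',
    response='The best PC is the mac',
)

messages = [
    {"role": "user", "content": text}
]

pipe = pipeline(
    "text-generation",
    model=base_model,
    model_kwargs={"attn_implementation": attn_implementation, "torch_dtype": torch.float16},
    tokenizer=tokenizer,
)
# Two new tokens are enough for a "yes"/"no" verdict; the near-zero
# temperature makes sampling almost deterministic.
generation_args = {
    "max_new_tokens": 2,
    "return_full_text": False,
    "temperature": 0.01,
    "do_sample": True,
}

output = pipe(messages, **generation_args)
print(f'Hallucination: {output[0]["generated_text"].strip().lower()}')
# Hallucination: yes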
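
Downstream code usually needs a boolean rather than the raw "yes"/"no" string. The helper below is a minimal sketch of one way to do that; `parse_verdict` is a hypothetical name that is not part of the model card, and it assumes the judge follows the prompt and starts its answer with "yes" or "no".

```python
def parse_verdict(generated_text: str) -> bool:
    """Map the judge's short answer to a boolean.

    Hypothetical helper (not from the model card): returns True when the
    judge answered "yes" (hallucination) and False when it answered "no".
    """
    answer = generated_text.strip().lower()
    if answer.startswith("yes"):
        return True
    if answer.startswith("no"):
        return False
    # Anything else means the model ignored the "yes"/"no" instruction.
    raise ValueError(f"Unexpected judge output: {generated_text!r}")
```

Raising on unexpected text, instead of silently defaulting to one class, keeps malformed generations from being counted as verdicts.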

📚 Documentation

Hallucination Detection Metrics

Our merged model achieves the following performance on a binary classification task for detecting hallucinations in language model outputs:

              precision    recall  f1-score   support

           0       0.85      0.71      0.77       100
           1       0.75      0.87      0.81       100

    accuracy                           0.79       200
   macro avg       0.80      0.79      0.79       200
weighted avg       0.80      0.79      0.79       200
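
The report can be sanity-checked by rebuilding the confusion matrix from the per-class recall and support, then recomputing the remaining figures. This is a sketch that assumes class 1 is the hallucination ("yes") class and that the counts are whole numbers obtained by rounding recall × support; the variable names are illustrative.

```python
# Recall and support per class, taken from the report above.
recall_neg, recall_pos = 0.71, 0.87
support_neg, support_pos = 100, 100

# Rebuild the confusion matrix (assumes class 1 = hallucination).
tn = round(recall_neg * support_neg)   # non-hallucinations judged "no"
fp = support_neg - tn                  # non-hallucinations judged "yes"
tp = round(recall_pos * support_pos)   # hallucinations judged "yes"
fn = support_pos - tp                  # hallucinations judged "no"

precision_pos = tp / (tp + fp)
precision_neg = tn / (tn + fn)
f1_pos = 2 * tp / (2 * tp + fp + fn)
f1_neg = 2 * tn / (2 * tn + fn + fp)
accuracy = (tp + tn) / (support_neg + support_pos)

print(f"class 0: precision={precision_neg:.2f} f1={f1_neg:.2f}")
print(f"class 1: precision={precision_pos:.2f} f1={f1_pos:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Under these assumptions the recomputed precision, F1, and accuracy agree with the reported values to two decimal places.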

Comparison with Other Models

We compared our merged model's performance on the hallucination detection benchmark against several other state-of-the-art language models:

Model	Precision	Recall	F1
Our Merged Model	0.75	0.87	0.81
GPT-4	0.93	0.72	0.82
GPT-4 Turbo	0.97	0.70	0.81
Gemini Pro	0.89	0.53	0.67
GPT-3.5	0.89	0.65	0.75
GPT-3.5-turbo-instruct	0.89	0.80	0.84
Palm 2 (Text Bison)	1.00	0.44	0.61
Claude V2	0.80	0.95	0.87

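As shown in the table, our merged model reaches an F1 score of 0.81. That matches GPT-4 Turbo (0.81), sits just below GPT-4 (0.82), and is ahead of GPT-3.5 (0.75), Gemini Pro (0.67), and Palm 2 (0.61). GPT-3.5-turbo-instruct (0.84) and Claude V2 (0.87) score higher. The merged model's recall of 0.87 is second only to Claude V2 (0.95), while its precision of 0.75 is the lowest in the table.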
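
To make the ranking explicit, the table can be loaded into a dictionary and sorted by F1. This is only an illustrative sketch; the numbers are copied from the table above and the variable names are not from the model card.

```python
# (precision, recall, f1) per model, copied from the comparison table.
scores = {
    "Our Merged Model": (0.75, 0.87, 0.81),
    "GPT-4": (0.93, 0.72, 0.82),
    "GPT-4 Turbo": (0.97, 0.70, 0.81),
    "Gemini Pro": (0.89, 0.53, 0.67),
    "GPT-3.5": (0.89, 0.65, 0.75),
    "GPT-3.5-turbo-instruct": (0.89, 0.80, 0.84),
    "Palm 2 (Text Bison)": (1.00, 0.44, 0.61),
    "Claude V2": (0.80, 0.95, 0.87),
}

# Sort from highest to lowest F1.
ranked = sorted(scores.items(), key=lambda item: item[1][2], reverse=True)

# Split the other models by how their F1 compares with the merged model.
ours_f1 = scores["Our Merged Model"][2]
better = [name for name, (_, _, f1) in scores.items() if f1 > ours_f1]
tied = [name for name, (_, _, f1) in scores.items()
        if f1 == ours_f1 and name != "Our Merged Model"]
worse = [name for name, (_, _, f1) in scores.items() if f1 < ours_f1]

for name, (precision, recall, f1) in ranked:
    print(f"{name:<24} P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```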

🔧 Technical Details

Training Data

@misc{HaluEval,
  author = {Junyi Li and Xiaoxue Cheng and Wayne Xin Zhao and Jian-Yun Nie and Ji-Rong Wen},
  title = {HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models},
  year = {2023},
  journal = {arXiv preprint arXiv:2305.11747},
  url = {https://arxiv.org/abs/2305.11747}
}

Framework versions

PEFT 0.11.1
Transformers 4.41.2
Pytorch 2.3.0+cu121
Datasets 2.19.2
Tokenizers 0.19.1

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 2
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 10
training_steps: 150
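
The relationship between these values can be sketched in a few lines. The example assumes a single training device (so that 2 × 4 gives the listed total batch size of 8) and assumes the "linear" scheduler ramps up linearly during warmup and then decays linearly to zero at the final step, which is how linear warmup schedules commonly behave. `linear_lr` is a hypothetical helper, not the training code used for this model.

```python
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 10
TRAINING_STEPS = 150

# Effective batch size per optimizer step (assumes one device).
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Upper bound on training examples consumed over the whole run.
examples_seen = effective_batch * TRAINING_STEPS


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Ramp from 0 up to the peak learning rate.
        factor = step / WARMUP_STEPS
    else:
        # Decay linearly from the peak down to 0 at the last step.
        factor = max(0.0, (TRAINING_STEPS - step) / (TRAINING_STEPS - WARMUP_STEPS))
    return LEARNING_RATE * factor


for step in (0, 5, 10, 80, 150):
    print(f"step {step:>3}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the run covers at most 1,200 training examples, which is a small budget typical of adapter-style fine-tuning.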

📄 License

This project is licensed under the MIT license.
