# MiniCheck-Flan-T5-Large
This is a fact-checking model designed to determine whether a given sentence is supported by a provided document. It offers high performance at a relatively low cost, making it an efficient solution for fact-checking tasks.

## ✨ Features
- Binary Prediction: The model predicts a binary label at the sentence level: 1 for supported and 0 for unsupported.
- High Performance: MiniCheck-Flan-T5-Large outperforms all existing specialized fact-checkers of similar scale by a large margin (4-10% absolute increase) and is on par with GPT-4, but 400x cheaper.
- Fine-tuned Model: It is fine-tuned from google/flan-t5-large on a combination of 35K training examples: 21K from ANLI and 14K synthetic.
## 📦 Installation
Please run the following command to install the MiniCheck package and all necessary dependencies.
```bash
pip install "minicheck @ git+https://github.com/Liyan06/MiniCheck.git@main"
```
## 💻 Usage Examples
### Basic Usage
```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # select the GPU before any CUDA initialization

from minicheck.minicheck import MiniCheck

doc = "A group of students gather in the school library to study for their upcoming final exams."
claim_1 = "The students are preparing for an examination."
claim_2 = "The students are on vacation."

# Download (if needed) and load the fact-checking model
scorer = MiniCheck(model_name='flan-t5-large', cache_dir='./ckpts')

# Score each (doc, claim) pair; pred_label is 1 (supported) or 0 (unsupported)
pred_label, raw_prob, _, _ = scorer.score(docs=[doc, doc], claims=[claim_1, claim_2])

print(pred_label)  # [1, 0]: claim_1 is supported by the document, claim_2 is not
print(raw_prob)    # raw probability for each claim, from which pred_label is derived
```
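If you want a stricter or looser decision than the default binary `pred_label`, you can threshold `raw_prob` yourself. A minimal sketch, assuming `raw_prob` holds the probability of the supported label; the 0.7 cutoff is an arbitrary value for illustration, not from the paper:

```python
# Hypothetical stricter cutoff than the model's default decision
THRESHOLD = 0.7  # illustrative value only

custom_labels = [int(p >= THRESHOLD) for p in raw_prob]
print(custom_labels)  # e.g., [1, 0] for the two claims above
```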
### Advanced Usage
```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # select the GPU before any CUDA initialization

import pandas as pd
from datasets import load_dataset
from minicheck.minicheck import MiniCheck

# Load the test split of the LLM-AggreFact benchmark
df = pd.DataFrame(load_dataset("lytang/LLM-AggreFact")['test'])
docs = df.doc.values      # grounding documents
claims = df.claim.values  # claims to verify against the documents

scorer = MiniCheck(model_name='flan-t5-large', cache_dir='./ckpts')
pred_label, raw_prob, _, _ = scorer.score(docs=docs, claims=claims)
```
Evaluate the results on the benchmark with balanced accuracy (the unweighted mean of per-class recall):
```python
from sklearn.metrics import balanced_accuracy_score

df['preds'] = pred_label
result_df = pd.DataFrame(columns=['Dataset', 'BAcc'])

# Balanced accuracy for each dataset in the benchmark
for dataset in df.dataset.unique():
    sub_df = df[df.dataset == dataset]
    bacc = balanced_accuracy_score(sub_df.label, sub_df.preds) * 100
    result_df.loc[len(result_df)] = [dataset, bacc]

# Unweighted average over all datasets
result_df.loc[len(result_df)] = ['Average', result_df.BAcc.mean()]
print(result_df.round(1))
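```

For reference, balanced accuracy is the unweighted mean of recall on the supported and unsupported classes, which makes it robust to label imbalance between the two. A self-contained sketch of the same computation (the toy labels are illustrative):

```python
def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of recall on the positive and negative classes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall_pos = tp / (tp + fn)  # recall on supported claims
    recall_neg = tn / (tn + fp)  # recall on unsupported claims
    return 0.5 * (recall_pos + recall_neg)

# Toy example: matches sklearn's balanced_accuracy_score
print(balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.75
```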
## 📘 Documentation
This is a fact-checking model from our work:
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents (EMNLP 2024, GitHub Repo)
The model is based on Flan-T5-Large and predicts a binary label: 1 for supported and 0 for unsupported.
It makes predictions at the sentence level: it takes a document and a sentence as input and determines
whether the sentence is supported by the document: `MiniCheck-Model(document, claim) -> {0, 1}`
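Because predictions are sentence-level, a multi-sentence LLM response can be checked by scoring each sentence against the grounding document. A minimal sketch reusing the `MiniCheck.score` API from above; the naive period-based splitter is an assumption for illustration, and a proper sentence segmenter should be used in practice:

```python
from minicheck.minicheck import MiniCheck

doc = "A group of students gather in the school library to study for their upcoming final exams."
response = "The students are preparing for an examination. They are on vacation."

# Naive sentence splitting, for illustration only
sentences = [s.strip() + '.' for s in response.split('.') if s.strip()]

scorer = MiniCheck(model_name='flan-t5-large', cache_dir='./ckpts')
pred_label, _, _, _ = scorer.score(docs=[doc] * len(sentences), claims=sentences)

for sentence, label in zip(sentences, pred_label):
    status = "supported" if label == 1 else "unsupported"
    print(f"[{status}] {sentence}")
```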
MiniCheck-Flan-T5-Large is the best fact-checking model with size < 1B and reaches GPT-4 performance. It is fine-tuned from google/flan-t5-large (Chung et al., 2022) on a combination of 35K training examples:
- 21K examples from ANLI (Nie et al., 2020)
- 14K synthetic examples generated from scratch in a structured way (more details in the paper)
### Model Variants
We also provide three other MiniCheck model variants; see the GitHub repo for details.
### Model Performance
The performance of these models is evaluated on our newly collected benchmark, LLM-AggreFact (unseen by our models during training),
which aggregates 11 recent human-annotated datasets on fact-checking and grounding LLM generations. MiniCheck-Flan-T5-Large outperforms all
existing specialized fact-checkers of similar scale by a large margin (4-10% absolute increase) and is on par with GPT-4, but 400x cheaper. See the full results in our work.
Note: We only evaluated the performance of our models on real claims, i.e., without any human intervention such as
injecting certain error types into model-generated claims. Such edited claims do not reflect LLMs' actual behavior.
## 📄 License
This project is licensed under the MIT License.
## 📖 Citation
```bibtex
@InProceedings{tang-etal-2024-minicheck,
  title = {MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents},
  author = {Liyan Tang and Philippe Laban and Greg Durrett},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  year = {2024},
  publisher = {Association for Computational Linguistics},
  url = {https://arxiv.org/pdf/2404.10774}
}
```