TrueTeacher
TrueTeacher is a factual consistency evaluation model. It addresses the challenge of evaluating whether a summary is factually consistent with its source document, providing a reliable tool for research in this field.
Quick Start
TrueTeacher is a model optimized for evaluating factual consistency in summarization. The input format for the model is: "premise: GROUNDING_DOCUMENT hypothesis: HYPOTHESIS_SUMMARY". It's recommended to set max_length to 2048 to accommodate the input length of common summarization datasets. The model predicts a binary label ('1' - Factually Consistent, '0' - Factually Inconsistent).
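For example, a single document/summary pair is packed into one string before tokenization (a minimal sketch; document and summary are illustrative placeholders, not names from the released code):

```python
# Illustrative only: build the model input from a grounding document and a
# candidate summary.
document = 'the sun is shining'
summary = 'the sun is out in the sky'
model_input = f'premise: {document} hypothesis: {summary}'
```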
⨠Features
- Optimized for Summarization: Specifically designed to evaluate factual consistency in summarization tasks.
- Based on T5 - 11B: Built upon the powerful T5 - 11B architecture and fine - tuned with multiple high - quality datasets.
- Binary Prediction: Provides clear binary labels for factual consistency evaluation.
Installation
No dedicated installation is required beyond the Hugging Face transformers library (plus sentencepiece, which the T5Tokenizer relies on); the usage examples below assume both are installed.
Usage Examples
Basic Usage
```python
from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer

model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

premise = 'the sun is shining'
for hypothesis, expected in [('the sun is out in the sky', '1'),
                             ('the cat is shiny', '0')]:
  # Format the input as 'premise: GROUNDING_DOCUMENT hypothesis: HYPOTHESIS_SUMMARY'
  # and truncate to the recommended max_length of 2048 tokens.
  input_ids = tokenizer(
      f'premise: {premise} hypothesis: {hypothesis}',
      return_tensors='pt',
      truncation=True,
      max_length=2048).input_ids
  # The model generates '1' (factually consistent) or '0' (factually inconsistent).
  outputs = model.generate(input_ids)
  result = tokenizer.decode(outputs[0], skip_special_tokens=True)
  print(f'premise: {premise}')
  print(f'hypothesis: {hypothesis}')
  print(f'result: {result} (expected: {expected})\n')
```
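Note that T5-11B is a very large checkpoint, so loading it in full float32 precision may be impractical on a single GPU. One possible workaround, sketched below using standard transformers options (this is an assumption on our setup, not covered in the original card), is to load the weights in bfloat16 and let accelerate place them across the available devices:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
# Assumes the `accelerate` package is installed; device_map='auto' spreads the
# 11B parameters across the available GPUs (or offloads to CPU if needed).
model = T5ForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map='auto')

input_ids = tokenizer(
    'premise: the sun is shining hypothesis: the sun is out in the sky',
    return_tensors='pt',
    truncation=True,
    max_length=2048).input_ids.to(model.device)
print(tokenizer.decode(model.generate(input_ids)[0], skip_special_tokens=True))
```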
Advanced Usage
```python
from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer
import torch

model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

premise = 'the sun is shining'
for hypothesis, expected in [('the sun is out in the sky', '>> 0.5'),
                             ('the cat is shiny', '<< 0.5')]:
  input_ids = tokenizer(
      f'premise: {premise} hypothesis: {hypothesis}',
      return_tensors='pt',
      truncation=True,
      max_length=2048).input_ids
  # Run a single decoding step (the decoder starts from the pad token) to
  # obtain the logits over the first generated token.
  decoder_input_ids = torch.tensor([[tokenizer.pad_token_id]])
  outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
  logits = outputs.logits
  probs = torch.softmax(logits[0], dim=-1)
  # The probability assigned to the token '1' serves as a continuous
  # factual-consistency score.
  one_token_id = tokenizer('1').input_ids[0]
  entailment_prob = probs[0, one_token_id].item()
  print(f'premise: {premise}')
  print(f'hypothesis: {hypothesis}')
  print(f'score: {entailment_prob:.3f} (expected: {expected})\n')
```
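The same forward pass extends naturally to scoring many document/summary pairs at once. A minimal batched sketch, reusing the tokenizer and model loaded above (the score_batch helper is hypothetical, not part of the released code):

```python
import torch

def score_batch(premises, hypotheses, tokenizer, model, max_length=2048):
  # Hypothetical helper: returns the probability of the '1' token for each
  # premise/hypothesis pair in a single forward pass.
  texts = [f'premise: {p} hypothesis: {h}'
           for p, h in zip(premises, hypotheses)]
  enc = tokenizer(texts, return_tensors='pt', padding=True,
                  truncation=True, max_length=max_length)
  decoder_input_ids = torch.full((enc.input_ids.shape[0], 1),
                                 tokenizer.pad_token_id, dtype=torch.long)
  with torch.no_grad():
    outputs = model(input_ids=enc.input_ids,
                    attention_mask=enc.attention_mask,
                    decoder_input_ids=decoder_input_ids)
  probs = torch.softmax(outputs.logits[:, 0, :], dim=-1)
  one_token_id = tokenizer('1').input_ids[0]
  return probs[:, one_token_id].tolist()

scores = score_batch(['the sun is shining'] * 2,
                     ['the sun is out in the sky', 'the cat is shiny'],
                     tokenizer, model)
print(scores)  # the first score should be high, the second low
```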
Documentation
Model Details
The model is based on T5-11B (Raffel et al., 2020) and fine-tuned with a mixture of the following datasets:
- TrueTeacher (Gekhman et al., 2023)
- ANLI (Nie et al., 2020)

The TrueTeacher dataset contains model-generated summaries of articles from the train split of the CNN/DailyMail dataset (Hermann et al., 2015), annotated for factual consistency using FLAN-PaLM 540B (Chung et al., 2022). The summaries were generated by summarization models trained on the XSum dataset (Narayan et al., 2018).
Evaluation Results
This model achieves the following ROC AUC results on the summarization subset of the TRUE benchmark (Honovich et al., 2022):

| Dataset  | ROC AUC |
|----------|---------|
| MNBM     | 78.1    |
| QAGS-X   | 89.4    |
| FRANK    | 93.6    |
| SummEval | 88.5    |
| QAGS-C   | 89.4    |
| Average  | 87.8    |
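Scores produced by the advanced usage example above can be aggregated into such ROC AUC numbers given human consistency annotations. A minimal, hypothetical sketch using scikit-learn (an extra dependency not mentioned in this card, with placeholder data):

```python
from sklearn.metrics import roc_auc_score

# Hypothetical evaluation sketch: gold binary consistency labels plus the
# continuous scores from the advanced usage example.
gold_labels = [1, 0, 1, 1, 0]           # placeholder annotations
scores = [0.97, 0.12, 0.88, 0.65, 0.30] # placeholder model scores
print(roc_auc_score(gold_labels, scores))
```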
Intended Use
This model is intended for research use (non-commercial) in English. The recommended use case is evaluating factual consistency in summarization.
Out-of-scope use
- Any use cases which violate the cc-by-nc-4.0 license.
- Usage in languages other than English.
Technical Details
The model is built on T5-11B and fine-tuned on a mixture of datasets, including TrueTeacher and ANLI, which strengthens its ability to judge factual consistency on real-world summarization inputs. The "premise: ... hypothesis: ..." input format and the binary '1'/'0' prediction keep evaluation simple, and the recommended max_length of 2048 accommodates the input length of common summarization datasets.
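In practice, very long grounding documents can exceed the 2048-token budget and get silently truncated; a quick, hypothetical check (not from the original card, reusing the tokenizer and variables from the usage examples):

```python
# Hypothetical length check: count tokens before scoring so that truncation
# of long grounding documents is at least visible.
text = f'premise: {premise} hypothesis: {hypothesis}'
n_tokens = len(tokenizer(text).input_ids)
if n_tokens > 2048:
  print(f'warning: input has {n_tokens} tokens; only the first 2048 will be used')
```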
License
This model is licensed under the cc-by-nc-4.0 license.
Citation
If you use this model for a research publication, please cite the TrueTeacher paper (using the bibtex entry below), as well as the ANLI, CNN/DailyMail, XSum, T5 and FLAN papers mentioned above.
```bibtex
@misc{gekhman2023trueteacher,
      title={TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models},
      author={Zorik Gekhman and Jonathan Herzig and Roee Aharoni and Chen Elkind and Idan Szpektor},
      year={2023},
      eprint={2305.11171},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```