T5_11b_trueteacher_and_anli open-source model: free evaluation of summary factual consistency

T5 11b Trueteacher And Anli

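Developed by Google

TrueTeacher is a factual consistency evaluation model based on the T5-11B architecture, built to assess whether a summary is factually consistent with its source document.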

Large language model

Transformers

English #summary-factuality-evaluation #large-model-fine-tuning #news-summary-quality-check

Downloads: 444

Released: 8/14/2023

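Model overview

The model is fine-tuned on a mixture of the TrueTeacher and ANLI datasets to evaluate the factual consistency of English summaries. It predicts a binary label (1 for consistent, 0 for inconsistent).

Model features

High-accuracy factual consistency evaluation

Reaches an average ROC AUC of 87.8 on the summarization subset of the TRUE benchmark.

Fine-tuned from a large pretrained model

Based on T5-11B and fine-tuned on a mixture of the TrueTeacher and ANLI datasets.

Long-input handling

The recommended max_length of 2048 tokens accommodates the input lengths of common summarization datasets.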

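Model capabilities

Factual consistency evaluation

Text classification

Natural language inference

Use cases

Text summarization evaluation

News summary fact-checking

Evaluates whether a summary of a news article is factually consistent with the source article

Evaluated on the summarization subset of the TRUE benchmark (average ROC AUC 87.8)

Automatic summary quality assessment

Serves as an evaluation metric for automatic summarization systems

Can flag summaries that contain factual errors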

🚀 TrueTeacher

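This is a factual consistency evaluation model introduced in the TrueTeacher paper (Gekhman et al., 2023). It targets the problem of evaluating factual consistency in text summarization and gives researchers a tool for that purpose.

✨ Key features

Optimized for evaluating factual consistency in summarization.
Fine-tuned from T5-11B on a mixture of datasets.
The input format is "premise: GROUNDING_DOCUMENT hypothesis: HYPOTHESIS_SUMMARY", and setting max_length to 2048 is recommended.
Predicts a binary label ('1' for factually consistent, '0' for factually inconsistent).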
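
The input format and label convention listed above can be wrapped in two small helpers so that every call builds the prompt and reads the output the same way. This is a minimal sketch: the names `build_input` and `parse_label` are illustrative and are not part of the model card or of any library, and the whitespace normalization is an assumption rather than a documented requirement.

```python
def build_input(premise: str, hypothesis: str) -> str:
    """Format a grounding document and a summary as the model's input text."""
    # Collapse runs of whitespace so newlines in the document do not leak
    # into the prompt (an assumption; the model card does not require this).
    premise = ' '.join(premise.split())
    hypothesis = ' '.join(hypothesis.split())
    if not premise or not hypothesis:
        raise ValueError('premise and hypothesis must both be non-empty')
    return f'premise: {premise} hypothesis: {hypothesis}'


def parse_label(generated_text: str) -> bool:
    """Map the decoded output to True (consistent) or False (inconsistent)."""
    label = generated_text.strip()
    if label not in ('0', '1'):
        raise ValueError(f'unexpected model output: {generated_text!r}')
    return label == '1'


print(build_input('the sun is shining', 'the sun is out in the sky'))
```

Raising on an unexpected output, rather than silently treating it as "inconsistent", makes it easier to notice when generation settings produce something other than a single label.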

💻 Usage examples

Basic usage

from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer

# Load the tokenizer and the fine-tuned T5-11B checkpoint.
model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

premise = 'the sun is shining'
for hypothesis, expected in [('the sun is out in the sky', '1'),
                             ('the cat is shiny', '0')]:
  # Encode the pair in the expected prompt format, truncated to 2048 tokens.
  input_ids = tokenizer(
      f'premise: {premise} hypothesis: {hypothesis}',
      return_tensors='pt',
      truncation=True,
      max_length=2048).input_ids
  # The model generates the label as text: '1' (consistent) or '0' (inconsistent).
  outputs = model.generate(input_ids)
  result = tokenizer.decode(outputs[0], skip_special_tokens=True)
  print(f'premise: {premise}')
  print(f'hypothesis: {hypothesis}')
  print(f'result: {result} (expected: {expected})\n')

Advanced usage

from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer
import torch

# Load the tokenizer and the fine-tuned T5-11B checkpoint.
model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

premise = 'the sun is shining'
for hypothesis, expected in [('the sun is out in the sky', '>> 0.5'),
                             ('the cat is shiny', '<< 0.5')]:
  input_ids = tokenizer(
      f'premise: {premise} hypothesis: {hypothesis}',
      return_tensors='pt',
      truncation=True,
      max_length=2048).input_ids
  # Start decoding from the pad token, which T5 uses as the decoder start token.
  decoder_input_ids = torch.tensor([[tokenizer.pad_token_id]])
  # A single forward pass is enough; gradients are not needed for scoring.
  with torch.no_grad():
    outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
  logits = outputs.logits
  # Distribution over the vocabulary for the first generated token.
  probs = torch.softmax(logits[0], dim=-1)
  # The probability assigned to the token '1' serves as the consistency score.
  one_token_id = tokenizer('1').input_ids[0]
  entailment_prob = probs[0, one_token_id].item()
  print(f'premise: {premise}')
  print(f'hypothesis: {hypothesis}')
  print(f'score: {entailment_prob:.3f} (expected: {expected})\n')
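
The score printed above is a probability rather than a decision. One way to turn it into a binary label is to compare it with a cut-off. The sketch below assumes a default of 0.5, mirroring the `>> 0.5` and `<< 0.5` expectations in the example. The helper name `scores_to_labels` and the sample scores are hypothetical, and a threshold tuned on held-out labelled data may suit a given dataset better.

```python
from typing import Iterable, List


def scores_to_labels(scores: Iterable[float], threshold: float = 0.5) -> List[int]:
    """Convert P('1') scores into binary labels (1 = consistent)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError('threshold must lie in [0, 1]')
    return [1 if score >= threshold else 0 for score in scores]


# Hypothetical scores for three summaries, for illustration only.
example_scores = [0.97, 0.08, 0.55]
print(scores_to_labels(example_scores))                 # default cut-off
print(scores_to_labels(example_scores, threshold=0.6))  # stricter cut-off
```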

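📚 Detailed documentation

Model details

This is the main model from the paper (see "T5-11B w. ANLI + TrueTeacher full" in Table 1). It is based on T5-11B (Raffel et al., 2020) and fine-tuned on a mixture of two datasets: TrueTeacher (Gekhman et al., 2023) and ANLI (Nie et al., 2020).

The TrueTeacher dataset contains model-generated summaries of articles from the training split of the CNN/DailyMail dataset (Hermann et al., 2015), annotated for factual consistency using FLAN-PaLM 540B (Chung et al., 2022). The summaries were generated by summarization models trained on the XSum dataset (Narayan et al., 2018).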

Evaluation results

The model achieves the following ROC AUC results on the summarization subset of the TRUE benchmark (Honovich et al., 2022):

MNBM	QAGS-X	FRANK	SummEval	QAGS-C	Average
78.1	89.4	93.6	88.5	89.4	87.8
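
ROC AUC can be read as the probability that a randomly chosen consistent summary receives a higher score than a randomly chosen inconsistent one, so it does not depend on any decision threshold. The sketch below computes it by pairwise comparison in plain Python and re-derives the reported average from the five per-dataset values in the table. The function name `roc_auc` and the toy scores are illustrative; this is not the evaluation code used in the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError('need at least one example of each class')
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: one consistent summary (0.4) is ranked below an inconsistent one (0.7).
print(roc_auc([0.9, 0.4, 0.7, 0.2], [1, 1, 0, 0]))  # 0.75

# Per-dataset ROC AUC values copied from the table above.
reported = {'MNBM': 78.1, 'QAGS-X': 89.4, 'FRANK': 93.6,
            'SummEval': 88.5, 'QAGS-C': 89.4}
average = round(sum(reported.values()) / len(reported), 1)
print(average)  # 87.8
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based implementation is preferable for large evaluation sets.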

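Intended use

This model is intended for research use (non-commercial) in English. The recommended use case is evaluating factual consistency in summarization.

Out-of-scope use

Any use case that violates the cc-by-nc-4.0 license.
Use in languages other than English.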

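🔧 Technical details

The model expects input in the format "premise: GROUNDING_DOCUMENT hypothesis: HYPOTHESIS_SUMMARY". To accommodate the input lengths of common summarization datasets, setting max_length to 2048 is recommended. The model predicts a binary label ('1' for factually consistent, '0' for factually inconsistent).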
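
Because the hypothesis comes last in the input string, truncating from the right at max_length (the usual tokenizer default) cuts into the summary first when the grounding document is very long. One possible workaround, sketched below, is to shorten only the premise so that the whole prompt fits. The helper `fit_premise` is hypothetical and is not described in the model card. It takes `encode` and `decode` as callables; with the Hugging Face tokenizer these could be `lambda t: tokenizer.encode(t, add_special_tokens=False)` and `tokenizer.decode`. Decoding and re-encoding may shift the token count slightly, so the budget should be treated as approximate.

```python
from typing import Callable, List, Sequence


def fit_premise(premise: str,
                hypothesis: str,
                encode: Callable[[str], Sequence],
                decode: Callable[[Sequence], str],
                max_length: int = 2048,
                reserve: int = 1) -> str:
    """Build the prompt, shortening only the premise if it is too long.

    `reserve` leaves room for special tokens such as the end-of-sequence
    token that the tokenizer appends.
    """
    # Tokens used by everything except the premise itself.
    fixed_cost = len(encode(f'premise: hypothesis: {hypothesis}'))
    budget = max_length - fixed_cost - reserve
    if budget <= 0:
        raise ValueError('hypothesis alone does not fit in max_length')
    premise_ids = encode(premise)
    if len(premise_ids) > budget:
        premise = decode(premise_ids[:budget])
    return f'premise: {premise} hypothesis: {hypothesis}'


# Stand-in tokenizer for the demo: one token per whitespace-separated word.
def word_encode(text: str) -> List[str]:
    return text.split()


def word_decode(tokens: Sequence) -> str:
    return ' '.join(tokens)


prompt = fit_premise('a1 a2 a3 a4 a5', 'c d', word_encode, word_decode,
                     max_length=8)
print(prompt)
```

Dropping the tail of the document is itself a trade-off: a claim supported only by the removed text will look unsupported to the model, so splitting the document into chunks and scoring each one is another option to weigh.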

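📄 License

The model is released under the cc-by-nc-4.0 license.

📚 Citation

If you use this model in a research publication, please cite the TrueTeacher paper (using the bibtex entry below), as well as the ANLI, CNN/DailyMail, XSum, T5 and FLAN papers mentioned above.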

@misc{gekhman2023trueteacher,
      title={TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models}, 
      author={Zorik Gekhman and Jonathan Herzig and Roee Aharoni and Chen Elkind and Idan Szpektor},
      year={2023},
      eprint={2305.11171},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}