t5-base-summarization-claim-extractor open-source model - extracting atomic claims from summary text


T5 Base Summarization Claim Extractor

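Developed by Babelscape

A T5-based model specialized in extracting atomic claims from summaries. It is a key component of summarization factuality evaluation pipelines.

Text Generation

Transformers

English #summary-claim-extraction #atomic-claim-identification #factuality-evaluation-component

Downloads: 666.36k

Released: 6/27/2024

Model Overview

This model is fine-tuned from the T5 architecture. It extracts verifiable atomic claims from summaries to support summary factuality evaluation.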

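Model Features

Atomic claim extraction

Identifies and extracts self-contained, verifiable claims from complex summaries.

Factuality evaluation support

Serves as a core component of the FENICE framework and provides the claim-extraction step for summary factuality evaluation.

Strong performance

Reaches an easiness F1 of 73.4 on the ROSE dataset, close to GPT-3.5 (74.9).

Model Capabilities

Text understanding

Key information extraction

Structured output generation

Use Cases

News summary analysis

Tech news fact-checking

Extracts key claims from tech news summaries to support downstream fact-checking.

Extracts key information such as technical specifications and performance claims.

Academic research support

Paper abstract analysis

Extracts the core research claims from paper abstracts.

Helps researchers quickly identify a paper's key contributions.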

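🚀 Model Card: T5-base-summarization-claim-extractor

This model extracts atomic claims from summaries. It is primarily useful for tasks such as summary factuality evaluation.

🚀 Quick Start

Example code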

from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "Babelscape/t5-base-summarization-claim-extractor"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# The summary from which to extract atomic claims
summary = 'Simone Biles made a triumphant return to the Olympic stage at the Paris 2024 Games, competing in the women’s gymnastics qualifications. Overcoming a previous struggle with the “twisties” that led to her withdrawal from events at the Tokyo 2020 Olympics, Biles dazzled with strong performances on all apparatus, helping the U.S. team secure a commanding lead in the qualifications. Her routines showcased her resilience and skill, drawing enthusiastic support from a star-studded audience.'

# Tokenize a batch containing one summary into PyTorch tensors
tok_input = tokenizer([summary], return_tensors="pt", padding=True)
# Generate the claims; max_new_tokens lifts the default length cap (256 is an assumed value, not from the original card)
output_ids = model.generate(**tok_input, max_new_tokens=256)
# Decode to text: one string per input summary, containing all of its claims
claims = tokenizer.batch_decode(output_ids, skip_special_tokens=True)

Note: the model outputs all claims for a summary as a single string. Split this string into sentences to obtain the individual claims.
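The splitting step can be sketched with a small regular expression. This is an assumption, not the authors' method: `split_claims` is a hypothetical helper that treats each `.`, `!` or `?` followed by whitespace as a claim boundary. It would mis-split abbreviations such as "U.S.", so a dedicated sentence tokenizer (for example NLTK's `sent_tokenize`) is a more robust choice.

```python
import re


def split_claims(claim_string: str) -> list[str]:
    """Hypothetical helper: split the model's single output string into claims.

    Assumes every claim is a sentence ending in '.', '!' or '?' followed by whitespace.
    """
    # Split at whitespace that follows sentence-final punctuation
    parts = re.split(r"(?<=[.!?])\s+", claim_string.strip())
    # Drop empty fragments (e.g. from an empty input string)
    return [p.strip() for p in parts if p.strip()]


# Illustrative input only, not actual model output
example = "Simone Biles returned to the Olympics. She competed in Paris 2024."
print(split_claims(example))
```

Applied to the quick-start output, `split_claims(claims[0])` would return the list of individual claims for the single input summary.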

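✨ Key Features

Model Description

Model name: T5-base-summarization-claim-extractor
Authors: Alessandro Scirè, Karim Ghonim, and Roberto Navigli
Contact: scire@diag.uniroma1.it, scire@babelscape.com
Language: English
Primary use: extracting atomic claims from summaries

Overview

T5-base-summarization-claim-extractor is a model developed to extract atomic claims from summaries. It is based on the T5 architecture and fine-tuned specifically for claim extraction.

This model is part of the research presented in the paper "FENICE: Factuality Evaluation of summarization based on Natural Language Inference and Claim Extraction" by Alessandro Scirè, Karim Ghonim, and Roberto Navigli. FENICE evaluates the factuality of summaries using natural language inference (NLI) and claim extraction. An arXiv version of the paper is also available.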
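To show how extracted claims can feed an NLI-based factuality check, here is a minimal sketch. It uses a simplified, hypothetical aggregation rather than FENICE's actual scoring procedure, which is described in the paper. Each claim is scored by its highest entailment probability against any source chunk, and the summary score is the mean over claims. The names `factuality_score`, `entailment_prob` and `toy_entailment` are assumptions. In practice `entailment_prob` would wrap a real NLI model; the toy version below only checks verbatim containment.

```python
from typing import Callable, Sequence

# Hypothetical signature: entailment_prob(premise, hypothesis) -> probability in [0, 1]
EntailmentFn = Callable[[str, str], float]


def factuality_score(
    claims: Sequence[str],
    source_chunks: Sequence[str],
    entailment_prob: EntailmentFn,
) -> float:
    """Simplified claim-level factuality score (illustrative, not FENICE's formula)."""
    if not claims or not source_chunks:
        raise ValueError("need at least one claim and one source chunk")
    per_claim = []
    for claim in claims:
        # A claim is considered as supported as its best-matching source chunk
        per_claim.append(max(entailment_prob(chunk, claim) for chunk in source_chunks))
    # The mean claim support gives a summary-level score in [0, 1]
    return sum(per_claim) / len(per_claim)


def toy_entailment(premise: str, hypothesis: str) -> float:
    # Stand-in for an NLI model: "entailed" only if the claim appears verbatim
    return 1.0 if hypothesis.lower() in premise.lower() else 0.0


source = ["Biles competed in Paris.", "The U.S. team led the qualifications."]
summary_claims = ["Biles competed in Paris.", "Biles won gold in Tokyo."]
print(factuality_score(summary_claims, source, toy_entailment))  # 0.5
```

Taking the maximum over source chunks lets a claim be supported by any single passage of a long document. Averaging over claims means that one unsupported claim lowers the score in proportion to how many claims the summary contains.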

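Intended Use

Extracting atomic claims from summaries.
Serving as a component of a summary factuality evaluation pipeline.

Training

For details on the training process, see Section 4.1 of the paper (https://aclanthology.org/2024.findings-acl.841.pdf).

Performance

Attribute	Details
Model type	T5-base-summarization-claim-extractor
Training data	See Section 4.1 of the paper (https://aclanthology.org/2024.findings-acl.841.pdf)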

Model	easiness_P	easiness_R	easiness_F1
GPT-3.5	80.1	70.9	74.9
t5-base-summarization-claim-extractor	79.2	68.8	73.4

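Table 1: Easiness precision (easiness_P), recall (easiness_R), and F1 score (easiness_F1) for the LLM-based claim extractor (GPT-3.5) and for t5-base-summarization-claim-extractor, evaluated on ROSE (Liu et al., 2023b).

For more details on model performance and the metrics used, see Section 4.1 of the paper.

Main Repository

For more details on FENICE, see the GitHub repository: Babelscape/FENICE

Citation

If you use this model in your work, please cite the following paper: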


@inproceedings{scire-etal-2024-fenice,
    title = "{FENICE}: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction",
    author = "Scir{\`e}, Alessandro and Ghonim, Karim and Navigli, Roberto",
    editor = "Ku, Lun-Wei  and Martins, Andre and Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.841",
    pages = "14148--14161",
}