BART Large CNN
Model Overview
This model uses a Transformer encoder-decoder architecture pretrained with a denoising sequence-to-sequence objective. It performs strongly on both text generation and understanding tasks; this checkpoint is tuned specifically for news summarization.
Model Features
Bidirectional encoder
A BERT-style bidirectional encoder that captures context from both directions.
Autoregressive decoder
GPT-style autoregressive generation that keeps output text fluent.
Domain-specific fine-tuning
Fine-tuned on the CNN/Daily Mail news dataset for strong summarization quality.
Model Capabilities
News article summarization
Long-text compression
Key information extraction
Use Cases
News media
News brief generation
Automatically compresses long news reports into concise summaries.
ROUGE-L score of 30.6186 on the CNN/Daily Mail test set.
Content preview generation
Automatically generates article previews for online news platforms.
Generated summaries average 78.6 words in length.
Information processing
Document summarization
Extracts key information from long documents.
🚀 BART (large-sized model), fine-tuned on CNN Daily Mail
The BART model was pretrained on English-language text and fine-tuned on CNN Daily Mail. It was introduced in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Lewis et al. and first released in this repository.
Disclaimer: The team releasing BART did not write a model card for this model, so this model card has been written by the Hugging Face team.
🚀 Quick Start
This model can be used for text summarization. Here is how to use it with the pipeline API:
from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison. Her next court appearance is scheduled for May 18.
"""
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))
>>> [{'summary_text': 'Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.'}]
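For finer-grained control over tokenization and generation, the checkpoint can also be loaded directly. A minimal sketch, assuming the same ARTICLE string defined above, with generation settings mirroring the pipeline call:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load tokenizer and model directly instead of through the pipeline wrapper.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

# Truncate the input to the model's 1024-token limit before generating.
inputs = tokenizer(ARTICLE, max_length=1024, truncation=True, return_tensors="pt")
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=130, min_length=30)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))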
✨ Key Features
- Model architecture: BART is a Transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
- Pretraining: BART is pretrained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text (a mask-filling sketch follows this list).
- Use cases: Once fine-tuned, BART is particularly effective for text generation tasks (e.g. summarization, translation) and also works well for comprehension tasks (e.g. text classification, question answering). This particular checkpoint has been fine-tuned on CNN Daily Mail, a large collection of text-summary pairs.
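To illustrate the denoising pretraining objective, the generic facebook/bart-large checkpoint (not this summarization checkpoint) can reconstruct text in which a span has been replaced by the <mask> token. A minimal sketch, following the mask-filling example from the Transformers documentation:

from transformers import BartForConditionalGeneration, BartTokenizer

# The generic pretrained checkpoint illustrates the denoising objective:
# given text where a span was replaced by <mask>, it generates a reconstruction.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

corrupted = "UN Chief Says There Is No <mask> in Syria"
inputs = tokenizer(corrupted, return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))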
📚 Documentation
Model description
BART is a Transformer encoder-decoder (seq2seq) model that combines a bidirectional (BERT-like) encoder with an autoregressive (GPT-like) decoder. It is pretrained in two steps: first, text is corrupted with an arbitrary noising function; then, a model is learned to reconstruct the original text.
Once fine-tuned, BART is particularly effective for text generation tasks (e.g. summarization, translation) and also performs well on comprehension tasks (e.g. text classification, question answering). This particular checkpoint has been fine-tuned on CNN Daily Mail, a large collection of text-summary pairs.
Intended uses & limitations
You can use this model for text summarization. Note that the model accepts at most 1,024 input tokens, so longer documents must be truncated or split before summarization (see the sketch below).
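One simple strategy for documents that exceed the 1,024-token limit is to summarize fixed-size chunks independently and join the partial summaries. The summarize_long helper below is a hypothetical illustration, not part of the model or the Transformers API:

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize_long(text, chunk_words=500):
    # Hypothetical helper: split on whitespace into chunks small enough to fit
    # the 1024-token input limit, summarize each, and join the partial summaries.
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    parts = [summarizer(chunk, max_length=130, min_length=30, do_sample=False)[0]["summary_text"]
             for chunk in chunks]
    return " ".join(parts)

Because each chunk is summarized without seeing the others, cross-chunk context is lost; for very long documents the joined summary may read disjointedly.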
BibTeX entry and citation info
@article{DBLP:journals/corr/abs-1910-13461,
author = {Mike Lewis and
Yinhan Liu and
Naman Goyal and
Marjan Ghazvininejad and
Abdelrahman Mohamed and
Omer Levy and
Veselin Stoyanov and
Luke Zettlemoyer},
title = {{BART:} Denoising Sequence-to-Sequence Pre-training for Natural Language
Generation, Translation, and Comprehension},
journal = {CoRR},
volume = {abs/1910.13461},
year = {2019},
url = {http://arxiv.org/abs/1910.13461},
eprinttype = {arXiv},
eprint = {1910.13461},
timestamp = {Thu, 31 Oct 2019 14:02:26 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-1910-13461.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
📄 License
This model is released under the MIT license.
📦 Model Information

| Property | Details |
|---|---|
| Model type | Text summarization model |
| Training data | CNN Daily Mail |
| Evaluation metrics | ROUGE-1: 42.9486; ROUGE-2: 20.8149; ROUGE-L: 30.6186; ROUGE-LSUM: 40.0376; loss: 2.529000997543335; gen_len: 78.5866 |
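ROUGE scores like those above can be computed with the evaluate library (which wraps the rouge_score package); exact numbers depend on the scoring implementation and test split. A minimal sketch with hypothetical example strings:

import evaluate

# Requires the `evaluate` and `rouge_score` packages.
rouge = evaluate.load("rouge")
predictions = ["Barrientos has been married 10 times, nine times between 1999 and 2002."]
references = ["In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002."]
print(rouge.compute(predictions=predictions, references=references))
# -> {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}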