🚀 bart-large-summary-map-reduce
This is a text-to-text model that "map-reduces" the chunk-level summaries of a long document, consolidating them into a single summary.
It works as a post-processor for textsum (or any other similar long-document summarization method); see the detailed explanation for how it fits in:

Flowchart adapted from the Google blog post here
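The map-reduce flow can be sketched in plain Python. Note that `chunk_text`, `summarize_chunk`, and `reduce_summaries` below are hypothetical stand-ins for illustration: in practice the "map" step is a chunk-level summarizer such as textsum, and the "reduce" step is this model.

```python
# Hedged sketch of the map-reduce summarization flow this model post-processes.
# All three functions are illustrative placeholders, not part of any library.

def chunk_text(text: str, chunk_size: int = 200) -> list[str]:
    """Split a long document into roughly equal word chunks ("map" inputs)."""
    words = text.split()
    return [" ".join(words[i : i + chunk_size]) for i in range(0, len(words), chunk_size)]

def summarize_chunk(chunk: str) -> str:
    """Placeholder 'map' step: a real chunk summarizer (e.g. textsum) goes here."""
    return chunk[:50]  # toy stand-in, not a real summary

def reduce_summaries(chunk_summaries: list[str]) -> str:
    """'Reduce' step: join the chunk summaries into the single input string
    that this model would then consolidate into one coherent summary."""
    return "\n".join(chunk_summaries)

document = "word " * 500
mapped = [summarize_chunk(c) for c in chunk_text(document)]
reduced_input = reduce_summaries(mapped)  # feed this string to the model
```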
📚 Documentation
Model Details
This model is a fine-tuned version of facebook/bart-large on the pszemraj/summary-map-reduce dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7894
- Number of input tokens seen: 14258488
Information Table

| Attribute | Details |
| --- | --- |
| Model type | Fine-tuned from facebook/bart-large |
| Training data | pszemraj/summary-map-reduce dataset |
🚀 Quick Start
Installation
This model is built on the transformers library, which you can install with:

```bash
pip install transformers
```
Example

```python
import torch
from transformers import pipeline

# load the model; device_map="auto" places it on GPU when one is available
pipe = pipeline(
    "text2text-generation",
    model="pszemraj/bart-large-summary-map-reduce",
    device_map="auto",
)

# Example input 1: noisy chunk-level summaries of a long document,
# joined into a single string
text = """"Sangers on a Train" is a 1950 film about a train driver, Guy Haines, who discovers his wife, Miriam, has been murdered in Metcalf, Washington, DC. The film delves into the relationship between Guy and Anne Burton, focusing on Guy's desire for Anne to marry him.
"Screentalk" is a comedy about Anne Burton and her husband, Guy Haines, who are investigating the murder of their daughter, Miriam. The plot revolves around Anne's relationship with Bruno, who has been arrested for his wife's murder. In the second set, Guy and Anne meet at a tennis court in Washington, DC, where they plan to play against each other. Hennessy and Hammond investigate the crime scene, leading to Guy's arrest.
"The Announcer's Boom Forest Hills" is a tennis game between Guy Haines and Bruno Antony, with the score six-five. In the second set, Haines leads three games to four, but his opponent, Bernard Reynolds, attacks him in the third set. Meanwhile, Anne Hennessy and Barbara Hammond are preparing for dinner at the amusement park, where Guy has been waiting for hours. A police car arrives, followed by a taxi. The boatman and detectives follow Guy through the queue, leading to the conclusion that Guy was the man responsible for the accident."""

# Example input 2 (this assignment replaces the first example)
text = """A computer implemented method of generating a syntactic object. The method includes the steps of providing a plurality of input data sets, each input data set comprising one or more words, wherein each word is associated with at least one non-adjacent second word; creating an exocentric relationship between the first and second words by applying a neo-ian event semantics to the input data in such a way that the neo-antagonistic effect results in the generation of the syntactic object; and storing the generated syntactic object for future use.
A method of learning and using language is disclosed. The method includes the steps of creating a lexicon of words, wherein each word in the lexicon has at least two possible states, selecting a set of one or more of the possible states of the lexicon to be used as a base state for a subsequent computational operation, and applying the computational operation to the base state to form a new output state.
A computer implemented method for changing a first workspace to a second workspace. The method includes the steps of creating a new workspace by merging the first workspace with the second workspace, wherein the merging is based on at least one of: an impenetrable condition; a constraint on movement; and a resource restriction.
The brain is constantly loosing neurons because you doesn't want all the junk around."""

if torch.cuda.is_available():
    torch.cuda.empty_cache()

res = pipe(
    text,
    max_new_tokens=512,
    num_beams=4,
    early_stopping=True,
    truncation=True,
)
print(res[0]["generated_text"])
```
Usage Tips
⚠️ Important Note
BART supports several GPU inference acceleration methods, including flash-attention2 and torch SDPA.
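As a minimal sketch, either attention backend can be selected at load time through the `attn_implementation` argument of `from_pretrained` (available in recent transformers versions); this snippet assumes a CUDA GPU and, for flash-attention2, that the flash-attn package is installed:

```python
# Hedged sketch: choosing an accelerated attention backend when loading the
# model. Assumes a CUDA GPU; "flash_attention_2" additionally requires the
# flash-attn package to be installed.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained(
    "pszemraj/bart-large-summary-map-reduce",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # or "flash_attention_2" if installed
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("pszemraj/bart-large-summary-map-reduce")
```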
🔧 Technical Details
Training Hyperparameters
The following hyperparameters were used during training:
- Learning rate: 0.0001
- Train batch size: 4
- Eval batch size: 4
- Seed: 17868
- Gradient accumulation steps: 16
- Total train batch size: 64
- Optimizer: OptimizerNames.PAGED_ADAMW with β1=0.9, β2=0.999, ε=1e-08 and no additional optimizer arguments
- LR scheduler type: cosine
- LR scheduler warmup ratio: 0.05
- Number of epochs: 3.0
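As a sanity check, the total train batch size above is the per-device batch size multiplied by the gradient accumulation steps. A minimal sketch (key names follow transformers TrainingArguments conventions, used here only as a plain dict):

```python
# The reported hyperparameters, expressed as a plain config dict.
config = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 17868,
    "gradient_accumulation_steps": 16,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "num_train_epochs": 3.0,
}

# effective (total) train batch size = per-device batch x accumulation steps
effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)
print(effective_batch)  # 64, matching the total train batch size listed above
```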
📄 License
This project is licensed under Apache 2.0.