🚀 bart-large-summary-map-reduce
This is a text-to-text model that "map-reduces" the chunk-level summaries of a long document, consolidating them into a single summary.
It works as a post-processor for textsum (or any other similar long-document summarization method); see the detailed explanation for how it fits in:

Flowchart adapted from the Google blog post here
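The map-reduce flow can be sketched in plain Python. Note that `chunk_text`, `summarize_chunk`, and `reduce_summaries` below are hypothetical stand-ins for illustration: in practice the "map" step is a chunk-level summarizer such as textsum, and the "reduce" step is this model.

```python
# Hedged sketch of the map-reduce summarization flow this model post-processes.
# All three functions are illustrative placeholders, not part of any library.

def chunk_text(text: str, chunk_size: int = 200) -> list[str]:
    """Split a long document into roughly equal word chunks ("map" inputs)."""
    words = text.split()
    return [" ".join(words[i : i + chunk_size]) for i in range(0, len(words), chunk_size)]

def summarize_chunk(chunk: str) -> str:
    """Placeholder 'map' step: a real chunk summarizer (e.g. textsum) goes here."""
    return chunk[:50]  # toy stand-in, not a real summary

def reduce_summaries(chunk_summaries: list[str]) -> str:
    """'Reduce' step: join the chunk summaries into the single input string
    that this model would then consolidate into one coherent summary."""
    return "\n".join(chunk_summaries)

document = "word " * 500
mapped = [summarize_chunk(c) for c in chunk_text(document)]
reduced_input = reduce_summaries(mapped)  # feed this string to the model
```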
📚 Documentation
Model Details
This model is a fine-tuned version of facebook/bart-large on the pszemraj/summary-map-reduce dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7894
- Number of input tokens seen: 14258488
Information Table

| Attribute | Details |
| --- | --- |
| Model type | Fine-tuned from facebook/bart-large |
| Training data | pszemraj/summary-map-reduce dataset |
🚀 Quick Start
Installation
This model is built on the transformers library, which you can install with:

```bash
pip install transformers
```
Example

```python
import torch
from transformers import pipeline

# load the model; device_map="auto" places it on GPU when one is available
pipe = pipeline(
    "text2text-generation",
    model="pszemraj/bart-large-summary-map-reduce",
    device_map="auto",
)

# Example input 1: noisy chunk-level summaries of a long document,
# joined into a single string
text = """"Sangers on a Train" is a 1950 film about a train driver, Guy Haines, who discovers his wife, Miriam, has been murdered in Metcalf, Washington, DC. The film delves into the relationship between Guy and Anne Burton, focusing on Guy's desire for Anne to marry him.
"Screentalk" is a comedy about Anne Burton and her husband, Guy Haines, who are investigating the murder of their daughter, Miriam. The plot revolves around Anne's relationship with Bruno, who has been arrested for his wife's murder. In the second set, Guy and Anne meet at a tennis court in Washington, DC, where they plan to play against each other. Hennessy and Hammond investigate the crime scene, leading to Guy's arrest.
"The Announcer's Boom Forest Hills" is a tennis game between Guy Haines and Bruno Antony, with the score six-five. In the second set, Haines leads three games to four, but his opponent, Bernard Reynolds, attacks him in the third set. Meanwhile, Anne Hennessy and Barbara Hammond are preparing for dinner at the amusement park, where Guy has been waiting for hours. A police car arrives, followed by a taxi. The boatman and detectives follow Guy through the queue, leading to the conclusion that Guy was the man responsible for the accident."""

# Example input 2 (this assignment replaces the first example)
text = """A computer implemented method of generating a syntactic object. The method includes the steps of providing a plurality of input data sets, each input data set comprising one or more words, wherein each word is associated with at least one non-adjacent second word; creating an exocentric relationship between the first and second words by applying a neo-ian event semantics to the input data in such a way that the neo-antagonistic effect results in the generation of the syntactic object; and storing the generated syntactic object for future use.
A method of learning and using language is disclosed. The method includes the steps of creating a lexicon of words, wherein each word in the lexicon has at least two possible states, selecting a set of one or more of the possible states of the lexicon to be used as a base state for a subsequent computational operation, and applying the computational operation to the base state to form a new output state.
A computer implemented method for changing a first workspace to a second workspace. The method includes the steps of creating a new workspace by merging the first workspace with the second workspace, wherein the merging is based on at least one of: an impenetrable condition; a constraint on movement; and a resource restriction.
The brain is constantly loosing neurons because you doesn't want all the junk around."""

if torch.cuda.is_available():
    torch.cuda.empty_cache()

res = pipe(
    text,
    max_new_tokens=512,
    num_beams=4,
    early_stopping=True,
    truncation=True,
)
print(res[0]["generated_text"])
```
Usage Tips
⚠️ Important Note
BART supports several GPU inference acceleration methods, including flash-attention2 and torch SDPA.
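As a minimal sketch, either attention backend can be selected at load time through the `attn_implementation` argument of `from_pretrained` (available in recent transformers versions); this snippet assumes a CUDA GPU and, for flash-attention2, that the flash-attn package is installed:

```python
# Hedged sketch: choosing an accelerated attention backend when loading the
# model. Assumes a CUDA GPU; "flash_attention_2" additionally requires the
# flash-attn package to be installed.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained(
    "pszemraj/bart-large-summary-map-reduce",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # or "flash_attention_2" if installed
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("pszemraj/bart-large-summary-map-reduce")
```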
🔧 Technical Details
Training Hyperparameters
The following hyperparameters were used during training:
- Learning rate: 0.0001
- Train batch size: 4
- Eval batch size: 4
- Seed: 17868
- Gradient accumulation steps: 16
- Total train batch size: 64
- Optimizer: OptimizerNames.PAGED_ADAMW with β1=0.9, β2=0.999, ε=1e-08 and no additional optimizer arguments
- LR scheduler type: cosine
- LR scheduler warmup ratio: 0.05
- Number of epochs: 3.0
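As a sanity check, the total train batch size above is the per-device batch size multiplied by the gradient accumulation steps. A minimal sketch (key names follow transformers TrainingArguments conventions, used here only as a plain dict):

```python
# The reported hyperparameters, expressed as a plain config dict.
config = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 17868,
    "gradient_accumulation_steps": 16,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "num_train_epochs": 3.0,
}

# effective (total) train batch size = per-device batch x accumulation steps
effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)
print(effective_batch)  # 64, matching the total train batch size listed above
```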
📄 License
This project is licensed under Apache 2.0.