🚀 bart-large-summary-map-reduce
A text2text model for "map-reducing" the summaries of a chunked long document into a single consolidated summary.
It is intended as a post-processor for textsum (or any similar long-document summarization approach); a detailed explanation of its role follows:
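In a map-reduce setup, the long document is first split into overlapping chunks, each chunk is summarized independently (the map step), and the per-chunk summaries are then merged by this model (the reduce step). A minimal sketch of the chunking side of the map step — the `chunk_words` helper and its word-based window sizes are illustrative assumptions, not part of this model card:

```python
def chunk_words(text, max_words=512, overlap=32):
    """Split text into overlapping word windows for the map step."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + max_words]))
        if i + max_words >= len(words):
            break
    return chunks

# Each chunk would be summarized independently; the per-chunk summaries
# (joined by newlines) are then passed to bart-large-summary-map-reduce.
```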

Workflow diagram adapted from the Google blog post (here).
📚 Documentation
Model details
This model is a fine-tuned version of facebook/bart-large on the pszemraj/summary-map-reduce dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7894
- Num input tokens seen: 14258488
Overview

| Attribute | Details |
|---|---|
| Model type | Fine-tuned from facebook/bart-large |
| Training data | pszemraj/summary-map-reduce dataset |
🚀 Quick start
Installation
The model runs on the transformers library, which you can install with:

```shell
pip install transformers
```
Example

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text2text-generation",
    model="pszemraj/bart-large-summary-map-reduce",
    device_map="auto",
)

# Input: per-chunk summaries to consolidate, separated by newlines
text = """"Sangers on a Train" is a 1950 film about a train driver, Guy Haines, who discovers his wife, Miriam, has been murdered in Metcalf, Washington, DC. The film delves into the relationship between Guy and Anne Burton, focusing on Guy's desire for Anne to marry him.
"Screentalk" is a comedy about Anne Burton and her husband, Guy Haines, who are investigating the murder of their daughter, Miriam. The plot revolves around Anne's relationship with Bruno, who has been arrested for his wife's murder. In the second set, Guy and Anne meet at a tennis court in Washington, DC, where they plan to play against each other. Hennessy and Hammond investigate the crime scene, leading to Guy's arrest.
"The Announcer's Boom Forest Hills" is a tennis game between Guy Haines and Bruno Antony, with the score six-five. In the second set, Haines leads three games to four, but his opponent, Bernard Reynolds, attacks him in the third set. Meanwhile, Anne Hennessy and Barbara Hammond are preparing for dinner at the amusement park, where Guy has been waiting for hours. A police car arrives, followed by a taxi. The boatman and detectives follow Guy through the queue, leading to the conclusion that Guy was the man responsible for the accident."""

# A second example input; note that this assignment replaces the one
# above, so only this text is passed to the pipeline below.
text = """A computer implemented method of generating a syntactic object. The method includes the steps of providing a plurality of input data sets, each input data set comprising one or more words, wherein each word is associated with at least one non-adjacent second word; creating an exocentric relationship between the first and second words by applying a neo-ian event semantics to the input data in such a way that the neo-antagonistic effect results in the generation of the syntactic object; and storing the generated syntactic object for future use.
A method of learning and using language is disclosed. The method includes the steps of creating a lexicon of words, wherein each word in the lexicon has at least two possible states, selecting a set of one or more of the possible states of the lexicon to be used as a base state for a subsequent computational operation, and applying the computational operation to the base state to form a new output state.
A computer implemented method for changing a first workspace to a second workspace. The method includes the steps of creating a new workspace by merging the first workspace with the second workspace, wherein the merging is based on at least one of: an impenetrable condition; a constraint on movement; and a resource restriction.
The brain is constantly loosing neurons because you doesn't want all the junk around."""

if torch.cuda.is_available():
    torch.cuda.empty_cache()

res = pipe(
    text,
    max_new_tokens=512,
    num_beams=4,
    early_stopping=True,
    truncation=True,
)
print(res[0]["generated_text"])
```
Usage tips
⚠️ Important
BART supports several methods to accelerate inference on GPU, including flash-attention2 and torch SDPA.
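For example, an attention implementation can be requested when loading the model directly — a sketch, assuming recent torch/transformers versions; actual speedups depend on your hardware and installed packages:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Request PyTorch's scaled-dot-product attention (SDPA) kernel;
# pass attn_implementation="flash_attention_2" instead if flash-attn
# is installed and your GPU supports it.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "pszemraj/bart-large-summary-map-reduce",
    attn_implementation="sdpa",
)
tokenizer = AutoTokenizer.from_pretrained("pszemraj/bart-large-summary-map-reduce")
```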
🔧 Technical details
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 17868
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: OptimizerNames.PAGED_ADAMW with betas=(0.9, 0.999), epsilon=1e-08 and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0
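The total train batch size of 64 follows from the per-device batch size and gradient accumulation — a quick check, assuming training on a single device:

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 16
num_devices = 1  # assumption: single-GPU training run

# Gradients are accumulated over 16 micro-batches of 4 before each
# optimizer step, giving the effective batch size reported above.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 64
```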
📄 License
This project is licensed under the Apache 2.0 license.