lsg-bart-base-4096-pubmed open-source model - free deployment for fast scientific paper summarization


Lsg Bart Base 4096 Pubmed

Developed by ccdv

A long-sequence model based on the LSG attention mechanism, fine-tuned specifically for scientific paper summarization

Text Generation

Transformers

English #long-text summarization #scientific paper processing #local-sparse-global attention

Downloads: 21

Released: 5/9/2022

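Model Overview

This model is a modified version of BART-base that uses a Local-Sparse-Global attention mechanism to handle long input sequences. It was fine-tuned on the PubMed scientific paper dataset and is suited to long-document summarization.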
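To see why a block-local pattern keeps attention affordable at 4096 tokens, here is a minimal pure-Python sketch. It assumes each query block attends to its own block plus one neighbouring block on each side, which is consistent with the Local rows of the results tables (connections = 3 × block size). This is a simplification, not the LSG implementation, which also adds global and sparse tokens; `local_keys` and `connections_per_query` are hypothetical names.

```python
def local_keys(query_block: int, block_size: int, seq_len: int) -> list[int]:
    """Key positions visible to one query block under a simplified block-local pattern."""
    num_blocks = seq_len // block_size
    keys = []
    # Attend to the previous, current and next block, clipped at the sequence edges
    for b in range(query_block - 1, query_block + 2):
        if 0 <= b < num_blocks:
            keys.extend(range(b * block_size, (b + 1) * block_size))
    return keys


def connections_per_query(block_size: int) -> int:
    """Keys seen by an interior block: its own block plus one neighbour on each side."""
    return 3 * block_size


seq_len, block_size = 4096, 128
interior = local_keys(query_block=5, block_size=block_size, seq_len=seq_len)
edge = local_keys(query_block=0, block_size=block_size, seq_len=seq_len)
print(len(interior), len(edge))  # 384 256

# Attention score count grows as seq_len * 3 * block_size rather than seq_len ** 2
print(seq_len * connections_per_query(block_size), seq_len ** 2)  # 1572864 16777216
```

Under this assumption, doubling the input length only doubles the number of attention scores, instead of quadrupling them as full attention would.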

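Model Features

Long-sequence processing

Supports input sequences of up to 4096 tokens and uses Local-Sparse-Global attention to process long texts efficiently, since each token attends to a limited set of positions rather than the whole sequence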

Multiple attention modes

Offers a choice of sparse attention modes: local, pooling, stride, block stride, norm, and LSH

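Optimized for scientific papers

Fine-tuned specifically on the PubMed scientific paper dataset, making it well suited to summarizing academic text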
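The sparse modes listed above differ in how they summarize distant context. As a rough illustration only (not the LSG code; both function names are hypothetical), "pooling" can be pictured as averaging every `sparsity_factor` consecutive values and "stride" as keeping one value every `sparsity_factor` positions. Either way, the distant context shrinks by the sparsity factor, which is 4 in the results tables.

```python
def pooling_sparse(values: list[float], sparsity_factor: int) -> list[float]:
    """Average-pool consecutive groups of `sparsity_factor` values (simplified 'pooling')."""
    pooled = []
    for i in range(0, len(values), sparsity_factor):
        group = values[i:i + sparsity_factor]
        pooled.append(sum(group) / len(group))
    return pooled


def stride_sparse(values: list[float], sparsity_factor: int) -> list[float]:
    """Keep one value every `sparsity_factor` positions (simplified 'stride')."""
    return values[::sparsity_factor]


tokens = [float(i) for i in range(8)]  # stand-in for 8 token representations
print(pooling_sparse(tokens, 4))  # [1.5, 5.5]
print(stride_sparse(tokens, 4))   # [0.0, 4.0]
```

Pooling keeps a blurred view of every token, while stride keeps exact values for a subset of tokens; the real modes operate on hidden-state vectors rather than scalars.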

Model Capabilities

Long-text processing

Scientific paper summarization

Sequence-to-sequence generation

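Use Cases

Academic research

Automatic summarization of scientific papers

Generates concise, accurate summaries of long research papers

ROUGE-1 of 47.37 on the PubMed test set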

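Literature processing

Medical literature summarization

Processes long research papers in the medical domain and extracts the key information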
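The ROUGE-1 figure quoted above measures unigram overlap between a generated summary and a reference summary. A minimal sketch of ROUGE-1 F1 follows; the card does not say which scorer or preprocessing produced 47.37, and standard packages add their own tokenization and optional stemming, so this simplified `rouge1_f1` helper (a hypothetical name) will not reproduce the reported numbers exactly.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1 on lowercased whitespace tokens (no stemming, no punctuation handling)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # unigram matches, clipped by count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1(
    "the drug reduced blood pressure",
    "the new drug lowered blood pressure in patients",
)
print(round(100 * score, 2))  # 61.54
```

Here 4 of the 5 candidate words match (precision 0.8) and 4 of the 8 reference words are covered (recall 0.5), giving F1 = 8/13. The card reports scores on the same 0 to 100 scale.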

🚀 ccdv/lsg-bart-base-4096-pubmed

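This model is a version of ccdv/lsg-bart-base-4096 fine-tuned on the scientific_papers pubmed dataset. It uses Local-Sparse-Global attention to handle long sequences and reaches a ROUGE-1 of 47.37 on the test set for summarization.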

⚠️ Important

This model relies on a custom modeling file, so you must pass trust_remote_code=True. Make sure you are using Transformers >= 4.36.1; see #13467 for details.
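Before loading the model, you may want to fail early if the installed Transformers is older than 4.36.1. The sketch below is a simplified stdlib-only comparison with hypothetical helper names; it only looks at the leading numeric parts of the version string, whereas a full implementation (for example `packaging.version`) also orders pre-release suffixes.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '4.36.1' (or '4.40.0.dev0') into a tuple of its leading integer parts."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0' or '0rc1'
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.36.1") -> bool:
    """True if `installed` is at least `minimum` (tuples compare element by element)."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("4.36.1"), meets_minimum("4.35.2"), meets_minimum("4.40.0.dev0"))  # True False True
```

In practice you would pass `transformers.__version__` as `installed`.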

🚀 Quick Start

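import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# The model ships custom LSG modeling code, so trust_remote_code=True is required
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-4096-pubmed", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-4096-pubmed", trust_remote_code=True)

text = "Replace with the text you want to summarize."

# Use the first GPU if one is available, otherwise fall back to CPU (device=-1)
device = 0 if torch.cuda.is_available() else -1
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=device)

generated_text = pipe(
    text,
    truncation=True,         # truncate inputs longer than the tokenizer's maximum length
    max_length=64,           # maximum length of the generated summary, in tokens
    no_repeat_ngram_size=7,  # never repeat any 7-gram in the output
    num_beams=2,             # beam search with 2 beams
    early_stopping=True,     # stop once every beam has produced an end-of-sequence token
)
print(generated_text[0]["generated_text"])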

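✨ Key Features

Long-sequence processing: the model relies on Local-Sparse-Global attention to handle long sequences; its architecture is shown in the figure.
Size: about 145 million parameters, with 6 encoder layers and 6 decoder layers.
Warm start: the model was warm-started from BART-base, converted to handle long sequences (encoder only), and then fine-tuned.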
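This card does not say how the encoder's position embeddings were extended from BART-base's 1024 positions to 4096. One common warm-start approach, shown here purely as an assumed illustration with a hypothetical `tile_positions` helper, is to repeat the pretrained position table until it covers the new length, so every new position starts from an already-trained vector.

```python
def tile_positions(pretrained: list[list[float]], new_length: int) -> list[list[float]]:
    """Extend a learned position table by repeating it (an assumed warm-start strategy)."""
    old_length = len(pretrained)
    # Row i of the new table is a copy of row (i mod old_length) of the pretrained table
    return [list(pretrained[i % old_length]) for i in range(new_length)]


old_table = [[float(i)] * 2 for i in range(1024)]  # toy 1024 x 2 table, row i filled with i
new_table = tile_positions(old_table, 4096)
print(len(new_table), new_table[1024], new_table[4095])  # 4096 [0.0, 0.0] [1023.0, 1023.0]
```

Fine-tuning then adapts the copied rows so the model can distinguish positions that started out identical.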


📚 Documentation

The model achieves the following results on the test set:

Larger block sizes

Length	Sparse type	Block size	Sparsity	Connections	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum
4096	Local	256	0	768	47.37	21.74	28.59	43.67
4096	Local	128	0	384	47.02	21.33	28.34	43.31
4096	Pooling	128	4	644	47.11	21.42	28.43	43.40
4096	Stride	128	4	644	47.16	21.49	28.38	43.44
4096	Block Stride	128	4	644	47.13	21.46	28.39	43.42
4096	Norm	128	4	644	47.09	21.44	28.40	43.36
4096	LSH	128	4	644	47.11	21.41	28.41	43.42

Smaller block sizes (lower resource requirements)

Length	Sparse type	Block size	Sparsity	Connections	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum
4096	Local	64	0	192	45.74	20.26	27.51	41.99
4096	Local	32	0	96	42.69	17.83	25.62	38.89
4096	Pooling	32	4	160	44.60	19.35	26.83	40.85
4096	Stride	32	4	160	45.52	20.07	27.39	41.75
4096	Block Stride	32	4	160	45.30	19.89	27.22	41.54
4096	Norm	32	4	160	44.30	19.05	26.57	40.47
4096	LSH	32	4	160	44.53	19.27	26.84	40.74
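To read the two tables as a cost/quality trade-off, the sketch below copies their sparse type, block size, sparsity, connection and ROUGE-1 columns verbatim and picks the best configuration under a connection budget. Treating the connection count as a proxy for attention cost is an assumption, and `best_under_budget` and `rouge1_drop` are hypothetical helpers.

```python
# (sparse type, block size, sparsity, connections, ROUGE-1), copied from the tables above
RESULTS = [
    ("Local", 256, 0, 768, 47.37),
    ("Local", 128, 0, 384, 47.02),
    ("Pooling", 128, 4, 644, 47.11),
    ("Stride", 128, 4, 644, 47.16),
    ("Block Stride", 128, 4, 644, 47.13),
    ("Norm", 128, 4, 644, 47.09),
    ("LSH", 128, 4, 644, 47.11),
    ("Local", 64, 0, 192, 45.74),
    ("Local", 32, 0, 96, 42.69),
    ("Pooling", 32, 4, 160, 44.60),
    ("Stride", 32, 4, 160, 45.52),
    ("Block Stride", 32, 4, 160, 45.30),
    ("Norm", 32, 4, 160, 44.30),
    ("LSH", 32, 4, 160, 44.53),
]


def best_under_budget(rows, max_connections):
    """Highest-ROUGE-1 configuration whose connection count fits within the budget."""
    candidates = [row for row in rows if row[3] <= max_connections]
    return max(candidates, key=lambda row: row[4])


def rouge1_drop(rows):
    """ROUGE-1 points lost by each (sparse type, block size) relative to the best row."""
    best = max(row[4] for row in rows)
    return {(kind, block): round(best - r1, 2) for kind, block, _, _, r1 in rows}


print(best_under_budget(RESULTS, 170))  # ('Stride', 32, 4, 160, 45.52)
print(rouge1_drop(RESULTS)[("Local", 32)])  # 4.68
```

With a budget of 170 connections, the strided 32-block variant (45.52) clearly beats the plain local 32-block one (42.69): at small block sizes, the extra sparse connections recover much of the lost ROUGE.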