lsg-bart-base-4096-pubmed open-source model: free deployment for fast scientific paper summarization

Lsg Bart Base 4096 Pubmed

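Developed by ccdv

A long-sequence model based on the LSG attention mechanism, fine-tuned for scientific paper summarization

Text generation

Transformers

English #LongTextSummarization #ScientificPaperProcessing #LocalSparseGlobalAttention

Downloads: 21

Released: 5/9/2022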

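Model overview

This model is a modified version of BART-base that uses Local-Sparse-Global (LSG) attention to handle long input sequences. It is fine-tuned on the PubMed scientific papers dataset and is intended for long-text summarization.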

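Model features

Long-sequence processing

Supports input sequences of up to 4096 tokens, using Local-Sparse-Global attention to process long text efficiently

Multiple attention patterns

Offers a choice of sparse attention patterns: local, pooling, stride, block stride, norm, and LSH

Optimized for scientific papers

Fine-tuned specifically on the PubMed scientific papers dataset, which makes it suitable for summarizing academic text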
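
To make the "local" part of Local-Sparse-Global attention more concrete, the sketch below builds a block-local attention mask with NumPy. It is a simplified illustration and not the model's actual implementation: it assumes each query attends only to its own block and the two neighbouring blocks, and the function name `local_block_mask` is made up for this example. Sparse and global tokens are left out.

```python
import numpy as np

def local_block_mask(seq_len: int, block_size: int) -> np.ndarray:
    """Boolean mask where mask[i, j] is True if query i may attend to key j.

    Simplified illustration: each query attends to its own block and to the
    blocks immediately before and after it.
    """
    if seq_len % block_size != 0:
        raise ValueError("seq_len must be a multiple of block_size")
    # Block index of every position, e.g. [0, 0, 1, 1, ...] for block_size=2.
    block_ids = np.arange(seq_len, dtype=np.int32) // block_size
    # Attention is allowed when the two block indices differ by at most one.
    return np.abs(block_ids[:, None] - block_ids[None, :]) <= 1

mask = local_block_mask(seq_len=4096, block_size=256)
print(mask.shape)              # (4096, 4096)
print(int(mask[2048].sum()))   # 768 keys visible to a token in the middle
print(int(mask[0].sum()))      # 512 keys visible to a token in the first block
```

Under this assumption a token away from the sequence edges sees three blocks of keys. That is consistent with the Local rows of the results tables further down, where the number of connections equals three times the block size (for example 768 for a block size of 256).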

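Model capabilities

Long-text processing

Scientific paper summarization

Sequence-to-sequence transformation

Use cases

Academic research

Automatic summarization of scientific papers

Generates concise, accurate summaries of long research papers

ROUGE-1 score of 47.37 on the PubMed test set

Literature processing

Medical literature summarization

Processes long research documents in the medical domain and extracts the key information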

🚀 ccdv/lsg-bart-base-4096-pubmed

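This model is a version of ccdv/lsg-bart-base-4096 fine-tuned on the scientific_papers pubmed dataset. It uses Local-Sparse-Global attention to handle long sequences and performs well on tasks such as text summarization.

⚠️ Important note

This model relies on a custom modeling file, so you need to pass trust_remote_code=True. Make sure you are using Transformers >= 4.36.1; see #13467 for details.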
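
The version requirement above can be checked before loading the model. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `meets_minimum` are hypothetical, and the parsing is deliberately simple (it keeps only the leading numeric components of the version string).

```python
def parse_version(version: str) -> tuple:
    """Turn '4.36.1' or '4.37.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)

def meets_minimum(installed: str, minimum: str = "4.36.1") -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)

print(meets_minimum("4.36.1"))  # True
print(meets_minimum("4.35.2"))  # False

# In a real environment you would pass the installed version, for example:
#   import transformers
#   meets_minimum(transformers.__version__)
```

For anything beyond a quick sanity check, a dedicated version parser is a safer choice than this hand-rolled comparison.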

🚀 Quick start

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-4096-pubmed", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-4096-pubmed", trust_remote_code=True)

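text = "Replace this with the article you want to summarize."
# device=0 selects the first GPU; use device=-1 to run on CPU
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0)
generated_text = pipe(
    text,
    truncation=True,            # cut inputs that exceed the model's maximum length
    max_length=64,              # maximum length of the generated summary, in tokens
    no_repeat_ngram_size=7,
    num_beams=2,
    early_stopping=True,
)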
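
The pipeline call above returns a list of dictionaries rather than a plain string; for a single input, the summary sits under the "generated_text" key of the first item. The sketch below shows one way to pull the text out. The helper `extract_summaries` is hypothetical, and `example_outputs` is a hand-written stand-in for real pipeline output so that the snippet runs without downloading the model.

```python
def extract_summaries(outputs):
    """Collect the generated strings from a text2text-generation result list."""
    return [item["generated_text"].strip() for item in outputs]

# Stand-in for the value of `generated_text` returned by the pipeline call.
example_outputs = [{"generated_text": " a short summary of the article . "}]

summaries = extract_summaries(example_outputs)
print(summaries[0])  # a short summary of the article .
```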

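✨ Key features

Long-sequence processing: the model relies on Local-Sparse-Global attention to handle long sequences.
Parameter count: the model has roughly 145 million parameters, with 6 encoder layers and 6 decoder layers.
Fine-tuning basis: the model is warm-started from BART-base, converted to handle long sequences (encoder only), and then fine-tuned.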

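📚 Detailed documentation

The model achieves the following results on the test set:

Larger block sizes

Length	Sparsity type	Block size	Sparsity	Connections	R1	R2	RL	RLsum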
4096	Local	256	0	768	47.37	21.74	28.59	43.67
4096	Local	128	0	384	47.02	21.33	28.34	43.31
4096	Pooling	128	4	644	47.11	21.42	28.43	43.40
4096	Stride	128	4	644	47.16	21.49	28.38	43.44
4096	Block Stride	128	4	644	47.13	21.46	28.39	43.42
4096	Norm	128	4	644	47.09	21.44	28.40	43.36
4096	LSH	128	4	644	47.11	21.41	28.41	43.42

Smaller block sizes (lower resource requirements)

Length	Sparsity type	Block size	Sparsity	Connections	R1	R2	RL	RLsum
4096	Local	64	0	192	45.74	20.26	27.51	41.99
4096	Local	32	0	96	42.69	17.83	25.62	38.89
4096	Pooling	32	4	160	44.60	19.35	26.83	40.85
4096	Stride	32	4	160	45.52	20.07	27.39	41.75
4096	Block Stride	32	4	160	45.30	19.89	27.22	41.54
4096	Norm	32	4	160	44.30	19.05	26.57	40.47
4096	LSH	32	4	160	44.53	19.27	26.84	40.74
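
For readers unfamiliar with the R1 column, the sketch below computes a simplified ROUGE-1 F1 score as unigram overlap between a reference and a candidate summary. It is an illustration of the idea only: it assumes lowercase whitespace tokenization with no stemming, so it will not reproduce the numbers in the tables, which appear to be reported on a 0 to 100 scale by a standard ROUGE scorer. The function name `rouge1_f1` is made up for this example.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(rouge1_f1("the cat", "the dog"))                                # 0.5
```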