lsg-bart-base-16384-arxiv: a free, open-source model for processing long texts and generating scientific paper abstracts

Lsg Bart Base 16384 Arxiv

Developed by ccdv

A long-sequence processing model based on the BART architecture, optimized for scientific paper summarization tasks, supporting long-text input up to 16,384 tokens

Text Generation

Transformers

English #Long Text Summarization #Scientific Paper Processing #Local Sparse Global Attention

Downloads 29

Release Time: 5/9/2022

Model Overview

This model uses a local-sparse-global attention mechanism to handle long sequences. It is fine-tuned on the arxiv configuration of the scientific_papers dataset and is suited to scientific paper summarization.

Model Features

Long Sequence Processing Capability

Supports long-text input up to 16,384 tokens, suitable for processing complete scientific papers

Local-Sparse-Global Attention Mechanism

Combines local, sparse, and global attention so that most tokens attend to a bounded set of positions instead of the whole sequence, balancing computational efficiency and model performance on long inputs
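
The Python arithmetic below gives a rough illustration of the efficiency argument. It counts the query-key scores that one attention head computes over a 16,384-token input under two schemes: dense attention, and a fixed per-query budget. It rests on our own interpretation, which the card does not state: that the value 768 in the "Connexions" column of the results tables is the number of keys each query attends to (768 = 3 × 256, i.e. three blocks at the 256-token block size). Global tokens are ignored.

```python
# Rough per-head, per-layer count of query-key attention scores.
# Assumption (not stated in the card): each query attends to a fixed number of
# keys, taken here as 768 = 3 blocks x 256 tokens; global tokens are ignored.

SEQ_LEN = 16384
BLOCK_SIZE = 256
CONNECTIONS = 3 * BLOCK_SIZE  # 768, matching the tables' "Connexions" value


def dense_scores(seq_len: int) -> int:
    """Full self-attention: every query scores every key."""
    return seq_len * seq_len


def fixed_budget_scores(seq_len: int, connections: int) -> int:
    """Each query scores only a fixed number of keys."""
    return seq_len * connections


dense = dense_scores(SEQ_LEN)
sparse = fixed_budget_scores(SEQ_LEN, CONNECTIONS)
print(f"dense:  {dense:,}")
print(f"budget: {sparse:,}")
print(f"ratio:  {dense / sparse:.1f}x")
```

With these values the dense count is 268,435,456 scores and the fixed-budget count is 12,582,912, roughly 21.3 times fewer. The dense count grows quadratically with length while the budgeted one grows linearly, so the gap widens for longer inputs.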

Scientific Paper Optimization

Specifically fine-tuned on the arxiv scientific papers dataset, optimized for academic text summarization tasks

Model Capabilities

Long Text Summarization

Scientific Paper Comprehension

Academic Text Processing

Use Cases

Academic Research

Automatic Scientific Paper Summarization

Generates concise and accurate summaries for lengthy scientific papers

ROUGE-1: 48.74, ROUGE-2: 20.88, ROUGE-L: 28.50
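
ROUGE-1 measures unigram overlap between a generated summary and a reference summary. The sketch below computes a simplified ROUGE-1 F1 in plain Python to show what the metric counts. It uses lowercase whitespace tokenization with no stemming, so its scores will not match standard ROUGE implementations exactly. The function name `rouge1_f1` is our own.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 using clipped unigram overlap.

    Tokens are lowercased and split on whitespace; there is no stemming or
    punctuation handling, so results differ from standard ROUGE packages.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count, so each unigram is
    # credited at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat", "the cat sat on the mat")
print(round(score, 4))  # 0.6667
```

Here all three candidate words appear in the reference (precision 1.0), but they cover only half of its six tokens (recall 0.5). The F1 is therefore 2/3.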

Academic Literature Processing

Processing and analyzing long-form academic literature content

🚀 ccdv/lsg-bart-base-16384-arxiv

This model is a fine-tuned version of ccdv/lsg-bart-base-4096-arxiv on the scientific_papers arxiv dataset. It handles sequences of up to 16,384 tokens; its test-set results are reported in the Model Details section.

🚀 Quick Start

Requires Transformers >= 4.36.1. This model relies on a custom modeling file, so you need to pass trust_remote_code=True when loading it (see #13467).

LSG ArXiv paper. The GitHub conversion script is available at this link.

Basic Usage

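from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# trust_remote_code=True is required: the LSG architecture lives in a custom modeling file
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-16384-arxiv", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-16384-arxiv", trust_remote_code=True)

text = "Replace with the text you want to summarize."
# device=0 runs on the first GPU; use device=-1 to run on the CPU
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0)
generated_text = pipe(
    text,
    truncation=True,
    max_length=64,
    no_repeat_ngram_size=7,
    num_beams=2,
    early_stopping=True,
)
# The pipeline returns a list with one dict per input
print(generated_text[0]["generated_text"])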

📚 Documentation

Model Details

This model is a fine-tuned version of ccdv/lsg-bart-base-4096-arxiv on the scientific_papers arxiv dataset. It was converted to handle sequences of up to 16,384 tokens and then fine-tuned for 1 epoch. It achieves the following results on the test set:

Length	Global tokens	Fine-tuning	Block size	Connexions	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum
16384	64	Full	256	768	48.74	20.88	28.50	44.23
16384	1	Full	256	768	48.66	20.92	28.50	44.18
16384	64	Global only	256	768	48.08	20.42	28.00	43.65
16384	1	None	256	768	47.03	20.19	28.26	42.69

Reference model:

Length	Global tokens	Fine-tuning	Block size	Sparsity	Connexions	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum
4096	1	-	256	0	768	46.65	18.91	26.90	42.18
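
To make the comparison with the reference row concrete, the sketch below subtracts the reference scores from each 16,384-token configuration. It uses the four ROUGE columns in the order printed in the tables above. The row labels are our own shorthand for the Global tokens and Fine-tuning columns.

```python
# Scores copied from the results tables, in column order:
# ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-Lsum.
METRICS = ("ROUGE-1", "ROUGE-2", "ROUGE-L", "ROUGE-Lsum")
RESULTS = {
    "16384, 64 global, full": (48.74, 20.88, 28.50, 44.23),
    "16384, 1 global, full": (48.66, 20.92, 28.50, 44.18),
    "16384, 64 global, global only": (48.08, 20.42, 28.00, 43.65),
    "16384, 1 global, none": (47.03, 20.19, 28.26, 42.69),
}
REFERENCE = (46.65, 18.91, 26.90, 42.18)  # 4096-token reference row


def deltas(scores, reference=REFERENCE):
    """Absolute point difference from the reference row, rounded to 2 decimals."""
    return tuple(round(s - r, 2) for s, r in zip(scores, reference))


for name, scores in RESULTS.items():
    diffs = ", ".join(f"{m} {d:+.2f}" for m, d in zip(METRICS, deltas(scores)))
    print(f"{name}: {diffs}")
```

The best configuration (64 global tokens, full fine-tuning) gains +2.09 ROUGE-1, +1.97 ROUGE-2, +1.60 ROUGE-L and +2.05 ROUGE-Lsum. The configuration with no additional fine-tuning (the "None" row) is still ahead on all four metrics, by +0.38 to +1.36 points.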

Model description

The model relies on Local-Sparse-Global (LSG) attention to handle long sequences.
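
The sketch below builds a small boolean attention mask that combines the three kinds of connections the name refers to. It is a simplified illustration, not the ccdv implementation, and makes two substitutions:

- The sparse part is modeled as every `sparse_stride`-th key in the blocks just outside the local window.
- Global tokens are approximated by the first `num_global` positions of the sequence, rather than any dedicated tokens the real model may use.

The function and parameter names are hypothetical.

```python
def lsg_style_mask(seq_len: int, block_size: int, num_global: int, sparse_stride: int):
    """Return mask where mask[q][k] is True if query q may attend to key k.

    Simplified, hypothetical LSG-style pattern (not the ccdv implementation):
      - local:  keys in the query's own block and the two adjacent blocks
      - sparse: every `sparse_stride`-th key in the blocks two away from the query's block
      - global: the first `num_global` positions attend to, and are attended by, all positions
    """
    mask = [[False] * seq_len for _ in range(seq_len)]
    for q in range(seq_len):
        q_block = q // block_size
        for k in range(seq_len):
            block_dist = abs(k // block_size - q_block)
            local = block_dist <= 1
            sparse = block_dist == 2 and k % sparse_stride == 0
            is_global = q < num_global or k < num_global
            mask[q][k] = local or sparse or is_global
    return mask


# 16 tokens, blocks of 4, one global position, every 2nd key in the sparse blocks
mask = lsg_style_mask(seq_len=16, block_size=4, num_global=1, sparse_stride=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Printing the 16-token example shows three features: the banded local structure, the strided keys two blocks away, and the fully connected first row and column contributed by the global position. An ordinary query's key count is bounded by the local window, the sparse keys and the global positions, independently of sequence length. This is why total cost grows roughly linearly rather than quadratically.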

The model has about 145 million parameters (6 encoder layers, 6 decoder layers). It is warm-started from ccdv/lsg-bart-base-4096-arxiv, converted to handle long sequences (encoder only), and fine-tuned.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-05
train_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1.0
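
The sketch below reproduces, in plain Python, the arithmetic behind two entries in the list above.

- The effective batch size is 8 × 4 = 32. This assumes a single device, which is consistent with total_train_batch_size.
- The linear schedule warms up over the first 10% of steps.

As we understand it, the schedule mirrors the shape of the Transformers linear warmup/decay scheduler: warmup steps are rounded up, the rate rises linearly to the peak, then decays linearly to zero. TOTAL_STEPS is a hypothetical value, since the card does not report the number of optimizer steps.

```python
import math

# Hyperparameters from the training list.
PEAK_LR = 8e-05
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR,
                     warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


TOTAL_STEPS = 1000  # hypothetical; the real count depends on dataset size
print("effective batch size:", effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, TOTAL_STEPS))
```

With 1,000 hypothetical steps, warmup lasts 100 steps. The rate peaks at 8e-05 at step 100, is back to half the peak at step 550, and reaches zero at step 1,000.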

Generation hyperparameters

The following hyperparameters were used during generation:

dataset_name: scientific_papers
dataset_config_name: arxiv
eval_batch_size: 4
eval_samples: 6440
early_stopping: True
ignore_pad_token_for_loss: True
length_penalty: 2.0
max_length: 320
min_length: 32
num_beams: 5
no_repeat_ngram_size: None
seed: 123
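
The sketch below shows how the generation settings listed above might be passed to `generate()` when summarizing a single document. Several parts are assumptions:

- The mapping onto `generate()` keyword arguments is ours; the card lists these as evaluation settings, not code.
- `no_repeat_ngram_size: None` is simply omitted.
- eval_batch_size, eval_samples, seed and the dataset fields are evaluation-run settings with no counterpart in a single call, so they are left out.
- `summarize` is a hypothetical helper.
- The 16,384-token truncation limit is taken from the model's stated input length.

```python
# Generation settings from the list above, expressed as generate() keyword arguments.
GENERATION_KWARGS = {
    "num_beams": 5,
    "length_penalty": 2.0,
    "max_length": 320,
    "min_length": 32,
    "early_stopping": True,
}

MODEL_NAME = "ccdv/lsg-bart-base-16384-arxiv"
MAX_INPUT_TOKENS = 16384


def summarize(text: str) -> str:
    """Hypothetical helper: summarize one document with the settings above."""
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME, trust_remote_code=True)
    inputs = tokenizer(text, truncation=True, max_length=MAX_INPUT_TOKENS, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize` downloads the model weights on first use. With 5 beams and max_length=320 it is noticeably slower than the quick-start call, which uses 2 beams and max_length=64.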

Framework versions

Transformers 4.18.0
Pytorch 1.10.1+cu102
Datasets 2.1.0
Tokenizers 0.11.6

Property	Details
Model Type	Fine-tuned version of ccdv/lsg-bart-base-4096-arxiv
Training Data	scientific_papers arxiv
