🚀 ccdv/lsg-bart-base-16384-mediasum
This model is a fine-tuned version of ccdv/lsg-bart-base-4096-mediasum on the ccdv/mediasum roberta_prepended dataset, designed for text summarization.
🚀 Quick Start
Prerequisites
Transformers >= 4.36.1
This model relies on a custom modeling file, so you need to add trust_remote_code=True when loading it.
See #13467
Code Example
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# trust_remote_code=True is required because the model uses a custom LSG modeling file.
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-16384-mediasum", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-16384-mediasum", trust_remote_code=True)

text = "Replace by what you want."

# device=0 runs on the first GPU; use device=-1 to run on CPU.
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0)
generated_text = pipe(
    text,
    truncation=True,
    max_length=64,
    no_repeat_ngram_size=7,
    num_beams=2,
    early_stopping=True,
)
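The text2text-generation pipeline returns a list of dictionaries, so the summary string itself can be read from the generated_text key:

print(generated_text[0]["generated_text"])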
✨ Features
- Long Sequence Handling: the model is converted to handle sequences of up to 16384 tokens, leveraging a Local-Sparse-Global (LSG) attention mechanism (a load-time configuration sketch follows this list).
- Fine-Tuned: fine-tuned on the ccdv/mediasum roberta_prepended dataset for better summarization performance.
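If you need to adjust the attention pattern, LSG checkpoints generally accept configuration overrides at load time. Below is a minimal sketch; the keyword names (num_global_tokens, block_size, sparse_block_size) come from the LSG conversion-script conventions, not from this card, so verify them against the custom modeling file before relying on them.

from transformers import AutoModelForSeq2SeqLM

# Hedged sketch: these keyword arguments follow LSG conventions and are
# assumptions to verify against the model's custom configuration class.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "ccdv/lsg-bart-base-16384-mediasum",
    trust_remote_code=True,
    num_global_tokens=64,   # number of global tokens (64 in the results table)
    block_size=256,         # local attention block size (256 in the results table)
    sparse_block_size=0,    # sparsity 0 in the table, i.e. sparse attention disabled
)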
📚 Documentation
Model Performance
The model achieves the following results on the test set:
| Length | Global tokens | Fine-tuning | Block Size | Sparsity | Connexions | R1 | R2 | RL | RLsum |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 16384 | 64 | Full | 256 | 0 | 768 | 35.31 | 18.35 | 31.81 | 32.47 |
| 16384 | 1 | Full | 256 | 0 | 768 | 35.21 | 18.20 | 31.73 | 32.37 |
| 16384 | 64 | Global only | 256 | 0 | 768 | 35.22 | 18.08 | 31.54 | 32.21 |
| 16384 | 1 | None | 256 | 0 | 768 | 35.17 | 18.13 | 31.54 | 32.20 |
Reference Model
| Length | Global tokens | Fine-tuning | Block Size | Sparsity | Connexions | R1 | R2 | RL | RLsum |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 4096 | 1 | - | 256 | 0 | 768 | 35.16 | 18.13 | 31.54 | 32.20 |
Model Description
The model relies on Local-Sparse-Global attention to handle long sequences:

The model has about 145 million parameters (6 encoder layers, 6 decoder layers). It is warm-started from ccdv/lsg-bart-base-4096-mediasum, converted to handle long sequences (encoder only) and fine-tuned.
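For a quick sanity check of the reported size and sequence length, you can inspect the loaded checkpoint directly. This is a minimal sketch, assuming the LSG config exposes the standard BART field names:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-16384-mediasum", trust_remote_code=True)

# Total parameter count, expected to be on the order of 145M.
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f}M")

# Maximum encoder input length; attribute name taken from the standard BART config,
# the LSG config may expose it differently.
print("max_position_embeddings:", model.config.max_position_embeddings)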
Intended Uses & Limitations
More information needed
Training and Evaluation Data
More information needed
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training; a hedged sketch mapping them onto Seq2SeqTrainingArguments follows the list:
- learning_rate: 8e-05
- train_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
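As a reference only, here is a minimal sketch of how these values map onto Seq2SeqTrainingArguments; output_dir and anything not listed above are assumptions, not the authors' actual training setup.

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="lsg-bart-base-16384-mediasum",  # hypothetical output directory
    learning_rate=8e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # 8 * 4 = 32 total train batch size
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1.0,
    adam_beta1=0.9,                 # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)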
Generate Hyperparameters
The following hyperparameters were used during generation; a hedged model.generate sketch using them follows the list:
- dataset_name: ccdv/mediasum
- dataset_config_name: roberta_prepended
- eval_batch_size: 8
- eval_samples: 10000
- early_stopping: True
- ignore_pad_token_for_loss: True
- length_penalty: 2.0
- max_length: 128
- min_length: 3
- num_beams: 5
- no_repeat_ngram_size: None
- seed: 123
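A minimal sketch of a generation call using these values; model, tokenizer, and text are assumed to be defined as in the Quick Start example, and no_repeat_ngram_size is left at its default since the card reports None.

inputs = tokenizer(text, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    **inputs,
    max_length=128,
    min_length=3,
    num_beams=5,
    length_penalty=2.0,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))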
Framework Versions
- Transformers 4.18.0
- Pytorch 1.10.1+cu102
- Datasets 2.1.0
- Tokenizers 0.11.6
Related Papers and Links
LSG ArXiv paper.
The GitHub conversion script is available at this link.