🚀 ccdv/lsg-bart-base-16384-mediasum
This model is a fine-tuned version of ccdv/lsg-bart-base-4096-mediasum on the ccdv/mediasum roberta_prepended dataset, designed for text summarization.
🚀 Quick Start
Prerequisites
Transformers >= 4.36.1
This model relies on a custom modeling file, so you need to add trust_remote_code=True when loading it.
See #13467
Code Example
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# trust_remote_code=True is required because the model uses a custom LSG modeling file.
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-16384-mediasum", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-16384-mediasum", trust_remote_code=True)

text = "Replace by what you want."

# device=0 runs on the first GPU; use device=-1 to run on CPU.
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0)
generated_text = pipe(
    text,
    truncation=True,
    max_length=64,
    no_repeat_ngram_size=7,
    num_beams=2,
    early_stopping=True,
)
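The text2text-generation pipeline returns a list of dictionaries, so the summary string itself can be read from the generated_text key:

print(generated_text[0]["generated_text"])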
✨ Features
- Long Sequence Handling: the model is converted to handle sequences of up to 16384 tokens, leveraging a Local-Sparse-Global (LSG) attention mechanism (a load-time configuration sketch follows this list).
- Fine-Tuned: fine-tuned on the ccdv/mediasum roberta_prepended dataset for better summarization performance.
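If you need to adjust the attention pattern, LSG checkpoints generally accept configuration overrides at load time. Below is a minimal sketch; the keyword names (num_global_tokens, block_size, sparse_block_size) come from the LSG conversion-script conventions, not from this card, so verify them against the custom modeling file before relying on them.

from transformers import AutoModelForSeq2SeqLM

# Hedged sketch: these keyword arguments follow LSG conventions and are
# assumptions to verify against the model's custom configuration class.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "ccdv/lsg-bart-base-16384-mediasum",
    trust_remote_code=True,
    num_global_tokens=64,   # number of global tokens (64 in the results table)
    block_size=256,         # local attention block size (256 in the results table)
    sparse_block_size=0,    # sparsity 0 in the table, i.e. sparse attention disabled
)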
📚 Documentation
Model Performance
The model achieves the following results on the test set:
| Length | Global tokens | Fine-tuning | Block Size | Sparsity | Connexions | R1 | R2 | RL | RLsum |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 16384 | 64 | Full | 256 | 0 | 768 | 35.31 | 18.35 | 31.81 | 32.47 |
| 16384 | 1 | Full | 256 | 0 | 768 | 35.21 | 18.20 | 31.73 | 32.37 |
| 16384 | 64 | Global only | 256 | 0 | 768 | 35.22 | 18.08 | 31.54 | 32.21 |
| 16384 | 1 | None | 256 | 0 | 768 | 35.17 | 18.13 | 31.54 | 32.20 |
Reference Model
| Length | Global tokens | Fine-tuning | Block Size | Sparsity | Connexions | R1 | R2 | RL | RLsum |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 4096 | 1 | - | 256 | 0 | 768 | 35.16 | 18.13 | 31.54 | 32.20 |
Model Description
The model relies on Local-Sparse-Global attention to handle long sequences:

The model has about 145 million parameters (6 encoder layers, 6 decoder layers). It is warm-started from ccdv/lsg-bart-base-4096-mediasum, converted to handle long sequences (encoder only) and fine-tuned.
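For a quick sanity check of the reported size and sequence length, you can inspect the loaded checkpoint directly. This is a minimal sketch, assuming the LSG config exposes the standard BART field names:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-16384-mediasum", trust_remote_code=True)

# Total parameter count, expected to be on the order of 145M.
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f}M")

# Maximum encoder input length; attribute name taken from the standard BART config,
# the LSG config may expose it differently.
print("max_position_embeddings:", model.config.max_position_embeddings)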
Intended Uses & Limitations
More information needed
Training and Evaluation Data
More information needed
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training; a hedged sketch mapping them onto Seq2SeqTrainingArguments follows the list:
- learning_rate: 8e-05
- train_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
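As a reference only, here is a minimal sketch of how these values map onto Seq2SeqTrainingArguments; output_dir and anything not listed above are assumptions, not the authors' actual training setup.

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="lsg-bart-base-16384-mediasum",  # hypothetical output directory
    learning_rate=8e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # 8 * 4 = 32 total train batch size
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1.0,
    adam_beta1=0.9,                 # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)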
Generate Hyperparameters
The following hyperparameters were used during generation; a hedged model.generate sketch using them follows the list:
- dataset_name: ccdv/mediasum
- dataset_config_name: roberta_prepended
- eval_batch_size: 8
- eval_samples: 10000
- early_stopping: True
- ignore_pad_token_for_loss: True
- length_penalty: 2.0
- max_length: 128
- min_length: 3
- num_beams: 5
- no_repeat_ngram_size: None
- seed: 123
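A minimal sketch of a generation call using these values; model, tokenizer, and text are assumed to be defined as in the Quick Start example, and no_repeat_ngram_size is left at its default since the card reports None.

inputs = tokenizer(text, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    **inputs,
    max_length=128,
    min_length=3,
    num_beams=5,
    length_penalty=2.0,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))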
Framework Versions
- Transformers 4.18.0
- Pytorch 1.10.1+cu102
- Datasets 2.1.0
- Tokenizers 0.11.6
Related Papers and Links
LSG ArXiv paper.
The GitHub conversion script is available at this link.