🚀 LSG model
This is an LSG model adapted from BART-large for encoder-decoder tasks, without additional pretraining. It can handle long sequences more efficiently than full-attention models such as Longformer (LED) or BigBird (Pegasus), relying on Local + Sparse + Global (LSG) attention.
Prerequisites
- Transformers >= 4.36.1
- This model relies on a custom modeling file, so you need to add trust_remote_code=True
- See #13467
Related Links
- LSG arXiv paper.
- GitHub/conversion script is available at this link.
Table of Contents
- Model Features
- Quick Start
- Tasks
- Citation
Model Features
This model uses the same number of parameters/layers and the same tokenizer as BART-large. It can handle long sequences faster and more efficiently than Longformer (LED) or BigBird (Pegasus) from the hub. The model requires sequences whose length is a multiple of the block size. It is "adaptive" and can automatically pad the sequences if needed (adaptive=True in config). However, it is recommended to truncate the inputs (truncation=True) and optionally to pad to a multiple of the block size (pad_to_multiple_of=...). The model is implemented in PyTorch.
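As a minimal sketch of preparing inputs this way (assuming the default block_size=128 listed in the Parameters section below; the input text is arbitrary):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-large-4096")

# Truncate to the model maximum length and pad up to a multiple of the block size
inputs = tokenizer(
    "A long document. " * 500,
    return_tensors="pt",
    truncation=True,
    padding=True,
    pad_to_multiple_of=128,  # assumes the default block_size=128
)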

🚀 Quick Start
Usage
The model relies on a custom modeling file, so you need to add trust_remote_code=True to use it.
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("ccdv/lsg-bart-large-4096", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-large-4096")
Parameters
You can change various parameters like:
- the number of global tokens (num_global_tokens=1)
- local block size (block_size=128)
- sparse block size (sparse_block_size=128)
- sparsity factor (sparsity_factor=2)
- mask_first_token (mask first token since it is redundant with the first global token)
- see config.json file
Default parameters work well in practice. If you are short on memory, reduce block sizes, increase sparsity factor and remove dropout in the attention score matrix.
from transformers import AutoModel
model = AutoModel.from_pretrained(
    "ccdv/lsg-bart-large-4096",
    trust_remote_code=True,
    num_global_tokens=16,
    block_size=64,
    sparse_block_size=64,
    attention_probs_dropout_prob=0.0,
    sparsity_factor=4,
    sparsity_type="none",
    mask_first_token=True,
)
Sparse selection type
There are 6 different sparse selection patterns. The best type is task-dependent. If sparse_block_size=0 or sparsity_type="none", only local attention is considered. Note that for sequences with length < 2*block_size, the type has no effect. A short loading sketch follows the list of types below.
sparsity_type="bos_pooling"
(new)
- weighted average pooling using the BOS token
- Works best in general, especially with a rather large sparsity_factor (8, 16, 32)
- Additional parameters: None
sparsity_type="norm"
, select highest norm tokens
- Works best for a small sparsity_factor (2 to 4)
- Additional parameters: None
sparsity_type="pooling"
, use average pooling to merge tokens
- Works best for a small sparsity_factor (2 to 4)
- Additional parameters: None
sparsity_type="lsh"
, use the LSH algorithm to cluster similar tokens
- Works best for a large sparsity_factor (4+)
- LSH relies on random projections, thus inference may differ slightly with different seeds
- Additional parameters: lsg_num_pre_rounds=1, pre merge tokens n times before computing centroids
sparsity_type="stride"
, use a striding mecanism per head
- Each head will use different tokens strided by sparsify_factor
- Not recommended if sparsify_factor > num_heads
sparsity_type="block_stride"
, use a striding mecanism per head
- Each head will use block of tokens strided by sparsify_factor
- Not recommended if sparsify_factor > num_heads
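As a hedged example of picking a pattern at load time (using only parameters documented above; the values are illustrative, not tuned recommendations):
from transformers import AutoModel

# Illustrative only: choose the "bos_pooling" pattern with a larger sparsity factor,
# which the list above suggests works well for this type
model = AutoModel.from_pretrained(
    "ccdv/lsg-bart-large-4096",
    trust_remote_code=True,
    sparsity_type="bos_pooling",
    sparsity_factor=8,
)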
Tasks
Seq2Seq example for summarization
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(
    "ccdv/lsg-bart-large-4096",
    trust_remote_code=True,
    pass_global_tokens_to_decoder=True,
)
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-large-4096")
SENTENCE = "This is a test sequence to test the model. " * 300
token_ids = tokenizer(
    SENTENCE,
    return_tensors="pt",
    truncation=True,
)
output = model(**token_ids)
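To actually generate a summary from these inputs, a minimal sketch using generate() (the generation settings are illustrative assumptions, and this base checkpoint is not fine-tuned on a summarization dataset):
# Illustrative generation settings; tune max_length / num_beams for your task
generated_ids = model.generate(
    **token_ids,
    max_length=256,
    num_beams=4,
    early_stopping=True,
)
summary = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(summary)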
Classification example
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    "ccdv/lsg-bart-large-4096",
    trust_remote_code=True,
    pass_global_tokens_to_decoder=True,
)
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-large-4096")
SENTENCE = "This is a test sequence to test the model. " * 300
token_ids = tokenizer(
    SENTENCE,
    return_tensors="pt",
    padding="max_length",
    truncation=True,
)
output = model(**token_ids)
> SequenceClassifierOutput(loss=None, logits=tensor([[-0.3051, -0.1762]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)
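To turn the logits into a class prediction, a small sketch (the classification head here is presumably not fine-tuned, so the predicted class is only illustrative):
import torch

# Softmax over the logits and take the highest-scoring class index
probs = torch.softmax(output.logits, dim=-1)
predicted_class = probs.argmax(dim=-1).item()
print(predicted_class, probs)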
Citation
BART
@article{DBLP:journals/corr/abs-1910-13461,
author = {Mike Lewis and
Yinhan Liu and
Naman Goyal and
Marjan Ghazvininejad and
Abdelrahman Mohamed and
Omer Levy and
Veselin Stoyanov and
Luke Zettlemoyer},
title = {{BART:} Denoising Sequence-to-Sequence Pre-training for Natural Language
Generation, Translation, and Comprehension},
journal = {CoRR},
volume = {abs/1910.13461},
year = {2019},
url = {http://arxiv.org/abs/1910.13461},
eprinttype = {arXiv},
eprint = {1910.13461},
timestamp = {Thu, 31 Oct 2019 14:02:26 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-1910-13461.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}