LSG Model
This LSG model is a small-scale version of the LEGAL-BERT model. It can handle long sequences more efficiently than Longformer or BigBird, relying on Local + Sparse + Global attention (LSG).
Key Information
- Transformers Version: >= 4.36.1
- Custom Modeling File: This model relies on a custom modeling file; you need to add `trust_remote_code=True`. See #13467.
- ArXiv Paper: LSG ArXiv paper
- Github/conversion Script: Available at this link
Model Introduction
This model, which has not been further pretrained yet, uses the same number of parameters/layers and the same tokenizer as the LEGAL-BERT model. It can handle long sequences and is faster and more efficient than Longformer or BigBird from Transformers. The model requires sequences whose length is a multiple of the block size. It is "adaptive" and can automatically pad the sequences if needed (`adaptive=True` in the config). However, it is recommended to truncate the inputs (`truncation=True`) and optionally pad to a multiple of the block size (`pad_to_multiple_of=...`). It supports the encoder-decoder setting, though this has not been extensively tested, and is implemented in PyTorch.
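As an illustration, here is a minimal tokenization sketch. The 4096-token maximum length and the 128-token block size are assumptions based on the checkpoint name and the default parameters listed below, not values confirmed for this exact config:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ccdv/legal-lsg-small-uncased-4096")

# Truncate long inputs and pad up to a multiple of the (assumed) block size of 128,
# so the model does not have to pad adaptively at run time.
inputs = tokenizer(
    "This agreement shall be governed by the laws of France. " * 200,
    return_tensors="pt",
    truncation=True,
    max_length=4096,          # assumed maximum sequence length for this checkpoint
    padding=True,             # padding must be enabled for pad_to_multiple_of to apply
    pad_to_multiple_of=128,   # assumed default block size
)
print(inputs["input_ids"].shape)
```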

Quick Start
The model relies on a custom modeling file, so you need to add `trust_remote_code=True` to use it.
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ccdv/legal-lsg-small-uncased-4096", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ccdv/legal-lsg-small-uncased-4096")
```
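Continuing from the snippet above, a short sanity-check forward pass; the example sentence is illustrative, and the output attribute assumes the usual BERT-style `last_hidden_state`:
```python
# Encode a short example and run it through the model.
inputs = tokenizer("This agreement is governed by French law.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```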
Features
- Can handle long sequences more efficiently than Longformer or BigBird.
- Uses Local + Sparse + Global attention (LSG).
- "Adaptive" sequence padding.
Installation
No specific installation steps are needed beyond the requirements mentioned above (Transformers >= 4.36.1 and `trust_remote_code=True`).
Usage Examples
Basic Usage
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ccdv/legal-lsg-small-uncased-4096", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ccdv/legal-lsg-small-uncased-4096")
```
Advanced Usage
```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "ccdv/legal-lsg-small-uncased-4096",
    trust_remote_code=True,
    num_global_tokens=16,               # number of global tokens (default: 1)
    block_size=64,                      # local block size (default: 128)
    sparse_block_size=64,               # sparse block size (default: 128)
    attention_probs_dropout_prob=0.0,   # remove dropout on the attention scores
    sparsity_factor=4,                  # sparsity factor (default: 2)
    sparsity_type="none",               # "none" disables sparse attention (local attention only)
    mask_first_token=True,              # mask the first token (redundant with the first global token)
)
```
Documentation
Parameters
You can change various parameters like:
- `num_global_tokens`: Number of global tokens (default: 1).
- `block_size`: Local block size (default: 128).
- `sparse_block_size`: Sparse block size (default: 128).
- `sparsity_factor`: Sparsity factor (default: 2).
- `mask_first_token`: Mask the first token since it is redundant with the first global token.
- See the `config.json` file for more details.
Default parameters work well in practice. If you are short on memory, reduce the block sizes, increase the sparsity factor, and remove dropout in the attention score matrix, as sketched below.
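A minimal sketch of such a lower-memory configuration, with illustrative values only (not recommendations from the model authors):
```python
from transformers import AutoModel

# Smaller local/sparse blocks, a higher sparsity factor, and no attention dropout
# to reduce the memory footprint. Exact values are illustrative.
model = AutoModel.from_pretrained(
    "ccdv/legal-lsg-small-uncased-4096",
    trust_remote_code=True,
    block_size=32,
    sparse_block_size=32,
    sparsity_factor=4,
    attention_probs_dropout_prob=0.0,
)
```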
Sparse Selection Type
There are 6 different sparse selection patterns. The best type is task-dependent. If `sparse_block_size=0` or `sparsity_type="none"`, only local attention is considered. Note that for sequences with length < 2 * block_size, the type has no effect.
sparsity_type="bos_pooling"
(new):
- Weighted average pooling using the BOS token.
- Works best in general, especially with a rather large sparsity_factor (8, 16, 32).
- Additional parameters: None.
sparsity_type="norm"
: Select highest norm tokens.
- Works best for a small sparsity_factor (2 to 4).
- Additional parameters: None.
sparsity_type="pooling"
: Use average pooling to merge tokens.
- Works best for a small sparsity_factor (2 to 4).
- Additional parameters: None.
sparsity_type="lsh"
: Use the LSH algorithm to cluster similar tokens.
- Works best for a large sparsity_factor (4+).
- LSH relies on random projections, thus inference may differ slightly with different seeds.
- Additional parameters:
lsg_num_pre_rounds = 1
, pre - merge tokens n times before computing centroids.
sparsity_type="stride"
: Use a striding mechanism per head.
- Each head will use different tokens strided by sparsify_factor.
- Not recommended if sparsify_factor > num_heads.
sparsity_type="block_stride"
: Use a striding mechanism per head.
- Each head will use block of tokens strided by sparsify_factor.
- Not recommended if sparsify_factor > num_heads.
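For instance, a minimal sketch of picking one of these patterns at load time; the choice of `"bos_pooling"` with `sparsity_factor=8` is illustrative, following the guidance above:
```python
from transformers import AutoModel

# "bos_pooling" is reported above to pair well with a rather large sparsity factor.
model = AutoModel.from_pretrained(
    "ccdv/legal-lsg-small-uncased-4096",
    trust_remote_code=True,
    sparsity_type="bos_pooling",
    sparsity_factor=8,
)
```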
Tasks
Fill Mask Example
```python
from transformers import FillMaskPipeline, AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("ccdv/legal-lsg-small-uncased-4096", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ccdv/legal-lsg-small-uncased-4096")

SENTENCES = ["Paris is the <mask> of France.", "The goal of life is <mask>."]
# Swap the generic <mask> placeholder for the tokenizer's actual mask token
# (e.g. [MASK] for an uncased BERT-style tokenizer).
SENTENCES = [s.replace("<mask>", tokenizer.mask_token) for s in SENTENCES]

pipeline = FillMaskPipeline(model, tokenizer)
output = pipeline(SENTENCES, top_k=1)
output = [o[0]["sequence"] for o in output]
```
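Continuing from the snippet above, a quick way to inspect the top completion for each input:
```python
for sentence, completion in zip(SENTENCES, output):
    print(f"{sentence} -> {completion}")
```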
Classification Example
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "ccdv/legal-lsg-small-uncased-4096",
    trust_remote_code=True,
    pool_with_global=True,  # pool with a global token instead of the first token
)
tokenizer = AutoTokenizer.from_pretrained("ccdv/legal-lsg-small-uncased-4096")

SENTENCE = "This is a test for sequence classification. " * 300
token_ids = tokenizer(
    SENTENCE,
    return_tensors="pt",
    truncation=True,
)
output = model(**token_ids)
```
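As a follow-up sketch, the logits can be turned into a predicted class id; note that the classification head of this checkpoint is freshly initialized until fine-tuned, so the prediction itself is only illustrative:
```python
import torch

# Pick the highest-scoring class for each input in the batch.
predicted_class_id = torch.argmax(output.logits, dim=-1)
print(predicted_class_id)
```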
Training Global Tokens
To train global tokens and the classification head only:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "ccdv/legal-lsg-small-uncased-4096",
    trust_remote_code=True,
    pool_with_global=True,
    num_global_tokens=16,
)
tokenizer = AutoTokenizer.from_pretrained("ccdv/legal-lsg-small-uncased-4096")

# Freeze everything except the global token embeddings.
# (To also train the classification head, additionally keep parameters whose name
# contains "classifier" trainable; that name is the usual BERT-style convention
# and should be checked against this custom modeling file.)
for name, param in model.named_parameters():
    if "global_embeddings" not in name:
        param.requires_grad = False
    else:
        param.requires_grad = True
```
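A quick sanity-check sketch of which parameters remain trainable (the exact parameter names depend on the custom modeling file):
```python
trainable = [name for name, param in model.named_parameters() if param.requires_grad]
print(trainable)  # expected to list only the global token embedding parameters
```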
License
LEGAL-BERT Citation
```bibtex
@inproceedings{chalkidis-etal-2020-legal,
    title = "{LEGAL}-{BERT}: The Muppets straight out of Law School",
    author = "Chalkidis, Ilias  and
      Fergadiotis, Manos  and
      Malakasiotis, Prodromos  and
      Aletras, Nikolaos  and
      Androutsopoulos, Ion",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    doi = "10.18653/v1/2020.findings-emnlp.261",
    pages = "2898--2904"
}
```