LSG Model
This LSG model is a small-scale version of the LEGAL-BERT model. It can handle long sequences more efficiently than Longformer or BigBird, relying on Local + Sparse + Global attention (LSG).
Key Information
- Transformers Version: >= 4.36.1
- Custom Modeling File: This model relies on a custom modeling file; you need to add `trust_remote_code=True`. See #13467.
- ArXiv Paper: LSG ArXiv paper
- Github/conversion Script: Available at this link
Model Introduction
This model, which has not been further pretrained yet, uses the same number of parameters/layers and the same tokenizer as the LEGAL-BERT model. It can handle long sequences and is faster and more efficient than Longformer or BigBird from Transformers. The model requires sequences whose length is a multiple of the block size. It is "adaptive" and can automatically pad the sequences if needed (`adaptive=True` in the config). However, it is recommended to truncate the inputs (`truncation=True`) and optionally pad to a multiple of the block size (`pad_to_multiple_of=...`). It supports the encoder-decoder setting, though this has not been extensively tested, and is implemented in PyTorch.
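As an illustration, here is a minimal tokenization sketch. The 4096-token maximum length and the 128-token block size are assumptions based on the checkpoint name and the default parameters listed below, not values confirmed for this exact config:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ccdv/legal-lsg-small-uncased-4096")

# Truncate long inputs and pad up to a multiple of the (assumed) block size of 128,
# so the model does not have to pad adaptively at run time.
inputs = tokenizer(
    "This agreement shall be governed by the laws of France. " * 200,
    return_tensors="pt",
    truncation=True,
    max_length=4096,          # assumed maximum sequence length for this checkpoint
    padding=True,             # padding must be enabled for pad_to_multiple_of to apply
    pad_to_multiple_of=128,   # assumed default block size
)
print(inputs["input_ids"].shape)
```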

Quick Start
The model relies on a custom modeling file, so you need to add `trust_remote_code=True` to use it.
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ccdv/legal-lsg-small-uncased-4096", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ccdv/legal-lsg-small-uncased-4096")
```
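Continuing from the snippet above, a short sanity-check forward pass; the example sentence is illustrative, and the output attribute assumes the usual BERT-style `last_hidden_state`:
```python
# Encode a short example and run it through the model.
inputs = tokenizer("This agreement is governed by French law.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```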
Features
- Can handle long sequences more efficiently than Longformer or BigBird.
- Uses Local + Sparse + Global attention (LSG).
- "Adaptive" sequence padding.
Installation
No specific installation steps are needed beyond the requirements mentioned above (Transformers >= 4.36.1 and `trust_remote_code=True`).
Usage Examples
Basic Usage
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ccdv/legal-lsg-small-uncased-4096", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ccdv/legal-lsg-small-uncased-4096")
```
Advanced Usage
```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "ccdv/legal-lsg-small-uncased-4096",
    trust_remote_code=True,
    num_global_tokens=16,               # number of global tokens (default: 1)
    block_size=64,                      # local block size (default: 128)
    sparse_block_size=64,               # sparse block size (default: 128)
    attention_probs_dropout_prob=0.0,   # remove dropout on the attention scores
    sparsity_factor=4,                  # sparsity factor (default: 2)
    sparsity_type="none",               # "none" disables sparse attention (local attention only)
    mask_first_token=True,              # mask the first token (redundant with the first global token)
)
```
Documentation
Parameters
You can change various parameters like:
- `num_global_tokens`: Number of global tokens (default: 1).
- `block_size`: Local block size (default: 128).
- `sparse_block_size`: Sparse block size (default: 128).
- `sparsity_factor`: Sparsity factor (default: 2).
- `mask_first_token`: Mask the first token since it is redundant with the first global token.
- See the `config.json` file for more details.
Default parameters work well in practice. If you are short on memory, reduce the block sizes, increase the sparsity factor, and remove dropout in the attention score matrix, as sketched below.
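A minimal sketch of such a lower-memory configuration, with illustrative values only (not recommendations from the model authors):
```python
from transformers import AutoModel

# Smaller local/sparse blocks, a higher sparsity factor, and no attention dropout
# to reduce the memory footprint. Exact values are illustrative.
model = AutoModel.from_pretrained(
    "ccdv/legal-lsg-small-uncased-4096",
    trust_remote_code=True,
    block_size=32,
    sparse_block_size=32,
    sparsity_factor=4,
    attention_probs_dropout_prob=0.0,
)
```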
Sparse Selection Type
There are 6 different sparse selection patterns. The best type is task-dependent. If `sparse_block_size=0` or `sparsity_type="none"`, only local attention is considered. Note that for sequences with length < 2 * block_size, the type has no effect.
sparsity_type="bos_pooling"
(new):
- Weighted average pooling using the BOS token.
- Works best in general, especially with a rather large sparsity_factor (8, 16, 32).
- Additional parameters: None.
sparsity_type="norm"
: Select highest norm tokens.
- Works best for a small sparsity_factor (2 to 4).
- Additional parameters: None.
sparsity_type="pooling"
: Use average pooling to merge tokens.
- Works best for a small sparsity_factor (2 to 4).
- Additional parameters: None.
sparsity_type="lsh"
: Use the LSH algorithm to cluster similar tokens.
- Works best for a large sparsity_factor (4+).
- LSH relies on random projections, thus inference may differ slightly with different seeds.
- Additional parameters:
lsg_num_pre_rounds = 1
, pre - merge tokens n times before computing centroids.
sparsity_type="stride"
: Use a striding mechanism per head.
- Each head will use different tokens strided by sparsify_factor.
- Not recommended if sparsify_factor > num_heads.
sparsity_type="block_stride"
: Use a striding mechanism per head.
- Each head will use block of tokens strided by sparsify_factor.
- Not recommended if sparsify_factor > num_heads.
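For instance, a minimal sketch of picking one of these patterns at load time; the choice of `"bos_pooling"` with `sparsity_factor=8` is illustrative, following the guidance above:
```python
from transformers import AutoModel

# "bos_pooling" is reported above to pair well with a rather large sparsity factor.
model = AutoModel.from_pretrained(
    "ccdv/legal-lsg-small-uncased-4096",
    trust_remote_code=True,
    sparsity_type="bos_pooling",
    sparsity_factor=8,
)
```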
Tasks
Fill Mask Example
```python
from transformers import FillMaskPipeline, AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("ccdv/legal-lsg-small-uncased-4096", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ccdv/legal-lsg-small-uncased-4096")

SENTENCES = ["Paris is the <mask> of France.", "The goal of life is <mask>."]
# Swap the generic <mask> placeholder for the tokenizer's actual mask token
# (e.g. [MASK] for an uncased BERT-style tokenizer).
SENTENCES = [s.replace("<mask>", tokenizer.mask_token) for s in SENTENCES]

pipeline = FillMaskPipeline(model, tokenizer)
output = pipeline(SENTENCES, top_k=1)
output = [o[0]["sequence"] for o in output]
```
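Continuing from the snippet above, a quick way to inspect the top completion for each input:
```python
for sentence, completion in zip(SENTENCES, output):
    print(f"{sentence} -> {completion}")
```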
Classification Example
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "ccdv/legal-lsg-small-uncased-4096",
    trust_remote_code=True,
    pool_with_global=True,  # pool with a global token instead of the first token
)
tokenizer = AutoTokenizer.from_pretrained("ccdv/legal-lsg-small-uncased-4096")

SENTENCE = "This is a test for sequence classification. " * 300
token_ids = tokenizer(
    SENTENCE,
    return_tensors="pt",
    truncation=True,
)
output = model(**token_ids)
```
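As a follow-up sketch, the logits can be turned into a predicted class id; note that the classification head of this checkpoint is freshly initialized until fine-tuned, so the prediction itself is only illustrative:
```python
import torch

# Pick the highest-scoring class for each input in the batch.
predicted_class_id = torch.argmax(output.logits, dim=-1)
print(predicted_class_id)
```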
Training Global Tokens
To train global tokens and the classification head only:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "ccdv/legal-lsg-small-uncased-4096",
    trust_remote_code=True,
    pool_with_global=True,
    num_global_tokens=16,
)
tokenizer = AutoTokenizer.from_pretrained("ccdv/legal-lsg-small-uncased-4096")

# Freeze everything except the global token embeddings.
# (To also train the classification head, additionally keep parameters whose name
# contains "classifier" trainable; that name is the usual BERT-style convention
# and should be checked against this custom modeling file.)
for name, param in model.named_parameters():
    if "global_embeddings" not in name:
        param.requires_grad = False
    else:
        param.requires_grad = True
```
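A quick sanity-check sketch of which parameters remain trainable (the exact parameter names depend on the custom modeling file):
```python
trainable = [name for name, param in model.named_parameters() if param.requires_grad]
print(trainable)  # expected to list only the global token embedding parameters
```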
License
LEGAL-BERT Citation
```bibtex
@inproceedings{chalkidis-etal-2020-legal,
    title = "{LEGAL}-{BERT}: The Muppets straight out of Law School",
    author = "Chalkidis, Ilias  and
      Fergadiotis, Manos  and
      Malakasiotis, Prodromos  and
      Aletras, Nikolaos  and
      Androutsopoulos, Ion",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    doi = "10.18653/v1/2020.findings-emnlp.261",
    pages = "2898--2904"
}
```