# 📄 Model Card: LEGIT-BART Series
The LEGIT-BART series consists of pre-trained transformer models for Italian legal text processing. They build on the BART-IT architecture and are further pre-trained on Italian legal corpora, offering an extended context length and the ability to handle long legal documents.
## 🚀 Quick Start

Here's a basic example of how to use the LEGIT-BART-LSG-16384 model for text summarization:
```python
from transformers import BartForConditionalGeneration, AutoTokenizer

model_name = "morenolq/LEGIT-BART-LSG-16384"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Example input: an Italian legal snippet ("Art. 1234: the contract is deemed
# concluded when..."); <mask> is the tokenizer's mask token.
input_text = "<mask> 1234: Il contratto si intende concluso quando..."
inputs = tokenizer(input_text, return_tensors="pt", max_length=16384, truncation=True)

# Generate an abstractive summary with beam search.
summary_ids = model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("📝 Summary:", summary)
```
⨠Features
- Extended Context Length: Utilizes Local-Sparse-Global (LSG) Attention to support up to 16,384 tokens, enabling the processing of long legal documents.
- Trained on Legal Documents: The models are pre-trained on a diverse range of legal texts, including statutes, case law, and contracts.
- Flexible Adaptation: While not fine-tuned for specific tasks, the models can be adapted to a variety of downstream legal NLP tasks; a hedged fine-tuning sketch follows this list.
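
Since the checkpoints are general pre-trained models, task adaptation follows the standard Hugging Face seq2seq fine-tuning recipe. Below is a minimal sketch for a summarization-style use case; the repo ID `morenolq/LEGIT-BART` is inferred from the series naming, and the toy dataset, hyperparameters, and output directory are illustrative placeholders, not values from the LEGIT-BART paper.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    BartForConditionalGeneration,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "morenolq/LEGIT-BART"  # assumed repo ID, following the series naming
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Toy document/summary pair standing in for a real legal summarization corpus.
train_data = Dataset.from_dict({
    "document": ["Art. 1321 c.c.: Il contratto è l'accordo di due o più parti per "
                 "costituire, regolare o estinguere un rapporto giuridico patrimoniale."],
    "summary": ["Definizione di contratto."],
})

def preprocess(batch):
    # Tokenize inputs and targets; truncation lengths are placeholders.
    model_inputs = tokenizer(batch["document"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = train_data.map(preprocess, batched=True, remove_columns=train_data.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="legit-bart-summarizer",  # placeholder output path
    num_train_epochs=1,
    per_device_train_batch_size=1,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),  # dynamic padding
)
trainer.train()
```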
## 📦 Available Models
| Model | Description | Link |
|---|---|---|
| LEGIT-BART | Continued pre-training of `morenolq/bart-it` on Italian legal texts | 🔗 Link |
| LEGIT-BART-LSG-4096 | Continued pre-training of `morenolq/bart-it`, supporting 4,096 tokens | 🔗 Link |
| LEGIT-BART-LSG-16384 | Continued pre-training of `morenolq/bart-it`, supporting 16,384 tokens | 🔗 Link |
| LEGIT-SCRATCH-BART | Trained from scratch on Italian legal texts | 🔗 Link |
| LEGIT-SCRATCH-BART-LSG-4096 | Trained from scratch with LSG attention, supporting 4,096 tokens | 🔗 Link |
| LEGIT-SCRATCH-BART-LSG-16384 | Trained from scratch with LSG attention, supporting 16,384 tokens | 🔗 Link |
| BART-IT-LSG-4096 | `morenolq/bart-it` with LSG attention, supporting 4,096 tokens (no legal adaptation) | 🔗 Link |
| BART-IT-LSG-16384 | `morenolq/bart-it` with LSG attention, supporting 16,384 tokens (no legal adaptation) | 🔗 Link |
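
Choosing among the legal-adapted variants is mostly a question of matching input length to the supported context. A hypothetical helper illustrating that decision; the repo IDs other than `morenolq/LEGIT-BART-LSG-16384` are inferred from the quick-start naming pattern, and the 1,024-token base context is the standard BART limit, assumed here rather than stated in this card:

```python
def pick_legit_bart(num_tokens: int) -> str:
    """Map an input length (in tokens) to a LEGIT-BART checkpoint from the table above."""
    if num_tokens <= 1024:  # assumption: base BART context window
        return "morenolq/LEGIT-BART"
    if num_tokens <= 4096:
        return "morenolq/LEGIT-BART-LSG-4096"
    return "morenolq/LEGIT-BART-LSG-16384"
```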
## 🔧 Technical Details

### Architecture
- Base Model: `morenolq/bart-it`
- Transformer Encoder-Decoder: Employs a standard encoder-decoder architecture for sequence-to-sequence tasks.
- LSG Attention: The LSG variants implement Local-Sparse-Global attention to handle long documents efficiently.
- Specific Tokenizers: The models trained from scratch use tokenizers built for the legal domain, although they may underperform the continued pre-training variants; see the comparison sketch after this list.
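
One practical consequence of the separate tokenizers is that the same legal passage segments differently under the from-scratch vocabulary and the BART-IT vocabulary. A small sketch, assuming the from-scratch checkpoint follows the `morenolq/LEGIT-SCRATCH-BART` naming pattern and exposes its tokenizer via `AutoTokenizer`:

```python
from transformers import AutoTokenizer

# Assumption: repo ID inferred from the naming pattern in the table above.
scratch_tok = AutoTokenizer.from_pretrained("morenolq/LEGIT-SCRATCH-BART")
base_tok = AutoTokenizer.from_pretrained("morenolq/bart-it")

text = "Il contratto si intende concluso quando..."
print("from-scratch tokens:", len(scratch_tok(text).input_ids))
print("bart-it tokens:     ", len(base_tok(text).input_ids))
```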
### Training Data

The models are pre-trained on a diverse collection of Italian legal texts, including statutes, case law, and contracts.
## 📚 Documentation

The LEGIT-BART models are presented in the following paper:
```bibtex
@article{benedetto2025legitbart,
  title     = {LegItBART: a summarization model for Italian legal documents},
  author    = {Benedetto, Irene and La Quatra, Moreno and Cagliero, Luca},
  year      = {2025},
  journal   = {Artificial Intelligence and Law},
  publisher = {Springer},
  pages     = {1--31},
  doi       = {10.1007/s10506-025-09436-y},
  url       = {https://doi.org/10.1007/s10506-025-09436-y}
}
```
## 📜 License
This project is licensed under the MIT License.
## ⚠️ Important Note
The models are not fine-tuned for specific tasks and may require further adaptation before use in downstream legal NLP applications. Legal texts can also contain biases, so care should be taken to ensure fair and ethical use. These models are not a substitute for professional legal advice.