# Model Card: LEGIT-BART Series
The LEGIT-BART models are a collection of pre-trained transformer-based models for Italian legal text processing. They build on BART-IT (`morenolq/bart-it`) and are further pre-trained on Italian legal corpora, adding features for handling long legal documents.
## Features
- Extended context length: Local-Sparse-Global (LSG) attention handles inputs of up to 16,384 tokens, enabling the processing of long legal texts.
- Legal document training: pre-trained on a diverse range of Italian legal documents, including legislation, case law, and contracts.
- Task adaptability: the models are not fine-tuned for a specific task, so they can be adapted to a variety of legal NLP tasks.
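Documents that exceed even the 16,384-token window must be split before inference. A minimal sliding-window chunker, sketched in plain Python (the function name and the overlap size are illustrative, not part of the released code), could look like this:

```python
def chunk_tokens(token_ids, max_len=16384, overlap=256):
    """Split a token-id sequence into windows of at most max_len tokens,
    with `overlap` tokens shared between consecutive windows for context."""
    if max_len <= overlap:
        raise ValueError("max_len must be larger than overlap")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the document
    return chunks

# A 40,000-token document fits in three overlapping windows:
windows = chunk_tokens(list(range(40_000)))
print([len(w) for w in windows])  # → [16384, 16384, 7744]
```

Each window can then be fed to the model separately, with the per-window outputs merged downstream.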
## Available Models
| Model | Description | Link |
|-------|-------------|------|
| LEGIT-BART | Continued pre-training of `morenolq/bart-it` on Italian legal texts. | Link |
| LEGIT-BART-LSG-4096 | Continued pre-training of `morenolq/bart-it`, supporting 4,096 tokens. | Link |
| LEGIT-BART-LSG-16384 | Continued pre-training of `morenolq/bart-it`, supporting 16,384 tokens. | Link |
| LEGIT-SCRATCH-BART | Trained from scratch on Italian legal texts. | Link |
| LEGIT-SCRATCH-BART-LSG-4096 | Trained from scratch with LSG attention, supporting 4,096 tokens. | Link |
| LEGIT-SCRATCH-BART-LSG-16384 | Trained from scratch with LSG attention, supporting 16,384 tokens. | Link |
| BART-IT-LSG-4096 | `morenolq/bart-it` with LSG attention, supporting 4,096 tokens (no legal adaptation). | Link |
| BART-IT-LSG-16384 | `morenolq/bart-it` with LSG attention, supporting 16,384 tokens (no legal adaptation). | Link |
## Technical Details

### Architecture

- Base model: `morenolq/bart-it`
- Architecture: transformer encoder-decoder (BART).
- LSG attention: enables handling of long legal documents (LSG variants only).
- Tokenizers: the models trained from scratch use dedicated tokenizers, although the continued pre-training models performed better in experiments.
### Training Data

- Dataset: `joelniklaus/Multi_Legal_Pile`
- Legal text types:
  - Legislation: laws, codes, and amendments.
  - Case law: judicial decisions.
  - Contracts: public legal agreements.
## Usage Examples

### Basic Usage
```python
from transformers import BartForConditionalGeneration, AutoTokenizer

model_name = "morenolq/LEGIT-BART-LSG-4096"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# BART-style infilling: the model fills in the <mask> span
input_text = "<mask> 1234: Il contratto si intende concluso quando..."
inputs = tokenizer(input_text, return_tensors="pt", max_length=4096, truncation=True)

# Beam search over the encoded input
summary_ids = model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:", summary)
```
## License
This project is licensed under the MIT license.
## Important Notes
- Task Adaptation: The models are pre-trained on legal texts and may require further adaptation for specific legal NLP tasks such as summarization or question-answering.
- Bias and Fairness: Legal texts may contain biases present in the legal system. Ensure fairness and ethical use of the models.
- Legal Advice: These models are not a substitute for professional legal advice. Consult a qualified legal professional for legal matters.
## Documentation

The paper presenting the LEGIT-BART models is published in *Artificial Intelligence and Law*:
```bibtex
@article{benedetto2025legitbart,
  title     = {LegItBART: a summarization model for Italian legal documents},
  author    = {Benedetto, Irene and La Quatra, Moreno and Cagliero, Luca},
  year      = {2025},
  journal   = {Artificial Intelligence and Law},
  publisher = {Springer},
  pages     = {1--31},
  doi       = {10.1007/s10506-025-09436-y},
  url       = {https://doi.org/10.1007/s10506-025-09436-y}
}
```