# 🚀 Model Card: LEGIT-BART Series

The LEGIT-BART models are a family of pre-trained transformer-based models for Italian legal text processing. They build on [BART-IT (`morenolq/bart-it`)](https://huggingface.co/morenolq/bart-it) and are further pre-trained on Italian legal corpora.
## ✨ Features

- **Extended context length**: With Local-Sparse-Global (LSG) attention, the LSG variants handle inputs of up to 16,384 tokens 📈.
- **Trained on legal documents**: Statutes, case law, and contracts 📄.
- **Not fine-tuned for specific tasks**: The models require further adaptation for concrete applications.
## 📦 Available Models

| Model | Description | Link |
|---|---|---|
| LEGIT-BART | Continued pre-training of `morenolq/bart-it` on Italian legal texts | [🔗 Link](https://huggingface.co/morenolq/LEGIT-BART) |
| LEGIT-BART-LSG-4096 | Continued pre-training of `morenolq/bart-it`, supporting 4,096 tokens | [🔗 Link](https://huggingface.co/morenolq/LEGIT-BART-LSG-4096) |
| LEGIT-BART-LSG-16384 | Continued pre-training of `morenolq/bart-it`, supporting 16,384 tokens | [🔗 Link](https://huggingface.co/morenolq/LEGIT-BART-LSG-16384) |
| LEGIT-SCRATCH-BART | Trained from scratch on Italian legal texts | [🔗 Link](https://huggingface.co/morenolq/LEGIT-SCRATCH-BART) |
| LEGIT-SCRATCH-BART-LSG-4096 | Trained from scratch with LSG attention, supporting 4,096 tokens | [🔗 Link](https://huggingface.co/morenolq/LEGIT-SCRATCH-BART-LSG-4096) |
| LEGIT-SCRATCH-BART-LSG-16384 | Trained from scratch with LSG attention, supporting 16,384 tokens | [🔗 Link](https://huggingface.co/morenolq/LEGIT-SCRATCH-BART-LSG-16384) |
| BART-IT-LSG-4096 | `morenolq/bart-it` with LSG attention, supporting 4,096 tokens (no legal adaptation) | [🔗 Link](https://huggingface.co/morenolq/BART-IT-LSG-4096) |
| BART-IT-LSG-16384 | `morenolq/bart-it` with LSG attention, supporting 16,384 tokens (no legal adaptation) | [🔗 Link](https://huggingface.co/morenolq/BART-IT-LSG-16384) |
## 🔧 Technical Details

- **Architecture**:
  - Base model: [`morenolq/bart-it`](https://huggingface.co/morenolq/bart-it)
  - Transformer encoder-decoder
  - LSG attention for long documents
  - Dedicated tokenizers for the models trained from scratch (these underperformed continued pre-training in our experiments)
- **Training data**:
  - Dataset: `joelniklaus/Multi_Legal_Pile`
  - Types of legal texts: legislation (laws, codes, amendments), case law (judicial decisions), and contracts (public legal agreements)
## 💻 Usage Examples

### Basic Usage

```python
from transformers import BartForConditionalGeneration, AutoTokenizer

# Load the model and tokenizer
model_name = "morenolq/LEGIT-SCRATCH-BART"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Mask-filling example on a legal sentence
input_text = "<mask> 1234: Il contratto si intende concluso quando..."
inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)

output_ids = model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("📄:", output_text)
```
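The long-context variants are used the same way. The sketch below assumes the LSG checkpoints follow the usual LSG convention of requiring `trust_remote_code=True` at load time (verify on the model page); the `generate_text` helper and the example input are illustrative.

```python
# Sketch: feeding a long document to an LSG checkpoint.
# Assumption: like most LSG-converted models, loading requires
# trust_remote_code=True; check the model page to confirm.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def generate_text(text, model, tokenizer, max_input_tokens=16384):
    """Tokenize up to max_input_tokens and decode the generated output."""
    inputs = tokenizer(text, return_tensors="pt",
                       max_length=max_input_tokens, truncation=True)
    output_ids = model.generate(inputs.input_ids, max_length=150,
                                num_beams=4, early_stopping=True)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    model_name = "morenolq/LEGIT-BART-LSG-16384"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)
    # An entire judgment or statute (up to 16,384 tokens) can go here.
    long_text = "Art. 1414: Il contratto simulato non produce effetto tra le parti..."
    print("📄:", generate_text(long_text, model, tokenizer))
```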
## ⚠️ Important Notes

- **Not fine-tuned for specific tasks**: The models are pre-trained on legal texts and may require further adaptation for specific legal NLP tasks (e.g., summarization, question answering).
- **Bias and fairness**: Legal texts may reflect biases present in the legal system. Care should be taken to ensure fairness and ethical use of the models.
- **Legal advice**: The models are not a substitute for professional legal advice. Always consult a qualified legal professional for legal matters.
## 📚 Documentation

The paper presenting the LEGIT-BART models has been published in *Artificial Intelligence and Law*:

```bibtex
@article{benedetto2025legitbart,
  title     = {LegItBART: a summarization model for Italian legal documents},
  author    = {Benedetto, Irene and La Quatra, Moreno and Cagliero, Luca},
  year      = 2025,
  journal   = {Artificial Intelligence and Law},
  publisher = {Springer},
  pages     = {1--31},
  doi       = {10.1007/s10506-025-09436-y},
  url       = {https://doi.org/10.1007/s10506-025-09436-y}
}
```
## 📜 License

The models are released under the MIT license.