# 📄 Model Card: LEGIT-BART Series
The LEGIT-BART models are a family of pre-trained transformer models for Italian legal text processing. They build on BART-IT (`morenolq/bart-it`) and are further pre-trained on Italian legal corpora, adding legal-domain knowledge and long-context support.
⨠Features
- **Extended context length**: the Local-Sparse-Global (LSG) attention variants handle inputs of up to 16,384 tokens, enabling analysis of long legal documents (see the loading sketch after this list).
- **Trained on legal documents**: pre-training covers statutes, case law, and contracts, making the models well suited to legal language.
- **Not fine-tuned for specific tasks**: the models require further adaptation for downstream legal NLP tasks.
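
As a minimal sketch of the long-context workflow referenced above, the snippet below loads the 16,384-token variant. The `trust_remote_code=True` flag is an assumption: LSG checkpoints typically ship custom modeling code, so loading them through `AutoModelForSeq2SeqLM` usually requires it.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumption: the LSG checkpoint relies on custom modeling code,
# so trust_remote_code=True is needed to instantiate it.
model_name = "morenolq/LEGIT-BART-LSG-16384"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)

# Any long Italian legal text; the LSG variants accept up to 16,384 tokens.
long_text = "Art. 1321 c.c.: Il contratto è l'accordo di due o più parti ..."
inputs = tokenizer(long_text, return_tensors="pt", max_length=16384, truncation=True)
output_ids = model.generate(**inputs, max_length=256, num_beams=4, early_stopping=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```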
## 📦 Available Models
| Model | Description | Link |
|-------|-------------|------|
| LEGIT-BART | Continued pre-training of `morenolq/bart-it` on Italian legal texts | 🔗 Link |
| LEGIT-BART-LSG-4096 | Continued pre-training of `morenolq/bart-it`, supporting up to 4,096 tokens | 🔗 Link |
| LEGIT-BART-LSG-16384 | Continued pre-training of `morenolq/bart-it`, supporting up to 16,384 tokens | 🔗 Link |
| LEGIT-SCRATCH-BART | Trained from scratch on Italian legal texts | 🔗 Link |
| LEGIT-SCRATCH-BART-LSG-4096 | Trained from scratch with LSG attention, supporting up to 4,096 tokens | 🔗 Link |
| LEGIT-SCRATCH-BART-LSG-16384 | Trained from scratch with LSG attention, supporting up to 16,384 tokens | 🔗 Link |
| BART-IT-LSG-4096 | `morenolq/bart-it` with LSG attention, supporting up to 4,096 tokens (no legal adaptation) | 🔗 Link |
| BART-IT-LSG-16384 | `morenolq/bart-it` with LSG attention, supporting up to 16,384 tokens (no legal adaptation) | 🔗 Link |
## 🔧 Technical Details

### Architecture
- **Base model**: `morenolq/bart-it`
- **Transformer encoder-decoder**: standard BART encoder-decoder architecture for sequence-to-sequence text processing.
- **LSG attention**: the `-LSG` variants use Local-Sparse-Global attention to process long documents efficiently.
- **Dedicated tokenizers**: the from-scratch models use tokenizers trained on the legal corpus; in our experiments, however, they underperform the continued pre-training variants (a quick tokenizer comparison follows this list).
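
For illustration, the snippet below compares the two tokenizer families on a short legal phrase; it assumes both checkpoints expose their tokenizers through `AutoTokenizer`.

```python
from transformers import AutoTokenizer

# Illustrative comparison: the from-scratch checkpoint ships its own
# legal-domain tokenizer, while LEGIT-BART reuses the bart-it tokenizer.
tok_continued = AutoTokenizer.from_pretrained("morenolq/LEGIT-BART")
tok_scratch = AutoTokenizer.from_pretrained("morenolq/LEGIT-SCRATCH-BART")

text = "Ai sensi dell'art. 1341 del codice civile"
print("continued   :", tok_continued.tokenize(text))
print("from scratch:", tok_scratch.tokenize(text))
```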
### Training Data
- **Dataset**: `joelniklaus/Multi_Legal_Pile`
- **Types of legal texts**: legislation, case law, and contracts, giving the models broad coverage of legal language (a corpus-loading example follows this list).
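
As a hedged way to inspect the training corpus, the snippet below streams a few Italian examples from `joelniklaus/Multi_Legal_Pile`. The `it_legislation` configuration name and the `text` field are assumptions; check the dataset card for the exact subset and column names.

```python
from datasets import load_dataset

# Assumption: "it_legislation" is one of the dataset's configurations;
# see the dataset card for the exact subset names.
ds = load_dataset(
    "joelniklaus/Multi_Legal_Pile",
    "it_legislation",
    split="train",
    streaming=True,  # avoid downloading the full multilingual corpus
)
for i, example in enumerate(ds):
    print(example["text"][:200])  # assumed field name
    if i == 2:
        break
```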
## 💻 Usage Examples

### Basic Usage
```python
from transformers import BartForConditionalGeneration, AutoTokenizer

model_name = "morenolq/LEGIT-BART"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Denoising-style input: the model generates text filling the <mask> span.
input_text = "<mask> 1234: Il contratto si intende concluso quando..."
inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
output_ids = model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("📝:", output_text)
```
## ⚠️ Important Note
- **Not fine-tuned for specific tasks**: the models are pre-trained on legal texts and may require further adaptation for specific legal NLP tasks (e.g., summarization, question answering); a minimal fine-tuning sketch follows this list.
- **Bias and fairness**: legal texts may reflect biases present in the legal system; take care to ensure fair and ethical use of the models.
- **Legal advice**: the models are not a substitute for professional legal advice; always consult a qualified legal professional for legal matters.
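
As a hedged illustration of such adaptation, the sketch below fine-tunes `morenolq/LEGIT-BART` for abstractive summarization with the standard `Seq2SeqTrainer` API. The toy dataset, column names, output path, and hyperparameters are all placeholder assumptions, not the setup used in the paper.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    BartForConditionalGeneration,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "morenolq/LEGIT-BART"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Toy document/summary pairs; replace with a real Italian legal dataset.
train_dataset = Dataset.from_dict({
    "document": ["Testo integrale della sentenza ..."],
    "summary": ["Massima della sentenza ..."],
})

def preprocess(batch):
    # Tokenize inputs and targets; lengths are illustrative choices.
    model_inputs = tokenizer(batch["document"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

args = Seq2SeqTrainingArguments(
    output_dir="legit-bart-summarization",  # hypothetical output path
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=5e-5,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset.map(preprocess, batched=True,
                                    remove_columns=["document", "summary"]),
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```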
## 📚 Documentation

The paper presenting the LEGIT-BART models is published in *Artificial Intelligence and Law*:
```bibtex
@article{benedetto2025legitbart,
  title     = {LegItBART: a summarization model for Italian legal documents},
  author    = {Benedetto, Irene and La Quatra, Moreno and Cagliero, Luca},
  year      = {2025},
  journal   = {Artificial Intelligence and Law},
  publisher = {Springer},
  pages     = {1--31},
  doi       = {10.1007/s10506-025-09436-y},
  url       = {https://doi.org/10.1007/s10506-025-09436-y}
}
```
## 📜 License
The models in the LEGIT-BART series are released under the MIT license.