# Model Card: LEGIT-BART Series
The LEGIT-BART models are a collection of pre-trained transformer-based models for Italian legal text processing. They build on BART-IT (`morenolq/bart-it`) and are further pre-trained on Italian legal corpora, adding features for handling long legal documents.
## Features
- Extended context length: Local-Sparse-Global (LSG) attention handles inputs of up to 16,384 tokens, enabling the processing of long legal texts.
- Legal document training: pre-trained on a diverse range of Italian legal documents, including legislation, case law, and contracts.
- Task adaptability: the models are not fine-tuned for a specific task, so they can be adapted to a variety of legal NLP tasks.
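Documents that exceed even the 16,384-token window must be split before inference. A minimal sliding-window chunker, sketched in plain Python (the function name and the overlap size are illustrative, not part of the released code), could look like this:

```python
def chunk_tokens(token_ids, max_len=16384, overlap=256):
    """Split a token-id sequence into windows of at most max_len tokens,
    with `overlap` tokens shared between consecutive windows for context."""
    if max_len <= overlap:
        raise ValueError("max_len must be larger than overlap")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the document
    return chunks

# A 40,000-token document fits in three overlapping windows:
windows = chunk_tokens(list(range(40_000)))
print([len(w) for w in windows])  # → [16384, 16384, 7744]
```

Each window can then be fed to the model separately, with the per-window outputs merged downstream.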
## Available Models
| Model | Description | Link |
|-------|-------------|------|
| LEGIT-BART | Continued pre-training of `morenolq/bart-it` on Italian legal texts. | Link |
| LEGIT-BART-LSG-4096 | Continued pre-training of `morenolq/bart-it`, supporting 4,096 tokens. | Link |
| LEGIT-BART-LSG-16384 | Continued pre-training of `morenolq/bart-it`, supporting 16,384 tokens. | Link |
| LEGIT-SCRATCH-BART | Trained from scratch on Italian legal texts. | Link |
| LEGIT-SCRATCH-BART-LSG-4096 | Trained from scratch with LSG attention, supporting 4,096 tokens. | Link |
| LEGIT-SCRATCH-BART-LSG-16384 | Trained from scratch with LSG attention, supporting 16,384 tokens. | Link |
| BART-IT-LSG-4096 | `morenolq/bart-it` with LSG attention, supporting 4,096 tokens (no legal adaptation). | Link |
| BART-IT-LSG-16384 | `morenolq/bart-it` with LSG attention, supporting 16,384 tokens (no legal adaptation). | Link |
## Technical Details

### Architecture

- Base model: `morenolq/bart-it`
- Architecture: transformer encoder-decoder (BART).
- LSG attention: enables handling of long legal documents (LSG variants only).
- Tokenizers: the models trained from scratch use dedicated tokenizers, although the continued pre-training models performed better in experiments.
### Training Data

- Dataset: `joelniklaus/Multi_Legal_Pile`
- Legal text types:
  - Legislation: laws, codes, and amendments.
  - Case law: judicial decisions.
  - Contracts: public legal agreements.
## Usage Examples

### Basic Usage
```python
from transformers import BartForConditionalGeneration, AutoTokenizer

model_name = "morenolq/LEGIT-BART-LSG-4096"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# BART-style infilling: the model fills in the <mask> span
input_text = "<mask> 1234: Il contratto si intende concluso quando..."
inputs = tokenizer(input_text, return_tensors="pt", max_length=4096, truncation=True)

# Beam search over the encoded input
summary_ids = model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:", summary)
```
## License
This project is licensed under the MIT license.
## Important Notes
- Task Adaptation: The models are pre-trained on legal texts and may require further adaptation for specific legal NLP tasks such as summarization or question-answering.
- Bias and Fairness: Legal texts may contain biases present in the legal system. Ensure fairness and ethical use of the models.
- Legal Advice: These models are not a substitute for professional legal advice. Consult a qualified legal professional for legal matters.
## Documentation

The paper presenting the LEGIT-BART models is published in *Artificial Intelligence and Law*:
```bibtex
@article{benedetto2025legitbart,
  title     = {LegItBART: a summarization model for Italian legal documents},
  author    = {Benedetto, Irene and La Quatra, Moreno and Cagliero, Luca},
  year      = {2025},
  journal   = {Artificial Intelligence and Law},
  publisher = {Springer},
  pages     = {1--31},
  doi       = {10.1007/s10506-025-09436-y},
  url       = {https://doi.org/10.1007/s10506-025-09436-y}
}
```