# 🚀 Model Card: LEGIT-BART Series

The LEGIT-BART models are a family of pre-trained transformer-based models for Italian legal text processing. They build on [BART-IT (`morenolq/bart-it`)](https://huggingface.co/morenolq/bart-it) and are further pre-trained on Italian legal corpora.
## ✨ Features

- **Extended context length**: With Local-Sparse-Global (LSG) attention, the LSG variants handle inputs of up to 16,384 tokens 📈.
- **Trained on legal documents**: Statutes, case law, and contracts 📄.
- **Not fine-tuned for specific tasks**: The models require further adaptation for concrete applications.
## 📦 Available Models

| Model | Description | Link |
|---|---|---|
| LEGIT-BART | Continued pre-training of `morenolq/bart-it` on Italian legal texts | [🔗 Link](https://huggingface.co/morenolq/LEGIT-BART) |
| LEGIT-BART-LSG-4096 | Continued pre-training of `morenolq/bart-it`, supporting 4,096 tokens | [🔗 Link](https://huggingface.co/morenolq/LEGIT-BART-LSG-4096) |
| LEGIT-BART-LSG-16384 | Continued pre-training of `morenolq/bart-it`, supporting 16,384 tokens | [🔗 Link](https://huggingface.co/morenolq/LEGIT-BART-LSG-16384) |
| LEGIT-SCRATCH-BART | Trained from scratch on Italian legal texts | [🔗 Link](https://huggingface.co/morenolq/LEGIT-SCRATCH-BART) |
| LEGIT-SCRATCH-BART-LSG-4096 | Trained from scratch with LSG attention, supporting 4,096 tokens | [🔗 Link](https://huggingface.co/morenolq/LEGIT-SCRATCH-BART-LSG-4096) |
| LEGIT-SCRATCH-BART-LSG-16384 | Trained from scratch with LSG attention, supporting 16,384 tokens | [🔗 Link](https://huggingface.co/morenolq/LEGIT-SCRATCH-BART-LSG-16384) |
| BART-IT-LSG-4096 | `morenolq/bart-it` with LSG attention, supporting 4,096 tokens (no legal adaptation) | [🔗 Link](https://huggingface.co/morenolq/BART-IT-LSG-4096) |
| BART-IT-LSG-16384 | `morenolq/bart-it` with LSG attention, supporting 16,384 tokens (no legal adaptation) | [🔗 Link](https://huggingface.co/morenolq/BART-IT-LSG-16384) |
## 🔧 Technical Details

- **Architecture**:
  - Base model: [`morenolq/bart-it`](https://huggingface.co/morenolq/bart-it)
  - Transformer encoder-decoder
  - LSG attention for long documents
  - Dedicated tokenizers for the models trained from scratch (these underperformed continued pre-training in our experiments)
- **Training data**:
  - Dataset: `joelniklaus/Multi_Legal_Pile`
  - Types of legal texts: legislation (laws, codes, amendments), case law (judicial decisions), and contracts (public legal agreements)
## 💻 Usage Examples

### Basic Usage

```python
from transformers import BartForConditionalGeneration, AutoTokenizer

# Load the model and tokenizer
model_name = "morenolq/LEGIT-SCRATCH-BART"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Mask-filling example on a legal sentence
input_text = "<mask> 1234: Il contratto si intende concluso quando..."
inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)

output_ids = model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("📄:", output_text)
```
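The long-context variants are used the same way. The sketch below assumes the LSG checkpoints follow the usual LSG convention of requiring `trust_remote_code=True` at load time (verify on the model page); the `generate_text` helper and the example input are illustrative.

```python
# Sketch: feeding a long document to an LSG checkpoint.
# Assumption: like most LSG-converted models, loading requires
# trust_remote_code=True; check the model page to confirm.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def generate_text(text, model, tokenizer, max_input_tokens=16384):
    """Tokenize up to max_input_tokens and decode the generated output."""
    inputs = tokenizer(text, return_tensors="pt",
                       max_length=max_input_tokens, truncation=True)
    output_ids = model.generate(inputs.input_ids, max_length=150,
                                num_beams=4, early_stopping=True)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    model_name = "morenolq/LEGIT-BART-LSG-16384"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)
    # An entire judgment or statute (up to 16,384 tokens) can go here.
    long_text = "Art. 1414: Il contratto simulato non produce effetto tra le parti..."
    print("📄:", generate_text(long_text, model, tokenizer))
```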
## ⚠️ Important Notes

- **Not fine-tuned for specific tasks**: The models are pre-trained on legal texts and may require further adaptation for specific legal NLP tasks (e.g., summarization, question answering).
- **Bias and fairness**: Legal texts may reflect biases present in the legal system. Care should be taken to ensure fairness and ethical use of the models.
- **Legal advice**: The models are not a substitute for professional legal advice. Always consult a qualified legal professional for legal matters.
## 📚 Documentation

The paper presenting the LEGIT-BART models has been published in *Artificial Intelligence and Law*:

```bibtex
@article{benedetto2025legitbart,
  title     = {LegItBART: a summarization model for Italian legal documents},
  author    = {Benedetto, Irene and La Quatra, Moreno and Cagliero, Luca},
  year      = 2025,
  journal   = {Artificial Intelligence and Law},
  publisher = {Springer},
  pages     = {1--31},
  doi       = {10.1007/s10506-025-09436-y},
  url       = {https://doi.org/10.1007/s10506-025-09436-y}
}
```
## 📜 License

The models are released under the MIT license.