LEGIT-SCRATCH-BART open-source model: free processing of long Italian legal texts and legal document analysis

LEGIT SCRATCH BART

Developed by morenolq

LEGIT-BART is a family of pre-trained Transformer models designed for Italian legal text, with support for long inputs and legal document analysis.

Large Language Model

Transformers

License: MIT #Italian legal text generation #Long-document processing (16k tokens) #Legal summarization

Downloads: 16

Released: 2/2/2025

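Model Overview

The LEGIT-BART models build on the BART-IT architecture and are further pre-trained on an Italian legal corpus. They are suited to tasks such as legal text generation and summarization.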

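Model Features

Legal-domain specialization

Pre-trained specifically on Italian legal text, so the models capture legal terminology and document structure.

Long-text processing

The LSG-attention variants handle contexts of up to 16,384 tokens.

Range of model variants

Variants range from the base model to long-context versions, to suit different needs.

Broad legal training data

Training data covers several types of legal documents, including legislation, case law, and contracts.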

Model Capabilities

Legal text generation

Legal document summarization

Legal text completion

Long legal document processing

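Use Cases

Legal document processing

Contract summarization

Automatically generate concise summaries of legal contracts.

Legal clause completion

Complete the text of a legal clause from its context.

Legal research

Case law analysis

Process and analyze long court rulings.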

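🚀 LEGIT-BART Series Model Card

The LEGIT-BART series is a family of pre-trained Transformer models for Italian legal text. Most variants start from BART-IT and are further pre-trained on an Italian legal corpus, while the SCRATCH variants are trained from scratch on legal text. The LSG variants can process long legal documents. The models are intended as base models for legal-domain natural language processing tasks.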

🚀 Quick Start

The following example loads the morenolq/LEGIT-SCRATCH-BART model and uses it to fill a masked span:

from transformers import BartForConditionalGeneration, AutoTokenizer

# Load the tokenizer and the model
model_name = "morenolq/LEGIT-SCRATCH-BART"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Example input: Italian legal text with a masked span
input_text = "<mask> 1234: Il contratto si intende concluso quando..."
inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)

# Let the pre-trained model fill in the mask
output_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=150,
    num_beams=4,
    early_stopping=True,
)
# Decode the generated ids (the original decoded an undefined `summary_ids`)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("📝:", output_text)
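The call above truncates the input at 512 tokens. For a longer document and a checkpoint without LSG attention, one common workaround is to split the token ids into overlapping windows and process each window separately. The sketch below is a generic illustration and does not come from the model card; the helper name `chunk_token_ids` and the default `window` and `stride` values are hypothetical.

```python
from typing import List


def chunk_token_ids(token_ids: List[int], window: int = 512, stride: int = 384) -> List[List[int]]:
    """Split a token-id sequence into overlapping windows.

    Hypothetical helper: `window` and `stride` are illustrative defaults,
    not values taken from the LEGIT-BART model card.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("require 0 < stride <= window")
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(token_ids):
            break
        start += stride
    return chunks


# Stand-in for tokenizer output: 1,000 fake token ids.
fake_ids = list(range(1000))
chunks = chunk_token_ids(fake_ids)
print(len(chunks), [len(c) for c in chunks])
```

Each window could then be passed to `model.generate` on its own. How the per-window outputs should be merged depends on the task and is left open here.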

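✨ Key Features

Long-text processing: with Local-Sparse-Global (LSG) attention, the LSG variants handle inputs of up to 16,384 tokens, which suits lengthy legal documents.
Legal-domain training corpus: the models are trained on legal documents including legislation, case law, and contracts, so they handle specialized legal language better.
Flexible adaptation: the models are not fine-tuned for any specific task; users can adapt them to a particular legal NLP task such as summarization or question answering.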
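To give a rough intuition for why a local + sparse + global pattern scales to long inputs, the toy sketch below builds a boolean attention mask and counts the allowed query-key pairs against full attention. It is a simplified illustration only, not the LSG implementation used by these models; the function names and the `local_window`, `sparse_stride`, and `n_global` values are assumptions chosen for readability.

```python
from typing import List


def toy_lsg_mask(n: int, local_window: int = 2, sparse_stride: int = 4,
                 n_global: int = 1) -> List[List[bool]]:
    """Boolean mask where mask[i][j] is True if query i may attend to key j.

    Toy pattern only (all parameters are illustrative assumptions):
      - local:  keys within `local_window` positions of the query
      - sparse: every `sparse_stride`-th key
      - global: the first `n_global` tokens attend to, and are attended by,
                every position
    """
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            is_local = abs(i - j) <= local_window
            is_sparse = j % sparse_stride == 0
            is_global = i < n_global or j < n_global
            mask[i][j] = is_local or is_sparse or is_global
    return mask


def attended_pairs(mask: List[List[bool]]) -> int:
    """Count the query-key pairs the mask allows."""
    return sum(sum(row) for row in mask)


n = 16
mask = toy_lsg_mask(n)
print(f"full attention pairs: {n * n}")
print(f"toy LSG pairs:        {attended_pairs(mask)}")
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With fixed window and stride settings, the number of allowed pairs per query stays roughly constant in the local part, so the gap to full attention widens as the sequence grows.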

📦 Installation

The model card gives no installation steps. To use the model, install the transformers library following its official documentation.

📚 Documentation

Available Models

| Model | Description | Link |
| --- | --- | --- |
| LEGIT-BART | Continual pre-training of `morenolq/bart-it` on Italian legal text | [🔗 Link](https://huggingface.co/morenolq/LEGIT-BART) |
| LEGIT-BART-LSG-4096 | Continual pre-training of `morenolq/bart-it`, supports 4,096 tokens | [🔗 Link](https://huggingface.co/morenolq/LEGIT-BART-LSG-4096) |
| LEGIT-BART-LSG-16384 | Continual pre-training of `morenolq/bart-it`, supports 16,384 tokens | [🔗 Link](https://huggingface.co/morenolq/LEGIT-BART-LSG-16384) |
| LEGIT-SCRATCH-BART | Trained from scratch on Italian legal text | [🔗 Link](https://huggingface.co/morenolq/LEGIT-SCRATCH-BART) |
| LEGIT-SCRATCH-BART-LSG-4096 | Trained from scratch with LSG attention, supports 4,096 tokens | [🔗 Link](https://huggingface.co/morenolq/LEGIT-SCRATCH-BART-LSG-4096) |
| LEGIT-SCRATCH-BART-LSG-16384 | Trained from scratch with LSG attention, supports 16,384 tokens | [🔗 Link](https://huggingface.co/morenolq/LEGIT-SCRATCH-BART-LSG-16384) |
| BART-IT-LSG-4096 | `morenolq/bart-it` with LSG attention added, supports 4,096 tokens (no legal adaptation) | [🔗 Link](https://huggingface.co/morenolq/BART-IT-LSG-4096) |
| BART-IT-LSG-16384 | `morenolq/bart-it` with LSG attention added, supports 16,384 tokens (no legal adaptation) | [🔗 Link](https://huggingface.co/morenolq/BART-IT-LSG-16384) |
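As a small illustration of how the table above might be used, the sketch below picks the smallest legally adapted checkpoint whose context window fits a given input length. The 4,096 and 16,384 limits come from the table; the 1,024-token limit for the non-LSG checkpoints is an assumption (the usual BART default, not stated in the model card), and `pick_checkpoint` is a hypothetical helper name.

```python
from typing import Optional

# ASSUMPTION: 1,024 tokens for the non-LSG checkpoints (the usual BART
# limit); the model card does not state this value.
BASE_CONTEXT = 1024

CHECKPOINTS = {
    # (from_scratch, max_tokens): Hugging Face repository id
    (False, BASE_CONTEXT): "morenolq/LEGIT-BART",
    (False, 4096): "morenolq/LEGIT-BART-LSG-4096",
    (False, 16384): "morenolq/LEGIT-BART-LSG-16384",
    (True, BASE_CONTEXT): "morenolq/LEGIT-SCRATCH-BART",
    (True, 4096): "morenolq/LEGIT-SCRATCH-BART-LSG-4096",
    (True, 16384): "morenolq/LEGIT-SCRATCH-BART-LSG-16384",
}


def pick_checkpoint(n_tokens: int, from_scratch: bool = False) -> Optional[str]:
    """Return the smallest-context checkpoint that fits `n_tokens`, or None."""
    limits = sorted(limit for (scratch, limit) in CHECKPOINTS if scratch == from_scratch)
    for limit in limits:
        if n_tokens <= limit:
            return CHECKPOINTS[(from_scratch, limit)]
    return None  # input is longer than any listed variant supports


print(pick_checkpoint(800))
print(pick_checkpoint(3000, from_scratch=True))
print(pick_checkpoint(20000))
```

Preferring the smallest window that fits is only one possible policy; a longer-context variant may still be preferable when inputs vary widely in length.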

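Model Details

Architecture

Base model: [morenolq/bart-it](https://huggingface.co/morenolq/bart-it)
Architecture type: Transformer encoder-decoder
Attention: LSG attention for long documents (LSG variants)
Tokenizer: the models trained from scratch use their own tokenizer; in the authors' experiments, however, continual pre-training performed better.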

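Training Data

Dataset: joelniklaus/Multi_Legal_Pile
Types of legal text:
- Legislation: laws, codes, amendments, and similar
- Case law: judicial decisions
- Contracts: public legal agreements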

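🔧 Technical Details

The models use the Transformer architecture, and the LSG variants rely on LSG attention to process long inputs efficiently. Training used several types of Italian legal text to improve the models' grasp of legal language.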

📄 License

This model is released under the MIT license.

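⚠️ Important Notes

The model is not fine-tuned for any specific task and may need further adaptation for a particular legal NLP task, such as summarization or question answering.

Legal texts may carry biases present in the legal system; take care to use the model fairly and ethically.

The model is not a substitute for professional legal advice. For legal matters, consult a qualified legal professional.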

📚 References

The paper introducing the LEGIT-BART models is currently under review; this section will be updated once it is published.

@article{benedetto2025legitbart,
	title        = {LegItBART: a summarization model for Italian legal documents},
	author       = {Benedetto, Irene and La Quatra, Moreno and Cagliero, Luca},
	year         = 2025,
	journal      = {Artificial Intelligence and Law},
	publisher    = {Springer},
	pages        = {1--31},
	doi          = {10.1007/s10506-025-09436-y},
	url          = {https://doi.org/10.1007/s10506-025-09436-y}
}