roberta-base-on-cuad open-source legal model: free to deploy, question answering for legal contract review

Roberta Base On Cuad

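Developed by Rakib

A RoBERTa-base model fine-tuned for question answering on legal contracts, designed for legal contract review

Question answering

Transformers

Language: English | License: MIT | Tags: #LegalContractQA #RoBERTaOptimization #ClauseLocation

Downloads: 14.79k

Released: 3/2/2022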

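Model overview

This model is a question-answering model fine-tuned from RoBERTa-base on the Contract Understanding Atticus Dataset (CUAD). It is intended for question answering over legal documents and helps non-legal professionals understand contract clauses.

Model features

Built for legal contracts

Optimized specifically for legal contract text, so it can handle complex legal terminology and clauses

Exceeds the baseline

Achieves an AUPR of 46.6% on the CUAD dataset, compared with 42.6% for the original RoBERTa-base

End-to-end application support

Can be used to build a complete contract-understanding application, including OCR handling of scanned contracts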
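
The AUPR figure quoted above is the area under the precision-recall curve. As a rough illustration of what that number measures, here is a minimal sketch that assumes each prediction has a confidence score and a binary correct/incorrect label, and that integrates the curve with the trapezoidal rule. The names `pr_curve` and `aupr` are hypothetical, and this is not the CUAD evaluation script, which also has to decide when a predicted span counts as a match.

```python
def pr_curve(scores, labels):
    """Return (recall, precision) points, sweeping the threshold from high to low."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 1.0)]  # conventional starting point: recall 0, precision 1
    tp = fp = 0
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def aupr(scores, labels):
    """Area under the precision-recall curve (trapezoidal rule)."""
    points = pr_curve(scores, labels)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area
```

A perfect ranking, with every correct prediction scored above every incorrect one, gives an area of 1.0. Tied scores are treated here as separate thresholds, which is a simplification.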

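Model capabilities

Legal contract question answering

Clause identification and highlighting

Contract clause comprehension

Legal document analysis

Use cases

Legal tech

Contract due diligence

Helps non-legal professionals understand contract clauses before signing

Automatically highlights the clauses a user wants to find, along with their descriptions

Lawyer assistance tool

Helps lawyers review contracts more efficiently

Improves the efficiency of contract review

Document processing

Scanned contract processing

Uses OCR to process scanned, non-searchable contracts

Converts scanned documents into a searchable, analyzable digital format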
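
The clause highlighting mentioned under the due-diligence use case comes down to marking character ranges in the contract text once the model has located an answer. The following is a minimal sketch under stated assumptions: spans are half-open `(start, end)` character offsets, overlapping spans are merged, and the `[[` / `]]` markers stand in for whatever markup a real interface would use. The function `highlight` is a hypothetical helper, not part of the authors' application.

```python
def highlight(text, spans, open_mark="[[", close_mark="]]"):
    """Wrap each (start, end) character span of text in markers.

    Spans are half-open intervals. Overlapping or touching spans are merged
    first so that markers never nest.
    """
    merged = []
    for start, end in sorted(spans):
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span ({start}, {end}) is outside the text")
        if merged and start <= merged[-1][1]:
            # Extend the previous span instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    pieces = []
    cursor = 0
    for start, end in merged:
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(open_mark + text[start:end] + close_mark)
        cursor = end
    pieces.append(text[cursor:])  # trailing text after the last span
    return "".join(pieces)
```

For example, highlighting the governing-law phrase in a single sentence leaves the rest of the sentence unchanged and wraps only the selected characters.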

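🚀 roberta-base-on-cuad Model Card

This model is designed for question answering over legal documents. It builds on the RoBERTa architecture and is fine-tuned on contract text, and it is intended to give legal professionals and related practitioners accurate answers to questions about a document.

🚀 Quick Start

Use the following code to get started with the model: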

from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("Rakib/roberta-base-on-cuad")

model = AutoModelForQuestionAnswering.from_pretrained("Rakib/roberta-base-on-cuad")
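
Once the tokenizer and model are loaded, a question can be run against a contract passage. The following is a minimal sketch and not code from the model card: `best_span` and `answer_question` are hypothetical helper names, the 512-token limit and 50-token answer cap are assumed defaults, and the exhaustive span search is a simple stand-in for whatever post-processing the authors used.

```python
def best_span(start_logits, end_logits, max_answer_len=50):
    """Return (start, end, score) for the highest-scoring span with start <= end.

    A span's score is start_logits[start] + end_logits[end], the usual
    scoring rule for extractive question-answering heads.
    """
    best = None
    for start, start_score in enumerate(start_logits):
        # Only consider ends within max_answer_len tokens of the start.
        stop = min(start + max_answer_len, len(end_logits))
        for end in range(start, stop):
            score = start_score + end_logits[end]
            if best is None or score > best[2]:
                best = (start, end, score)
    return best


def answer_question(question, context, model_name="Rakib/roberta-base-on-cuad"):
    """Run one question against one context (downloads the model on first use)."""
    # Imported lazily so best_span can be used without torch installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForQuestionAnswering

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # Truncate only the context so the question is never cut off.
    inputs = tokenizer(question, context, return_tensors="pt",
                       truncation="only_second", max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)

    start, end, _ = best_span(outputs.start_logits[0].tolist(),
                              outputs.end_logits[0].tolist())
    # Decode the chosen token range back to text.
    return tokenizer.decode(inputs["input_ids"][0][start:end + 1],
                            skip_special_tokens=True)
```

Calling `answer_question("What is the governing law?", contract_text)` would return the extracted span as a string. An empty string usually means the best span was the leading special token, which SQuAD 2.0-style models use to signal that the passage contains no answer. Treating it that way for this model is an assumption.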

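✨ Key Features

Designed for question answering over legal documents and tuned to handle legal text accurately.
Based on the RoBERTa architecture, which provides strong general language understanding.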

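📚 Documentation

Model Details

Model Description

Property	Details
Developed by	Mohammed Rakib
Shared by	More information needed
Model type	Question answering
Language (NLP)	English
License	MIT
Related models	Parent model: RoBERTa
Resources for more information	GitHub repository: defactolaw; associated paper: An Open Source Contractual Language Understanding Application Using Machine Learning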

Uses

Direct Use

This model can be used for question answering over legal documents.
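
Contracts are usually much longer than the 512-token input limit of a RoBERTa-base model, and the abstract of the associated paper names contract length as one of the challenges. A common way to handle this is to split the document into overlapping windows and query each window separately. The following is a minimal sketch, assuming a pre-tokenized list and illustrative window sizes. The function `sliding_windows` is a hypothetical helper and does not reproduce the authors' preprocessing.

```python
def sliding_windows(tokens, window=384, step=256):
    """Split tokens into overlapping chunks.

    Returns a list of (offset, chunk) pairs, where offset is the index of the
    chunk's first token in the original sequence. Consecutive chunks overlap
    by window - step tokens, so an answer that straddles one chunk boundary
    is still fully contained in a neighbouring chunk (provided it is shorter
    than the overlap).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if step > window:
        raise ValueError("step larger than window would skip tokens")

    chunks = []
    start = 0
    while True:
        chunks.append((start, tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # this chunk already reaches the end of the document
        start += step
    return chunks
```

Each chunk can then be paired with the question and scored independently, keeping the highest-scoring answer across chunks. With the Transformers tokenizer, a similar split is usually obtained by passing `return_overflowing_tokens=True` together with `max_length` and `stride`, where `stride` is the size of the overlap and not the step.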

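Training Details

See the paper An Open Source Contractual Language Understanding Application Using Machine Learning for details on the training procedure, dataset preprocessing, and evaluation.

Training Data

See the CUAD dataset card for more information.

Training Procedure

Preprocessing: More information needed.
Speeds, sizes, times: More information needed.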

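Evaluation

Testing Data, Factors, and Metrics

Testing data: See the CUAD dataset card for more information.
Factors: More information needed.
Metrics: More information needed.

Results

More information needed.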

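Model Examination

Hardware type: More information needed.
Hours used: More information needed.
Cloud provider: More information needed.
Compute region: More information needed.
Carbon emitted: More information needed.

Technical Specifications

Model Architecture and Objective

More information needed.

Compute Infrastructure

Hardware: V100/P100 GPUs on Google Colab Pro.
Software: Python, Transformers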

Citation

BibTeX:

@inproceedings{nawar-etal-2022-open,
    title = "An Open Source Contractual Language Understanding Application Using Machine Learning",
    author = "Nawar, Afra  and
      Rakib, Mohammed  and
      Hai, Salma Abdul  and
      Haq, Sanaulla",
    booktitle = "Proceedings of the First Workshop on Language Technology and Resources for a Fair, Inclusive, and Safe Society within the 13th Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lateraisse-1.6",
    pages = "42--50",
    abstract = "Legal field is characterized by its exclusivity and non-transparency. Despite the frequency and relevance of legal dealings, legal documents like contracts remains elusive to non-legal professionals for the copious usage of legal jargon. There has been little advancement in making legal contracts more comprehensible. This paper presents how Machine Learning and NLP can be applied to solve this problem, further considering the challenges of applying ML to the high length of contract documents and training in a low resource environment. The largest open-source contract dataset so far, the Contract Understanding Atticus Dataset (CUAD) is utilized. Various pre-processing experiments and hyperparameter tuning have been carried out and we successfully managed to eclipse SOTA results presented for models in the CUAD dataset trained on RoBERTa-base. Our model, A-type-RoBERTa-base achieved an AUPR score of 46.6{\%} compared to 42.6{\%} on the original RoBERT-base. This model is utilized in our end to end contract understanding application which is able to take a contract and highlight the clauses a user is looking to find along with it{'}s descriptions to aid due diligence before signing. Alongside digital, i.e. searchable, contracts the system is capable of processing scanned, i.e. non-searchable, contracts using tesseract OCR. This application is aimed to not only make contract review a comprehensible process to non-legal professionals, but also to help lawyers and attorneys more efficiently review contracts.",
}