🚀 roberta-base-on-cuad Model Card
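This model is designed for question answering over legal documents. It is a RoBERTa-base model fine-tuned on CUAD (the Contract Understanding Atticus Dataset) and is intended to help legal professionals and other practitioners find the relevant passages in legal text.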
🚀 Quick Start
Use the code below to get started with the model:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Load the tokenizer and the extractive question-answering model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Rakib/roberta-base-on-cuad")
model = AutoModelForQuestionAnswering.from_pretrained("Rakib/roberta-base-on-cuad")
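✨ Key Features
- Designed for question answering over legal documents.
- Built on the RoBERTa architecture, which provides strong general language understanding.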
📚 Documentation
Model Details
Model Description
Uses
Direct Use
This model can be used for question answering over legal documents.
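As an illustration of direct use, the sketch below runs one question against one contract passage and decodes the highest-scoring answer span. It is a minimal example, not the authors' pipeline: the helper names `best_span` and `answer`, the 64-token answer cap, and the 512-token truncation are assumptions made here for illustration, and the span search is simplified (it does not exclude question tokens or split long contracts into windows).

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 64) -> Tuple[int, int]:
    """Return the (start, end) token indices maximising start + end logit."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        # Only consider ends at or after the start, within the length cap.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


def answer(question: str, context: str,
           model_id: str = "Rakib/roberta-base-on-cuad") -> str:
    """Extract an answer span from `context` (requires torch and transformers)."""
    # Imported lazily so best_span stays usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForQuestionAnswering.from_pretrained(model_id)

    # Truncate only the context (second sequence) if the pair is too long.
    inputs = tokenizer(question, context, return_tensors="pt",
                       truncation="only_second", max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)

    start, end = best_span(outputs.start_logits[0].tolist(),
                           outputs.end_logits[0].tolist())
    # A span made up only of special tokens decodes to an empty string.
    return tokenizer.decode(inputs["input_ids"][0][start:end + 1],
                            skip_special_tokens=True)
```

Calling `answer("What is the governing law?", contract_text)` with a hypothetical `contract_text` string returns the decoded span. An empty string suggests the model found no answer in the passage. Contracts longer than the token limit would need to be split into overlapping windows first.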
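Training Details
See the paper An Open Source Contractual Language Understanding Application Using Machine Learning for details on the training procedure, dataset preprocessing, and evaluation.
Training Data
See the CUAD dataset card for more information.
Training Procedure
- Preprocessing: More information needed.
- Speeds, sizes, times: More information needed.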
Evaluation
Testing Data, Factors, and Metrics
Results
More information needed.
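Model Examination
- Hardware type: More information needed.
- Hours used: More information needed.
- Cloud provider: More information needed.
- Compute region: More information needed.
- Carbon emitted: More information needed.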
Technical Specifications
Model Architecture and Objective
More information needed.
Compute Infrastructure
- Hardware: V100/P100 GPUs on Google Colab Pro.
- Software: Python, Transformers
Citation
BibTeX:
@inproceedings{nawar-etal-2022-open,
title = "An Open Source Contractual Language Understanding Application Using Machine Learning",
author = "Nawar, Afra and
Rakib, Mohammed and
Hai, Salma Abdul and
Haq, Sanaulla",
booktitle = "Proceedings of the First Workshop on Language Technology and Resources for a Fair, Inclusive, and Safe Society within the 13th Language Resources and Evaluation Conference",
month = jun,
year = "2022",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://aclanthology.org/2022.lateraisse-1.6",
pages = "42--50",
abstract = "Legal field is characterized by its exclusivity and non-transparency. Despite the frequency and relevance of legal dealings, legal documents like contracts remains elusive to non-legal professionals for the copious usage of legal jargon. There has been little advancement in making legal contracts more comprehensible. This paper presents how Machine Learning and NLP can be applied to solve this problem, further considering the challenges of applying ML to the high length of contract documents and training in a low resource environment. The largest open-source contract dataset so far, the Contract Understanding Atticus Dataset (CUAD) is utilized. Various pre-processing experiments and hyperparameter tuning have been carried out and we successfully managed to eclipse SOTA results presented for models in the CUAD dataset trained on RoBERTa-base. Our model, A-type-RoBERTa-base achieved an AUPR score of 46.6{\%} compared to 42.6{\%} on the original RoBERT-base. This model is utilized in our end to end contract understanding application which is able to take a contract and highlight the clauses a user is looking to find along with it{'}s descriptions to aid due diligence before signing. Alongside digital, i.e. searchable, contracts the system is capable of processing scanned, i.e. non-searchable, contracts using tesseract OCR. This application is aimed to not only make contract review a comprehensible process to non-legal professionals, but also to help lawyers and attorneys more efficiently review contracts.",
}
Model Card Authors
Mohammed Rakib, in collaboration with Ezi Ozoani and the Hugging Face team.
Model Card Contact
More information needed.
📄 License
This model is released under the MIT License.