# Legal_BERTimbau
Legal_BERTimbau is a fine-tuned BERT model for the Portuguese legal domain. It is based on BERTimbau Large and offers improved performance on legal NLP tasks.
## Quick Start
Legal_BERTimbau Large is a fine-tuned BERT model based on BERTimbau Large.
> "BERTimbau Base is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performance on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity and Recognizing Textual Entailment. It is available in two sizes: Base and Large.
> For further information or requests, please go to the BERTimbau repository."
The performance of language models can change drastically when there is a domain shift between training and test data. To create a Portuguese language model adapted to the legal domain, the original BERTimbau model was fine-tuned with one "pre-training" epoch over 30,000 Portuguese legal documents available online (learning rate: 1e-5).
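That "pre-training" epoch uses BERT's masked language modeling objective. As a rough illustration only (not the authors' actual training code), the standard BERT masking rule selects 15% of tokens for prediction; of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The `MASK_ID` and `VOCAB_SIZE` values below are placeholders, not guaranteed to match BERTimbau's tokenizer:

```python
import torch

MASK_ID = 103       # placeholder [MASK] id; check the actual tokenizer
VOCAB_SIZE = 29794  # assumed BERTimbau vocabulary size, for illustration

def mask_tokens(input_ids: torch.Tensor, mlm_prob: float = 0.15):
    """Standard BERT MLM masking: returns (masked_inputs, labels)."""
    labels = input_ids.clone()
    # Choose which positions the model must predict.
    chosen = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~chosen] = -100  # loss is computed only on chosen positions

    masked_inputs = input_ids.clone()
    # 80% of chosen positions -> [MASK]
    to_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & chosen
    masked_inputs[to_mask] = MASK_ID
    # Half of the remaining 20% -> a random vocabulary token
    to_random = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                 & chosen & ~to_mask)
    masked_inputs[to_random] = torch.randint(VOCAB_SIZE, input_ids.shape)[to_random]
    # The final 10% stay unchanged; the model must still predict them.
    return masked_inputs, labels
```

In practice this logic is handled by `DataCollatorForLanguageModeling` from the `transformers` library rather than written by hand.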
## Features
- Fine-tuned from BERTimbau Large for the legal domain in Portuguese.
- Available in two sizes: base and large.
- Can be used for masked language modeling prediction and getting BERT embeddings.
## Installation
The original README provides no specific installation steps. To use the model, install the libraries shown in the usage examples.
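A typical setup, assuming a standard `pip`-based Python environment (the package list below is an assumption based on the usage examples, not an official requirements file):

```shell
pip install transformers torch
```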
## Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("rufimelo/Legal-BERTimbau-large")
model = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-large")
```
Advanced Usage - Masked language modeling prediction example
```python
from transformers import pipeline, AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("rufimelo/Legal-BERTimbau-large")
model = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-large")

pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)
pipe('O advogado apresentou [MASK] para o juiz')
```
Advanced Usage - For BERT embeddings
```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('rufimelo/Legal-BERTimbau-large')
model = AutoModel.from_pretrained('rufimelo/Legal-BERTimbau-large')

input_ids = tokenizer.encode('O advogado apresentou recurso para o juiz',
                             return_tensors='pt')
with torch.no_grad():
    outs = model(input_ids)
    # Token embeddings from the last hidden state, excluding [CLS] and [SEP]
    encoded = outs[0][0, 1:-1]
```
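The snippet above yields one vector per token. To obtain a single sentence-level embedding, a common approach (an illustration, not part of the original model card) is to mean-pool the token vectors. The dummy tensor below stands in for `encoded`; BERT-Large models use a hidden size of 1024:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor) -> torch.Tensor:
    """Average token embeddings into a single sentence vector."""
    return token_embeddings.mean(dim=0)

# Illustration with a dummy tensor shaped like `encoded` above:
# (num_tokens, hidden_size), hidden_size = 1024 for BERT-Large.
dummy = torch.randn(8, 1024)
sentence_vec = mean_pool(dummy)
print(sentence_vec.shape)  # torch.Size([1024])
```

Mean pooling is a simple baseline; other choices (e.g. using the `[CLS]` vector) are also common.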
## Documentation
Available models
| Property | Details |
|----------|---------|
| Model Type | rufimelo/Legal-BERTimbau-base (BERT-Base, 12 layers, 110M params); rufimelo/Legal-BERTimbau-large (BERT-Large, 24 layers, 335M params) |
| Training Data | 30,000 Portuguese legal documents available online |
## License
This project is licensed under the MIT license.
## Citation
If you use this work, please cite the BERTimbau paper:
```bibtex
@inproceedings{souza2020bertimbau,
  author    = {F{\'a}bio Souza and
               Rodrigo Nogueira and
               Roberto Lotufo},
  title     = {{BERT}imbau: pretrained {BERT} models for {B}razilian {P}ortuguese},
  booktitle = {9th Brazilian Conference on Intelligent Systems, {BRACIS}, Rio Grande do Sul, Brazil, October 20-23 (to appear)},
  year      = {2020}
}
```