# Legal_BERTimbau
Legal_BERTimbau Large is a fine-tuned BERT model tailored to the legal domain. It is based on BERTimbau Large, a pre-trained BERT model for Brazilian Portuguese, and offers state-of-the-art performance on multiple NLP tasks.
## Quick Start
Legal_BERTimbau is a fine-tuned BERT model based on BERTimbau Large. Language-model performance can degrade significantly under domain shift, so to adapt the model to the legal domain, the original BERTimbau model was fine-tuned with one "PreTraining" (masked-language-modeling) epoch on 30,000 legal Portuguese documents available online.
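The "PreTraining" epoch refers to continued masked-language-model training on the legal corpus. A minimal, simplified sketch of the token-masking step is shown below; the `mask_tokens` helper and the 15% masking probability are illustrative assumptions following the standard BERT recipe (the full recipe also keeps or randomly replaces a fraction of the selected tokens, which is omitted here):

```python
import torch

def mask_tokens(input_ids: torch.Tensor, mask_token_id: int,
                mlm_probability: float = 0.15):
    """Simplified BERT-style masking: replace ~15% of tokens with [MASK].

    Labels keep the original token at masked positions and -100 elsewhere,
    so the cross-entropy loss ignores unmasked positions (the standard
    Hugging Face convention).
    """
    labels = input_ids.clone()
    probs = torch.full(input_ids.shape, mlm_probability)
    masked = torch.bernoulli(probs).bool()
    labels[~masked] = -100          # loss is computed only on masked tokens
    inputs = input_ids.clone()
    inputs[masked] = mask_token_id  # the model must reconstruct these
    return inputs, labels

# Toy token ids (real inputs would come from the tokenizer)
ids = torch.arange(4, 20).unsqueeze(0)
inputs, labels = mask_tokens(ids, mask_token_id=103)
```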
## Features
- Fine-tuned for the legal domain: adapted through fine-tuning on legal Portuguese documents.
- Based on BERTimbau: leverages the pre-trained BERTimbau Large model.
- Multiple model sizes: available in Base and Large sizes.
## Installation
No specific installation steps are provided in the original document, but the model is distributed via the Hugging Face Hub and only requires the `transformers` library (installable with `pip install transformers`). Load the model and tokenizer as follows:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("rufimelo/Legal-BERTimbau-base")
model = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-base")
```
## Usage Examples
### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("rufimelo/Legal-BERTimbau-base")
model = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-base")
```
### Advanced Usage: Masked-Language-Model Prediction
```python
from transformers import pipeline, AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("rufimelo/Legal-BERTimbau-base")
model = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-base")

pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)
# Returns the top-scoring candidates for the [MASK] position
pipe('O advogado apresentou [MASK] para o juiz')
```
### Advanced Usage: BERT Embeddings
```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('rufimelo/Legal-BERTimbau-base')
model = AutoModel.from_pretrained('rufimelo/Legal-BERTimbau-base')

input_ids = tokenizer.encode('O advogado apresentou recurso para o juiz',
                             return_tensors='pt')
with torch.no_grad():
    outs = model(input_ids)
    # Last-layer token embeddings, dropping the [CLS] and [SEP] tokens
    encoded = outs[0][0, 1:-1]
```
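The per-token embeddings above can be collapsed into a single sentence vector. A minimal sketch of attention-masked mean pooling follows; the `mean_pool` helper is illustrative and not part of the model card:

```python
import torch

def mean_pool(last_hidden_state: torch.Tensor,
              attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # avoid division by zero
    return summed / counts                           # (batch, hidden)

# Toy example: batch of 1, sequence length 3, hidden size 4
hidden = torch.tensor([[[1.0, 2.0, 3.0, 4.0],
                        [3.0, 4.0, 5.0, 6.0],
                        [9.0, 9.0, 9.0, 9.0]]])
mask = torch.tensor([[1, 1, 0]])  # last position is padding
pooled = mean_pool(hidden, mask)
print(pooled)  # → tensor([[2., 3., 4., 5.]])
```

In practice the same helper would be applied to `outs[0]` together with the `attention_mask` returned by the tokenizer.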
## Documentation
### Available Models
| Property | Details |
|----------|---------|
| Model Type | `rufimelo/Legal-BERTimbau-base` (BERT-Base, 12 layers, 110M params); `rufimelo/Legal-BERTimbau-large` (BERT-Large, 24 layers, 335M params) |
| Training Data | 30,000 legal Portuguese documents available online |
## License
This project is licensed under the MIT license.
## Citation
If you use this work, please cite the BERTimbau paper:
```bibtex
@inproceedings{souza2020bertimbau,
  author    = {F{\'a}bio Souza and
               Rodrigo Nogueira and
               Roberto Lotufo},
  title     = {{BERT}imbau: pretrained {BERT} models for {B}razilian {P}ortuguese},
  booktitle = {9th Brazilian Conference on Intelligent Systems, {BRACIS}, Rio
               Grande do Sul, Brazil, October 20-23 (to appear)},
  year      = {2020}
}
```