# Legal_BERTimbau
Legal_BERTimbau Large is a fine-tuned BERT model tailored to the legal domain. It is based on BERTimbau Large, a pre-trained BERT model for Brazilian Portuguese, and offers state-of-the-art performance on multiple NLP tasks.
## Quick Start
Legal_BERTimbau is a fine-tuned BERT model based on BERTimbau Large. Language-model performance can degrade significantly under domain shift, so to adapt the model to the legal domain, the original BERTimbau model was fine-tuned with one "PreTraining" (masked-language-modeling) epoch on 30,000 legal Portuguese documents available online.
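The "PreTraining" epoch refers to continued masked-language-model training on the legal corpus. A minimal, simplified sketch of the token-masking step is shown below; the `mask_tokens` helper and the 15% masking probability are illustrative assumptions following the standard BERT recipe (the full recipe also keeps or randomly replaces a fraction of the selected tokens, which is omitted here):

```python
import torch

def mask_tokens(input_ids: torch.Tensor, mask_token_id: int,
                mlm_probability: float = 0.15):
    """Simplified BERT-style masking: replace ~15% of tokens with [MASK].

    Labels keep the original token at masked positions and -100 elsewhere,
    so the cross-entropy loss ignores unmasked positions (the standard
    Hugging Face convention).
    """
    labels = input_ids.clone()
    probs = torch.full(input_ids.shape, mlm_probability)
    masked = torch.bernoulli(probs).bool()
    labels[~masked] = -100          # loss is computed only on masked tokens
    inputs = input_ids.clone()
    inputs[masked] = mask_token_id  # the model must reconstruct these
    return inputs, labels

# Toy token ids (real inputs would come from the tokenizer)
ids = torch.arange(4, 20).unsqueeze(0)
inputs, labels = mask_tokens(ids, mask_token_id=103)
```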
## Features
- Fine-tuned for the legal domain: adapted through fine-tuning on legal Portuguese documents.
- Based on BERTimbau: leverages the pre-trained BERTimbau Large model.
- Multiple model sizes: available in Base and Large sizes.
## Installation
No specific installation steps are provided in the original document, but the model is distributed via the Hugging Face Hub and only requires the `transformers` library (installable with `pip install transformers`). Load the model and tokenizer as follows:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("rufimelo/Legal-BERTimbau-base")
model = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-base")
```
## Usage Examples
### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("rufimelo/Legal-BERTimbau-base")
model = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-base")
```
### Advanced Usage: Masked-Language-Model Prediction
```python
from transformers import pipeline, AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("rufimelo/Legal-BERTimbau-base")
model = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-base")

pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)
# Returns the top-scoring candidates for the [MASK] position
pipe('O advogado apresentou [MASK] para o juiz')
```
### Advanced Usage: BERT Embeddings
```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('rufimelo/Legal-BERTimbau-base')
model = AutoModel.from_pretrained('rufimelo/Legal-BERTimbau-base')

input_ids = tokenizer.encode('O advogado apresentou recurso para o juiz',
                             return_tensors='pt')
with torch.no_grad():
    outs = model(input_ids)
    # Last-layer token embeddings, dropping the [CLS] and [SEP] tokens
    encoded = outs[0][0, 1:-1]
```
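The per-token embeddings above can be collapsed into a single sentence vector. A minimal sketch of attention-masked mean pooling follows; the `mean_pool` helper is illustrative and not part of the model card:

```python
import torch

def mean_pool(last_hidden_state: torch.Tensor,
              attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # avoid division by zero
    return summed / counts                           # (batch, hidden)

# Toy example: batch of 1, sequence length 3, hidden size 4
hidden = torch.tensor([[[1.0, 2.0, 3.0, 4.0],
                        [3.0, 4.0, 5.0, 6.0],
                        [9.0, 9.0, 9.0, 9.0]]])
mask = torch.tensor([[1, 1, 0]])  # last position is padding
pooled = mean_pool(hidden, mask)
print(pooled)  # → tensor([[2., 3., 4., 5.]])
```

In practice the same helper would be applied to `outs[0]` together with the `attention_mask` returned by the tokenizer.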
## Documentation
### Available Models
| Property | Details |
|----------|---------|
| Model Type | `rufimelo/Legal-BERTimbau-base` (BERT-Base, 12 layers, 110M params); `rufimelo/Legal-BERTimbau-large` (BERT-Large, 24 layers, 335M params) |
| Training Data | 30,000 legal Portuguese documents available online |
## License
This project is licensed under the MIT license.
## Citation
If you use this work, please cite the BERTimbau paper:
```bibtex
@inproceedings{souza2020bertimbau,
  author    = {F{\'a}bio Souza and
               Rodrigo Nogueira and
               Roberto Lotufo},
  title     = {{BERT}imbau: pretrained {BERT} models for {B}razilian {P}ortuguese},
  booktitle = {9th Brazilian Conference on Intelligent Systems, {BRACIS}, Rio
               Grande do Sul, Brazil, October 20-23 (to appear)},
  year      = {2020}
}
```