# Legal_BERTimbau
Legal_BERTimbau is a fine-tuned BERT model for the Portuguese legal domain. It is based on BERTimbau Large and offers improved performance on legal NLP tasks.
## Quick Start
Legal_BERTimbau Large is a fine-tuned BERT model based on BERTimbau Large.
> "BERTimbau Base is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performance on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity and Recognizing Textual Entailment. It is available in two sizes: Base and Large.
> For further information or requests, please go to the BERTimbau repository."
The performance of language models can change drastically when there is a domain shift between training and test data. To create a Portuguese language model adapted to the legal domain, the original BERTimbau model was fine-tuned with one "pre-training" epoch over 30,000 Portuguese legal documents available online (learning rate: 1e-5).
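That "pre-training" epoch uses BERT's masked language modeling objective. As a rough illustration only (not the authors' actual training code), the standard BERT masking rule selects 15% of tokens for prediction; of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The `MASK_ID` and `VOCAB_SIZE` values below are placeholders, not guaranteed to match BERTimbau's tokenizer:

```python
import torch

MASK_ID = 103       # placeholder [MASK] id; check the actual tokenizer
VOCAB_SIZE = 29794  # assumed BERTimbau vocabulary size, for illustration

def mask_tokens(input_ids: torch.Tensor, mlm_prob: float = 0.15):
    """Standard BERT MLM masking: returns (masked_inputs, labels)."""
    labels = input_ids.clone()
    # Choose which positions the model must predict.
    chosen = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~chosen] = -100  # loss is computed only on chosen positions

    masked_inputs = input_ids.clone()
    # 80% of chosen positions -> [MASK]
    to_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & chosen
    masked_inputs[to_mask] = MASK_ID
    # Half of the remaining 20% -> a random vocabulary token
    to_random = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                 & chosen & ~to_mask)
    masked_inputs[to_random] = torch.randint(VOCAB_SIZE, input_ids.shape)[to_random]
    # The final 10% stay unchanged; the model must still predict them.
    return masked_inputs, labels
```

In practice this logic is handled by `DataCollatorForLanguageModeling` from the `transformers` library rather than written by hand.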
## Features
- Fine-tuned from BERTimbau Large for the legal domain in Portuguese.
- Available in two sizes: base and large.
- Can be used for masked language modeling prediction and getting BERT embeddings.
## Installation
The original README provides no specific installation steps. To use the model, install the libraries shown in the usage examples.
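A typical setup, assuming a standard `pip`-based Python environment (the package list below is an assumption based on the usage examples, not an official requirements file):

```shell
pip install transformers torch
```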
## Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("rufimelo/Legal-BERTimbau-large")
model = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-large")
```
Advanced Usage - Masked language modeling prediction example
```python
from transformers import pipeline, AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("rufimelo/Legal-BERTimbau-large")
model = AutoModelForMaskedLM.from_pretrained("rufimelo/Legal-BERTimbau-large")

pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)
pipe('O advogado apresentou [MASK] para o juiz')
```
Advanced Usage - For BERT embeddings
```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('rufimelo/Legal-BERTimbau-large')
model = AutoModel.from_pretrained('rufimelo/Legal-BERTimbau-large')

input_ids = tokenizer.encode('O advogado apresentou recurso para o juiz',
                             return_tensors='pt')
with torch.no_grad():
    outs = model(input_ids)
    # Token embeddings from the last hidden state, excluding [CLS] and [SEP]
    encoded = outs[0][0, 1:-1]
```
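The snippet above yields one vector per token. To obtain a single sentence-level embedding, a common approach (an illustration, not part of the original model card) is to mean-pool the token vectors. The dummy tensor below stands in for `encoded`; BERT-Large models use a hidden size of 1024:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor) -> torch.Tensor:
    """Average token embeddings into a single sentence vector."""
    return token_embeddings.mean(dim=0)

# Illustration with a dummy tensor shaped like `encoded` above:
# (num_tokens, hidden_size), hidden_size = 1024 for BERT-Large.
dummy = torch.randn(8, 1024)
sentence_vec = mean_pool(dummy)
print(sentence_vec.shape)  # torch.Size([1024])
```

Mean pooling is a simple baseline; other choices (e.g. using the `[CLS]` vector) are also common.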
## Documentation
Available models
| Property | Details |
|----------|---------|
| Model Type | rufimelo/Legal-BERTimbau-base (BERT-Base, 12 layers, 110M params); rufimelo/Legal-BERTimbau-large (BERT-Large, 24 layers, 335M params) |
| Training Data | 30,000 Portuguese legal documents available online |
## License
This project is licensed under the MIT license.
## Citation
If you use this work, please cite the BERTimbau paper:
```bibtex
@inproceedings{souza2020bertimbau,
  author    = {F{\'a}bio Souza and
               Rodrigo Nogueira and
               Roberto Lotufo},
  title     = {{BERT}imbau: pretrained {BERT} models for {B}razilian {P}ortuguese},
  booktitle = {9th Brazilian Conference on Intelligent Systems, {BRACIS}, Rio Grande do Sul, Brazil, October 20-23 (to appear)},
  year      = {2020}
}
```