🚀 FinBERT-PT-BR: Financial BERT PT BR
FinBERT-PT-BR is a pre-trained NLP model for sentiment analysis of Brazilian Portuguese financial texts. It classifies the sentiment of financial news and related content, which supports market analysis and decision-making.
The model was trained in two stages: language modeling and sentiment modeling. In the language modeling stage, it was trained on more than 1.4 million Portuguese financial news texts. Building on that stage, a sentiment classifier was trained with only 500 labeled texts and reached satisfactory convergence.
At the end of the project, the model was compared with other models and its possible applications were explored. In that comparison, it outperformed the current state-of-the-art models. Applications include building sentiment indices, formulating investment strategies, and analyzing macroeconomic data such as inflation.
✨ Features
- Effective Training: Trained on a large corpus of Portuguese financial news texts, ensuring a deep understanding of financial language.
- Low-Data Requirement for Sentiment Classification: Capable of building a sentiment classifier with a relatively small number of labeled texts.
- Superior Performance: Outperforms current state-of-the-art models in sentiment analysis of Brazilian Portuguese financial texts.
- Diverse Applications: Can be applied in multiple financial analysis scenarios, including sentiment index building, investment strategy formulation, and macroeconomic data analysis.
💻 Usage Examples
Basic Usage
BertForSequenceClassification
from transformers import AutoTokenizer, BertForSequenceClassification
import numpy as np

# Map class indices to sentiment labels
pred_mapper = {
    0: "POSITIVE",
    1: "NEGATIVE",
    2: "NEUTRAL"
}

# Load the tokenizer and the sentiment classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("lucas-leme/FinBERT-PT-BR")
finbertptbr = BertForSequenceClassification.from_pretrained("lucas-leme/FinBERT-PT-BR")

# Tokenize a batch of sentences ("Today the stock market fell", "Today the stock market rose")
tokens = tokenizer(["Hoje a bolsa caiu", "Hoje a bolsa subiu"], return_tensors="pt",
                   padding=True, truncation=True, max_length=512)
finbertptbr_outputs = finbertptbr(**tokens)

# Pick the highest-scoring class for each sentence
preds = [pred_mapper[np.argmax(pred)] for pred in finbertptbr_outputs.logits.cpu().detach().numpy()]
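When a confidence score is useful alongside the label, the raw logits can be converted to class probabilities with a softmax. The following is a minimal sketch in plain Python, assuming the same index-to-label mapping as `pred_mapper` above. The helper name `logits_to_prediction` and the example logit values are hypothetical and were not produced by the model.

```python
import math

# Same index-to-label mapping as in the example above.
PRED_MAPPER = {0: "POSITIVE", 1: "NEGATIVE", 2: "NEUTRAL"}


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_prediction(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PRED_MAPPER[best], probs[best]


# Illustrative logits only; these values were not produced by the model.
example_logits = [[-1.2, 2.3, 0.1], [2.0, -0.5, 0.3]]
predictions = [logits_to_prediction(row) for row in example_logits]
```

With real model output, the rows could be taken from `finbertptbr_outputs.logits.detach().tolist()` instead of the illustrative values.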
Pipeline
from transformers import (
    AutoTokenizer,
    BertForSequenceClassification,
    pipeline,
)

finbert_pt_br_tokenizer = AutoTokenizer.from_pretrained("lucas-leme/FinBERT-PT-BR")
finbert_pt_br_model = BertForSequenceClassification.from_pretrained("lucas-leme/FinBERT-PT-BR")

finbert_pt_br_pipeline = pipeline(task='text-classification', model=finbert_pt_br_model, tokenizer=finbert_pt_br_tokenizer)
finbert_pt_br_pipeline(['Hoje a bolsa caiu', 'Hoje a bolsa subiu'])
📚 Documentation
Applications
Sentiment Index
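One simple way to turn per-headline predictions into a sentiment index is to aggregate them by day. The sketch below is an assumption about how such an index could be built, not the construction used by the authors. It defines the daily index as the positive count minus the negative count, divided by the number of headlines, so values range from -1 to 1. The function name `daily_sentiment_index` and the sample data are hypothetical.

```python
from collections import defaultdict

# Signed contribution of each label; neutral headlines count only in the total.
LABEL_SCORE = {"POSITIVE": 1, "NEGATIVE": -1, "NEUTRAL": 0}


def daily_sentiment_index(predictions):
    """Aggregate (date, label) pairs into a per-day index in [-1, 1].

    Index for a day = (positive count - negative count) / headline count.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for date, label in predictions:
        totals[date] += LABEL_SCORE[label]
        counts[date] += 1
    return {date: totals[date] / counts[date] for date in sorted(counts)}


# Hypothetical classifier output: one (date, label) pair per headline.
sample = [
    ("2023-01-02", "POSITIVE"),
    ("2023-01-02", "NEGATIVE"),
    ("2023-01-02", "POSITIVE"),
    ("2023-01-03", "NEGATIVE"),
    ("2023-01-03", "NEUTRAL"),
]
index = daily_sentiment_index(sample)
```

Other weightings, such as using class probabilities instead of hard labels, are equally plausible; the count-based form is chosen here only because it is easy to inspect.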

🔧 Technical Details
The model was trained in two main stages: language modeling and sentiment modeling. In the language modeling stage, it was trained on more than 1.4 million Portuguese financial news texts. This large-scale training enabled the model to capture the nuances of financial language in Brazilian Portuguese.
Subsequently, a sentiment classifier was built using 500 labeled texts. Despite the small number of labeled texts, the classifier achieved satisfactory convergence, which indicates that the pre-training was effective.
A comparative analysis with other models was also carried out, showing that the developed model outperformed current state-of-the-art models. This advantage can be attributed to the model's targeted training on Portuguese financial texts.
📄 License
This project is licensed under the Apache-2.0 license.
Author
Citation
@inproceedings{santos2023finbert,
  title={FinBERT-PT-BR: An{\'a}lise de Sentimentos de Textos em Portugu{\^e}s do Mercado Financeiro},
  author={Santos, Lucas L and Bianchi, Reinaldo AC and Costa, Anna HR},
  booktitle={Anais do II Brazilian Workshop on Artificial Intelligence in Finance},
  pages={144--155},
  year={2023},
  organization={SBC}
}
Paper