🚀 Model Card: CLF-SENTIMENTOS-CMTS finetuned XLM-RoBERTa
This model, clf-sentimentos-cmts, applies machine learning to a specific natural language processing (NLP) task: classifying the sentiment of Brazilian Portuguese social media texts, including emoji processing. It is an adaptation of XLM-RoBERTa, a highly effective and robust Transformer architecture pre-trained on a vast multilingual dataset.
✨ Features
- Fine-tuning for Specific Tasks: Unlike standard language model training, fine-tuning tbluhm/clf-sentimentos-cmts adjusts the XLM-RoBERTa parameters on a specific dataset, optimizing it for sentiment classification of Brazilian Portuguese text, including emoji interpretation. The diverse dataset includes comments from the profiles of politicians, artists, and automotive companies, reflecting a wide range of social media contexts and linguistic expressions in Brazil.
- Deep Contextual Analysis: When fed a social media comment, the model deeply analyzes each word and emoji, considering the global context of the text. Using attention mechanisms, it weighs the importance of each element in relation to the overall sentiment expressed in the comment. This approach enables accurate classification, assigning a sentiment label based on the text's contextual and semantic understanding, including emoji interpretation.
- Emoji-Aware Sentiment Classification: The model recognizes positive sentiment when users express satisfaction with smiley emojis and negative sentiment when users express dissatisfaction or criticism with sad emojis. Comments that express no clear emotion, or that are purely informative, are labeled neutral.
- Wide Range of Applications: Beyond classifying sentiments in Brazilian Portuguese social media comments, the model has many potential applications. Companies can use it to monitor public perception of their products and services on social media, identify emerging trends, and find areas for improvement. It can also support automated content moderation by filtering out negative or inappropriate comments.
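For the monitoring use case above, per-comment predictions can be rolled up into simple counts. This is a minimal sketch that assumes the pipeline returns dicts with `label` and `score` keys; `summarize_sentiments` is a hypothetical helper, and the exact label strings depend on the model's configuration and are assumptions here:

```python
from collections import Counter

def summarize_sentiments(results):
    """Count how often each sentiment label appears in a batch of
    pipeline outputs of the form {"label": ..., "score": ...}."""
    return Counter(r["label"] for r in results)

# Mocked pipeline outputs (label names are assumed):
mock_results = [
    {"label": "POSITIVE", "score": 0.97},
    {"label": "NEGATIVE", "score": 0.88},
    {"label": "POSITIVE", "score": 0.91},
]
print(summarize_sentiments(mock_results))
```

In practice, `mock_results` would be replaced by the output of running the pipeline over a batch of collected comments.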
📦 Installation
The original model card does not list installation steps.
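Assuming a standard Python environment, installing the Transformers library and a backend such as PyTorch is likely all that is required (versions are not pinned in the original card):

```shell
pip install transformers torch
```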
💻 Usage Examples
Basic Usage
Here is an example of how to use the model in Python with the Transformers library:
```python
from transformers import pipeline

# Load a text-classification pipeline backed by this model
analise_sentimento = pipeline("text-classification", model="tbluhm/clf-sentimentos-cmts")

texto = "Excelente notícia para todos os brasileiros!"
resultado = analise_sentimento(texto)
print(resultado)
```
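The pipeline returns a list of dicts with `label` and `score` keys. When downstream decisions hinge on a prediction, it can help to fall back to neutral for low-confidence outputs. A minimal sketch, assuming that output shape; the 0.5 threshold, the fallback label, and the label strings are illustrative assumptions, not part of the model card:

```python
def label_with_threshold(result, min_score=0.5, fallback="NEUTRAL"):
    """Return the predicted label, or a fallback label when the
    model's confidence score is below the threshold."""
    return result["label"] if result["score"] >= min_score else fallback

# Mocked pipeline outputs (label strings are assumptions):
print(label_with_threshold({"label": "POSITIVE", "score": 0.93}))  # POSITIVE
print(label_with_threshold({"label": "NEGATIVE", "score": 0.41}))  # NEUTRAL
```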
📚 Documentation
Model Origin
This model is a fine-tuned version of xlm-roberta-base-tweet-sentiment-pt.
Performance Metrics
The model achieves the following results on the evaluation set:
- Loss: 0.7189
- Accuracy: 0.6467
- F1: 0.5588
Model Objective
The objective of this model is to classify the sentiment of short texts into categories such as positive, negative, or neutral. It can be used in various applications, including social media sentiment analysis, product reviews, and customer feedback.
Intended Use
To use the model, provide a short text as input to the text-classification pipeline. The model classifies the text's sentiment as Positive, Negative, or Neutral.
Training Data
The model was fine-tuned on a dataset composed of product reviews, tweets, and other short-text sources in various languages. The training dataset includes over 1 million labeled examples.
Limitations and Ethical Considerations
The model may not capture all aspects of human sentiment and will not be accurate in every situation. It may also reflect biases present in the training data. Use it with caution and keep these limitations in mind.
🔧 Technical Details
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
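As a sketch, the hyperparameters above would map onto the Transformers `TrainingArguments` roughly as follows. The actual training script is not part of this card, so `output_dir` and any settings not listed above are assumed placeholders:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the training configuration from the
# hyperparameters listed above; output_dir is an assumed placeholder.
args = TrainingArguments(
    output_dir="clf-sentimentos-cmts",
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=2,
)
```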
Training Results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|---|
| 0.7039 | 1.0 | 9 | 0.7650 | 0.6413 | 0.5526 |
| 0.6487 | 2.0 | 18 | 0.7189 | 0.6467 | 0.5588 |
Framework Versions
- Transformers 4.38.2
- Pytorch 2.2.1+cpu
- Datasets 2.18.0
- Tokenizers 0.15.2
📄 License
The model is released under the MIT license.
📖 Citation
Author: Thiago D. Faria Bluhm. (2024).
Adapted from: [XLM-RoBERTa](https://huggingface.co/FacebookAI/xlm-roberta-base).
Acknowledgments
Contributors: Wesley Dos Anjos, Pedro Lustosa, Amanda Rangel, Audrey Marx, Gabriel Leal, and Tiago Vettorazi.
Widget Examples
| Example Title | Text |
|---|---|
| Positive | Eu gostei muito daquele ator no filme. |
| Negative | Esse político é uma pessoa sem escrúpulos. |