# Indonesian RoBERTa Base Sentiment Classifier
The Indonesian RoBERTa Base Sentiment Classifier is a sentiment text-classification model for Indonesian. It classifies the sentiment of Indonesian texts, making it an effective tool for analyzing Indonesian comments and reviews.
## Quick Start
The Indonesian RoBERTa Base Sentiment Classifier is a sentiment text-classification model based on RoBERTa. It was initialized from the pre-trained Indonesian RoBERTa Base model and then fine-tuned on the SmSA dataset of indonlu, which consists of Indonesian comments and reviews.
After training, the model achieved an evaluation accuracy of 94.36% and an F1-macro of 92.42%. On the benchmark test set, it achieved an accuracy of 93.2% and an F1-macro of 91.02%.
The `Trainer` class from Hugging Face's Transformers library was used to train the model. PyTorch served as the backend framework during training, but the model remains compatible with other frameworks.
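As a rough illustration of the setup described above, the sketch below shows how a comparable fine-tuning run could be assembled with `Trainer`. The base checkpoint name, the `indonlu`/`smsa` dataset identifiers, and all hyperparameters are assumptions chosen for illustration, not the author's exact configuration.

```python
# Hedged sketch of fine-tuning an Indonesian RoBERTa base checkpoint on SmSA.
# Checkpoint name, dataset identifiers, and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "flax-community/indonesian-roberta-base"  # assumed base checkpoint
dataset = load_dataset("indonlu", "smsa")  # SmSA subset of indonlu (text, label)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=3)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="indonesian-roberta-base-sentiment",
    num_train_epochs=5,           # matches the 5 epochs reported below
    eval_strategy="epoch",        # `evaluation_strategy` on older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,  # "the best model was loaded at the end"
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```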
## Features
- Based on the RoBERTa model, which has strong text-understanding capabilities.
- Fine-tuned on the Indonesian SmSA dataset, making it suitable for classifying the sentiment of Indonesian texts.
- Achieved high accuracy and F1-macro scores in evaluation.
- Compatible with multiple frameworks despite being trained with PyTorch as the backend.
## Model
| Property | Details |
|----------|---------|
| Model Type | indonesian-roberta-base-sentiment-classifier |
| #params | 124M |
| Architecture | RoBERTa Base |
| Training/Validation data (text) | SmSA |
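If you want to verify the parameter count in the table, a minimal check against the published checkpoint (named as in the usage example below) could look like this:

```python
# Minimal sketch: count parameters of the published checkpoint (~124M expected).
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "w11wo/indonesian-roberta-base-sentiment-classifier"
)
total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e6:.0f}M parameters")
```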
## Evaluation Results
The model was trained for 5 epochs, and the best model was loaded at the end of training.
| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|-------|---------------|-----------------|----------|----|-----------|--------|
| 1 | 0.342600 | 0.213551 | 0.928571 | 0.898539 | 0.909803 | 0.890694 |
| 2 | 0.190700 | 0.213466 | 0.934127 | 0.901135 | 0.925297 | 0.882757 |
| 3 | 0.125500 | 0.219539 | 0.942857 | 0.920901 | 0.927511 | 0.915193 |
| 4 | 0.083600 | 0.235232 | 0.943651 | 0.924227 | 0.926494 | 0.922048 |
| 5 | 0.059200 | 0.262473 | 0.942063 | 0.920583 | 0.924084 | 0.917351 |
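The F1, precision, and recall columns above are consistent with macro averaging (the card reports F1-macro). One conventional way to produce such metrics with scikit-learn is sketched below; whether the author used exactly this helper is an assumption, but a function of this shape can be passed to `Trainer` via `compute_metrics=`.

```python
# Hedged sketch: metrics helper producing accuracy, macro F1, precision, recall.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```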
## Usage Examples
### Basic Usage
```python
from transformers import pipeline

pretrained_name = "w11wo/indonesian-roberta-base-sentiment-classifier"

nlp = pipeline(
    "sentiment-analysis",
    model=pretrained_name,
    tokenizer=pretrained_name
)

nlp("Jangan sampai saya telpon bos saya ya!")
```
## License
This project is licensed under the MIT License.
## Important Note
Consider the biases that come from both the pre-trained RoBERTa model and the SmSA dataset, which may carry over into the results of this model.
## Author
The Indonesian RoBERTa Base Sentiment Classifier was trained and evaluated by Wilson Wongso. All computation and development were done on Google Colaboratory using their free GPU access.
## Citation
If used, please cite the following:
```bibtex
@misc{wilson_wongso_2023,
  author    = { {Wilson Wongso} },
  title     = { indonesian-roberta-base-sentiment-classifier (Revision e402e46) },
  year      = 2023,
  url       = { https://huggingface.co/w11wo/indonesian-roberta-base-sentiment-classifier },
  doi       = { 10.57967/hf/0644 },
  publisher = { Hugging Face }
}
```