# Indonesian Sentiment Analysis Model
This model is a fine-tuned version of an Indonesian pre-trained BERT model, designed to perform sentiment analysis on Indonesian comments and reviews.
## 🚀 Quick Start
You can load the model and run inference as follows:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("taufiqdp/indonesian-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("taufiqdp/indonesian-sentiment")

class_names = ['negatif', 'netral', 'positif']

# "Pelayanan lama dan tidak ramah" ~ "The service is slow and unfriendly"
text = "Pelayanan lama dan tidak ramah"
tokenized_text = tokenizer(text, return_tensors='pt')

with torch.inference_mode():
    logits = model(**tokenized_text)['logits']

result = class_names[logits.argmax(dim=1).item()]
print(result)
```
## ✨ Features

- **Fine-tuned BERT**: based on IndoBERT Base Uncased, a BERT model pre-trained on Indonesian text data.
- **Multi-class classification**: classifies Indonesian review text into three sentiment categories: negative, neutral, and positive.
## 📦 Installation

The example code requires PyTorch and the Transformers library, which can be installed with `pip install torch transformers`.
## 💻 Usage Examples

### Basic Usage
The end-to-end snippet in the Quick Start section above covers the basic single-sentence workflow: tokenize the text, run the model under `torch.inference_mode()`, and map the argmax of the logits to a label name.
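To score several reviews at once, the same model can be run on a padded batch. The following is a minimal sketch assuming the same checkpoint as above; the batching, `softmax` confidence scores, and the second example sentence are illustrative additions, not part of the original card:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("taufiqdp/indonesian-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("taufiqdp/indonesian-sentiment")
class_names = ['negatif', 'netral', 'positif']

texts = [
    "Pelayanan lama dan tidak ramah",  # "Slow and unfriendly service"
    "Makanannya enak sekali",          # "The food is delicious" (illustrative)
]

# Pad/truncate so the batch forms a rectangular tensor.
batch = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')

with torch.inference_mode():
    logits = model(**batch).logits
    probs = torch.softmax(logits, dim=-1)  # per-class confidence scores

for text, p in zip(texts, probs):
    label = class_names[p.argmax().item()]
    print(f"{text!r} -> {label} ({p.max().item():.3f})")
```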
## 📚 Documentation

### Model Details
This model is a fine-tuned version of IndoBERT Base Uncased, a BERT model pre-trained on Indonesian text data, and was further fine-tuned for sentiment analysis of Indonesian comments and reviews.
It was trained on the IndoNLU SmSA and indonesian_sentiment datasets.
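If you want to inspect the training data, the SmSA subset can be loaded with the `datasets` library. This is a sketch under the assumption that the public IndoNLU release on the Hugging Face Hub (`indonlu`, config `smsa`) is the same data used for fine-tuning:

```python
from datasets import load_dataset

# Assumption: the public IndoNLU SmSA release on the Hugging Face Hub;
# these identifiers are not taken from the model card itself.
smsa = load_dataset("indonlu", "smsa")
print(smsa["train"][0])
```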
The model classifies a given Indonesian review text into one of three categories:
- Negative
- Neutral
- Positive
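These labels correspond, in order, to the `class_names` list used in the examples. For quick experiments the checkpoint can also be wrapped in a Transformers `pipeline`; this is an illustrative alternative, and if the checkpoint's config lacks an `id2label` mapping the pipeline will return generic `LABEL_0`/`LABEL_1`/`LABEL_2` names in that same order:

```python
from transformers import pipeline

# High-level alternative to the manual tokenize/forward/argmax loop.
classifier = pipeline("text-classification", model="taufiqdp/indonesian-sentiment")
print(classifier("Pelayanan lama dan tidak ramah"))
```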
### Training hyperparameters

- train_batch_size: 32
- eval_batch_size: 32
- learning_rate: 1e-4
- optimizer: AdamW with betas=(0.9, 0.999), eps=1e-8, and weight_decay=0.01
- epochs: 3
- learning_rate_scheduler: StepLR with step_size=592, gamma=0.1
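These settings map directly onto PyTorch's built-in optimizer and scheduler. The sketch below reconstructs the setup from the listed values, reusing `model` from the Quick Start snippet; the per-step placement of `scheduler.step()` is an assumption inferred from `step_size=592`, since the original training script is not part of this card:

```python
import torch

# Reconstructed from the hyperparameters listed above (not the original script).
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
)
# step_size=592 suggests the learning rate was multiplied by gamma=0.1
# every 592 optimizer steps, i.e. scheduler.step() after each batch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=592, gamma=0.1)
```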
### Training Results

The following table shows the training results for the model:

| Epoch | Loss   | Accuracy |
|-------|--------|----------|
| 1     | 0.2936 | 0.9310   |
| 2     | 0.1212 | 0.9526   |
| 3     | 0.0795 | 0.9569   |
### How to Use

Load the model and run inference as shown in the Quick Start and Usage Examples sections.
## 🔧 Technical Details

The model is a fine-tuned BERT model. It leverages the Indonesian-language pre-training of IndoBERT Base Uncased and adapts it to sentiment classification by fine-tuning on the datasets listed above, using the hyperparameters given in the Documentation section.
## 📄 License
This model is released under the MIT license.
## Citation

```bibtex
@misc{koto2020indolem,
  title={IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP},
  author={Fajri Koto and Afshin Rahimi and Jey Han Lau and Timothy Baldwin},
  year={2020},
  eprint={2011.00677},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@inproceedings{purwarianti2019improving,
  title={Improving Bi-LSTM Performance for Indonesian Sentiment Analysis Using Paragraph Vector},
  author={Ayu Purwarianti and Ida Ayu Putu Ari Crisdayanti},
  booktitle={Proceedings of the 2019 International Conference of Advanced Informatics: Concepts, Theory and Applications (ICAICTA)},
  pages={1--5},
  year={2019},
  organization={IEEE}
}
```