SiEBERT - English-Language Sentiment Classification
SiEBERT enables reliable binary sentiment analysis for various types of English-language text.
Quick Start
This model ("SiEBERT", prefix for "Sentiment in English") is a fine-tuned checkpoint of RoBERTa-large (Liu et al. 2019). It can perform reliable binary sentiment analysis on various English-language texts. For each text instance, it predicts either positive (1) or negative (0) sentiment. The model was fine-tuned and evaluated on 15 data sets from diverse text sources to enhance generalization across different text types (reviews, tweets, etc.). As a result, it outperforms models trained on only one type of text (e.g., movie reviews from the popular SST-2 benchmark) when used on new data.
Features
- Generalization: Fine-tuned on 15 diverse data sets, enhancing its performance across different text types.
- High Performance: Outperforms a DistilBERT-based model by more than 15 percentage points on average.
- Easy to Use: Can be used in Hugging Face pipelines with just a few lines of code.
- Fine-Tunable: Can serve as a starting point for further fine-tuning on specific data.
Installation
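No dedicated installation steps are documented. The usage examples below import the Hugging Face `transformers` library, so it must be available in your Python environment.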
Usage Examples
Basic Usage
The easiest way to use the model for single predictions is Hugging Face's sentiment analysis pipeline.
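from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
sentiment_analysis = pipeline("sentiment-analysis", model="siebert/sentiment-roberta-large-english")
# Prints a list with one dict per input, holding the predicted label and its score
print(sentiment_analysis("I love this!"))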
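
To score several texts at once and obtain the 0/1 coding described in the Quick Start, the pipeline output can be post-processed. The sketch below is an illustration and not part of the original model card. The helper names `to_binary` and `predict_binary` are hypothetical, and the code assumes the pipeline returns one dict per text with a `label` of `POSITIVE` or `NEGATIVE`; confirm this against your own output before relying on it.

```python
from typing import Dict, List


def to_binary(results: List[Dict]) -> List[int]:
    """Map pipeline outputs to 1 (positive) or 0 (negative).

    Assumes each result looks like {"label": "POSITIVE", "score": 0.99}.
    The label strings are an assumption, so unknown labels raise an error.
    """
    mapping = {"POSITIVE": 1, "NEGATIVE": 0}
    binary = []
    for result in results:
        label = str(result["label"]).upper()
        if label not in mapping:
            raise ValueError(f"Unexpected label: {result['label']!r}")
        binary.append(mapping[label])
    return binary


def predict_binary(texts: List[str]) -> List[int]:
    """Classify a list of texts (downloads the checkpoint on first use)."""
    # Imported lazily so that to_binary can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        "sentiment-analysis", model="siebert/sentiment-roberta-large-english"
    )
    return to_binary(classifier(texts))
```

Calling `predict_binary(["I love this!", "This was a waste of money."])` would return one 0/1 value per input text, in input order.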

Advanced Usage
To predict sentiment for your own data, we provide an example script via Google Colab. You can upload your data to Google Drive and run the script for free on a Colab GPU. We recommend manually labeling a subset of your data to evaluate performance for your use case.

The model can also be used as a starting point for further fine-tuning of RoBERTa on your specific data. Please refer to Hugging Face's documentation for further details and example code.
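
The recommendation to check performance on a manually labeled subset can be sketched as a simple accuracy calculation. This is a minimal illustration rather than the authors' evaluation script: the function name `accuracy_percent` and the example labels are hypothetical, and both lists are assumed to use the 1 = positive, 0 = negative coding.

```python
from typing import Sequence


def accuracy_percent(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Share of predictions that match the manual labels, in percent."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if len(gold) == 0:
        raise ValueError("need at least one labeled example")
    correct = sum(int(p == g) for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Hypothetical example data: 1 = positive, 0 = negative.
manual_labels = [1, 0, 1, 1, 0]
model_predictions = [1, 0, 0, 1, 0]
print(accuracy_percent(model_predictions, manual_labels))  # 80.0 (4 of 5 match)
```

In practice, `model_predictions` would hold the model's output for the same texts you labeled by hand.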
Documentation
For performance benchmark values across various sentiment analysis contexts, please refer to our paper (Hartmann et al. 2023).
Technical Details
To evaluate the performance of our general-purpose sentiment analysis model, we set aside an evaluation set from each data set that was not used for training. On average, our model outperforms a DistilBERT-based model (which is fine-tuned solely on the popular SST-2 data set) by more than 15 percentage points (93.2 vs. 78.1 percent). As a robustness check, we evaluate the model in a leave-one-out manner (training on 14 data sets, evaluating on the one left out), which decreases model performance by only about 3 percentage points on average and underscores its generalizability. Model performance is given as evaluation set accuracy in percent.
| Dataset | DistilBERT SST-2 | This model |
|---|---|---|
| McAuley and Leskovec (2013) (Reviews) | 84.7 | 98.0 |
| McAuley and Leskovec (2013) (Review Titles) | 65.5 | 87.0 |
| Yelp Academic Dataset | 84.8 | 96.5 |
| Maas et al. (2011) | 80.6 | 96.0 |
| Kaggle | 87.2 | 96.0 |
| Pang and Lee (2005) | 89.7 | 91.0 |
| Nakov et al. (2013) | 70.1 | 88.5 |
| Shamma (2009) | 76.0 | 87.0 |
| Blitzer et al. (2007) (Books) | 83.0 | 92.5 |
| Blitzer et al. (2007) (DVDs) | 84.5 | 92.5 |
| Blitzer et al. (2007) (Electronics) | 74.5 | 95.0 |
| Blitzer et al. (2007) (Kitchen devices) | 80.0 | 98.5 |
| Pang et al. (2002) | 73.5 | 95.5 |
| Speriosu et al. (2011) | 71.5 | 85.5 |
| Hartmann et al. (2019) | 65.5 | 98.0 |
| Average | 78.1 | 93.2 |
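
The averages and the gap of more than 15 percentage points can be reproduced from the per-dataset values. The short check below copies the accuracies listed above in row order; the variable names are chosen for this illustration only.

```python
# Evaluation-set accuracies (percent) copied from the rows above, in order.
distilbert_sst2 = [84.7, 65.5, 84.8, 80.6, 87.2, 89.7, 70.1, 76.0,
                   83.0, 84.5, 74.5, 80.0, 73.5, 71.5, 65.5]
this_model = [98.0, 87.0, 96.5, 96.0, 96.0, 91.0, 88.5, 87.0,
              92.5, 92.5, 95.0, 98.5, 95.5, 85.5, 98.0]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


avg_distilbert = mean(distilbert_sst2)
avg_this_model = mean(this_model)
gap = avg_this_model - avg_distilbert

print(f"DistilBERT SST-2 average: {avg_distilbert:.1f}")  # 78.1
print(f"This model average:       {avg_this_model:.1f}")  # 93.2
print(f"Gap in percentage points: {gap:.1f}")             # 15.1
```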
Fine-tuning hyperparameters
- learning_rate = 2e-5
- num_train_epochs = 3.0
- warmup_steps = 500
- weight_decay = 0.01
Other values were left at their defaults as listed here.
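
The parameter names above match arguments of `transformers.TrainingArguments`. Assuming that API is used for further fine-tuning (the model card does not spell out the training code), the listed values could be passed along as in the sketch below. The function name `build_training_arguments` and the `output_dir` path are hypothetical and chosen only for this example.

```python
# Hyperparameters stated in this section; all other values stay at library defaults.
FINE_TUNING_HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 3.0,
    "warmup_steps": 500,
    "weight_decay": 0.01,
}


def build_training_arguments(output_dir: str = "./siebert-finetuned"):
    """Create TrainingArguments with the listed values.

    output_dir is a hypothetical path used for this sketch.
    """
    # Imported lazily; requires transformers with a PyTorch backend installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **FINE_TUNING_HYPERPARAMETERS)
```

The returned object can then be handed to a `Trainer` together with your own tokenized data set.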
License
No license information is provided in the original document.
Citation
Please cite this paper (published in the IJRM) when you use our model.
@article{hartmann2023,
title = {More than a Feeling: Accuracy and Application of Sentiment Analysis},
journal = {International Journal of Research in Marketing},
volume = {40},
number = {1},
pages = {75-87},
year = {2023},
doi = {https://doi.org/10.1016/j.ijresmar.2022.05.005},
url = {https://www.sciencedirect.com/science/article/pii/S0167811622000477},
author = {Jochen Hartmann and Mark Heitmann and Christian Siebert and Christina Schamp},
}
If you have any questions or feedback, feel free to reach out to [christian.siebert@uni-hamburg.de](mailto:christian.siebert@uni-hamburg.de).