# 🚀 German Sentiment Classification with BERT

This model performs sentiment classification of German-language texts and is built on Google's BERT architecture. A companion Python package handles preprocessing and inference.
## 🚀 Quick Start

This model was trained for sentiment classification of German-language texts. To achieve the best results, all model inputs need to be preprocessed with the same procedure that was applied during training. To simplify usage, we provide a Python package that bundles the required preprocessing and inference code.
## ✨ Features

- BERT architecture: The model is based on Google's BERT architecture.
- Diverse training data: It was trained on 1.834 million German-language samples from various domains, including Twitter, Facebook, and movie, app, and hotel reviews.
- Python package: A Python package is provided to simplify preprocessing and inference.
## 📦 Installation

To get started, install the package from PyPI:

```bash
pip install germansentiment
```
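To confirm that the package installed correctly, a quick import check (the model weights themselves are only downloaded the first time a `SentimentModel` is instantiated):

```python
# Verify the package is importable; importing alone should not
# trigger the model download yet.
import germansentiment

print(germansentiment.SentimentModel)
```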
## 💻 Usage Examples

### Basic Usage

```python
from germansentiment import SentimentModel

model = SentimentModel()

texts = [
    "Mit keinem guten Ergebniss",     # "With no good result" ("Ergebniss" is a common misspelling of "Ergebnis")
    "Das ist gar nicht mal so gut",   # "That is not that good at all"
    "Total awesome!",
    "nicht so schlecht wie erwartet", # "not as bad as expected"
    "Der Test verlief positiv.",      # "The test came back positive."
    "Sie fährt ein grünes Auto.",     # "She drives a green car."
]

result = model.predict_sentiment(texts)
print(result)
```
The code above will output the following list:

```python
["negative", "negative", "positive", "positive", "neutral", "neutral"]
```
### Advanced Usage

```python
from germansentiment import SentimentModel

model = SentimentModel()

classes, probabilities = model.predict_sentiment(["das ist super"], output_probabilities=True)
print(classes, probabilities)
```

This prints the predicted classes along with the probability of each class:

```python
['positive'] [[['positive', 0.9761366844177246], ['negative', 0.023540444672107697], ['neutral', 0.00032294404809363186]]]
```
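If you prefer to skip the helper package and call the model directly through the Hugging Face `transformers` library, a minimal sketch is shown below. It assumes the model is published on the Hub as `oliverguhr/german-sentiment-bert` (this ID is not stated above, so treat it as an assumption), and note that it bypasses the package's input preprocessing, so results may differ slightly from `predict_sentiment`:

```python
# Minimal sketch: direct inference with the transformers library, without
# the germansentiment package. The model ID below is an assumption; this
# path also skips the package's input preprocessing.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "oliverguhr/german-sentiment-bert"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer(["das ist super"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

probabilities = torch.softmax(logits, dim=-1)
labels = [model.config.id2label[i] for i in probabilities.argmax(dim=-1).tolist()]
print(labels, probabilities.tolist())
```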
## 📚 Documentation

If you are interested in the code and data used to train this model, please refer to [this repository](https://github.com/oliverguhr/german-sentiment) and our [paper](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.202.pdf). F1 scores on the individual datasets can be found there; since we trained this model with a newer version of the transformers library, the results are slightly better than those reported in the paper.
| Property | Details |
|----------|---------|
| Model Type | Google's BERT architecture |
| Training Data | 1.834 million German-language samples from Twitter, Facebook, and movie, app, and hotel reviews |
## 📄 License

This project is licensed under the MIT license.
## 📖 Citation

For feedback and questions, contact me via mail or Twitter @oliverguhr. Please cite us if you found this work useful:
```bibtex
@InProceedings{guhr-EtAl:2020:LREC,
  author    = {Guhr, Oliver and Schumann, Anne-Kathrin and Bahrmann, Frank and Böhme, Hans Joachim},
  title     = {Training a Broad-Coverage German Sentiment Classification Model for Dialog Systems},
  booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
  month     = {May},
  year      = {2020},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {1620--1625},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.202}
}
```