XLM-RoBERTa-German-sentiment Open Source Model - Accurately Conduct German Sentiment Analysis with an F1 Score of 87%

XLM RoBERTa German Sentiment

Developed by ssary

A multilingual sentiment analysis model based on the XLM-RoBERTa architecture, specifically optimized for German, achieving an 87% F1 score in German sentiment analysis tasks.

Text Classification

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#German Sentiment Analysis #Multilingual Support #High F1 Score

Downloads 1,946

Release Time : 1/22/2024

Model Overview

This model is designed for sentiment analysis tasks in 8 languages (especially German), capable of identifying negative, neutral, and positive emotions in text.

Model Features

Multilingual Support

Supports sentiment analysis in 8 languages, with special optimization for German.

High Performance

Achieves a weighted F1 score of 87% in German sentiment analysis tasks.

Large-Scale Training Data

Fine-tuned on over 200,000 German sentiment analysis samples to capture the nuances of German sentiment.

Model Capabilities

German text sentiment analysis

Multilingual text sentiment analysis

Sentiment polarity classification (Negative/Neutral/Positive)

Use Cases

Social Media Analysis

User Comment Sentiment Analysis

Analyze the sentiment tendencies of user comments about products or services on social media.

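Reported performance on German text: 87% weighted F1 score. This is a support-weighted average of the per-class F1 scores, not the share of comments classified correctly.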

Customer Service

Customer Feedback Classification

Automatically classify the sentiment tendencies of customer feedback to help prioritize negative feedback.
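As a minimal sketch of the prioritization idea above, the snippet below filters feedback by the probability of the negative class and sorts the most negative items first. The `Feedback` type, the `prioritize_negative` helper, the 0.5 threshold, and the scores are all hypothetical and chosen only for illustration; in practice the three probabilities would come from the model's softmax output.

```python
from typing import NamedTuple


class Feedback(NamedTuple):
    """One piece of feedback with its class probabilities (hypothetical type)."""
    text: str
    negative: float
    neutral: float
    positive: float


def prioritize_negative(items, threshold=0.5):
    """Return items whose negative probability is at least `threshold`,
    ordered from most to least negative."""
    flagged = [item for item in items if item.negative >= threshold]
    return sorted(flagged, key=lambda item: item.negative, reverse=True)


# Made-up scores for illustration only; this is not real model output.
scored = [
    Feedback("Die Lieferung kam pünktlich an.", 0.05, 0.25, 0.70),
    Feedback("Der Support antwortet seit Tagen nicht.", 0.90, 0.08, 0.02),
    Feedback("Das Produkt ist okay.", 0.20, 0.65, 0.15),
    Feedback("Die App stürzt ständig ab.", 0.75, 0.20, 0.05),
]

for item in prioritize_negative(scored):
    print(f"{item.negative:.2f}  {item.text}")
```

The threshold is a tuning choice: a lower value surfaces more borderline feedback at the cost of more false alarms.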

🚀 XLM-RoBERTa-German-Sentiment

The XLM-RoBERTa-German-Sentiment model is designed for sentiment analysis in 8 languages, with a focus on German. It leverages the XLM-RoBERTa architecture and has been fine-tuned on a large German-language dataset.

🚀 Quick Start

To use this model, you need to install the Hugging Face Transformers library and PyTorch. You can do this using pip:

pip install torch transformers

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model_name = 'ssary/XLM-RoBERTa-German-sentiment'
text = "Erneuter Streik in der S-Bahn"
# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Tokenize the text; inputs longer than 512 tokens are truncated
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
# Convert the raw logits into class probabilities
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
sentiment_classes = ['negative', 'neutral', 'positive']
print(sentiment_classes[predictions.argmax(dim=-1).item()])  # class with the highest probability
print(predictions)  # probability of each class
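To make the softmax and argmax steps in the snippet above concrete, here is a minimal pure-Python sketch of the same computation. It needs neither PyTorch nor the model. The `softmax` helper and the logit values are hypothetical and serve only as illustration; the class order matches the `sentiment_classes` list used above.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


sentiment_classes = ["negative", "neutral", "positive"]

# Made-up logits for illustration only; this is not real model output.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)

# argmax: index of the largest probability
best = max(range(len(probs)), key=probs.__getitem__)

print(sentiment_classes[best])
print([round(p, 3) for p in probs])
```

Because softmax preserves the ordering of the logits, the predicted class is the same whether argmax is taken over the logits or over the probabilities; the softmax step matters only when the probabilities themselves are needed.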

✨ Features

Multilingual Support: The model can perform sentiment analysis in 8 languages: German, Arabic, French, Hindi, Italian, Portuguese, Spanish, and English.
High Performance: Achieves an 87% weighted F1 score.
Tailored for German: Fine-tuned on over 200,000 German-language sentiment analysis samples.
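The weighted F1 score quoted above averages the per-class F1 scores, weighting each class by its number of true samples. The sketch below computes it from a confusion matrix, assuming rows are true classes and columns are predicted classes. The `weighted_f1` helper and the matrix values are hypothetical and are not the model's actual evaluation results.

```python
def weighted_f1(confusion):
    """Support-weighted average of per-class F1 scores.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j. Assumes at least one sample.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    score = 0.0
    for k in range(n):
        tp = confusion[k][k]
        support = sum(confusion[k])                         # true samples of class k
        predicted = sum(confusion[i][k] for i in range(n))  # samples predicted as k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * support / total
    return score


# Made-up confusion matrix (negative, neutral, positive); illustration only.
example = [
    [80, 15, 5],
    [10, 180, 10],
    [5, 5, 90],
]
print(round(weighted_f1(example), 4))
```

Weighting by support means that frequent classes dominate the score, so a high weighted F1 can coexist with weaker performance on a rare class.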

📚 Documentation

Overview

The XLM-RoBERTa-German-Sentiment model performs sentiment analysis in 8 languages, with a particular focus on German. It builds on the XLM-RoBERTa architecture, a choice motivated by the stronger performance of Facebook's RoBERTa over Google's BERT across numerous benchmarks and by XLM-RoBERTa's multilingual capabilities. Tailored to German, the model has been fine-tuned on over 200,000 German-language sentiment analysis samples. More on the training of the model can be found in the paper.

The dataset used for training, available at [this GitHub repository](https://github.com/oliverguhr/german-sentiment-lib), was developed by Oliver Guhr. We extend our gratitude to him for making the dataset open source. The dataset was instrumental in refining the model's accuracy and its sensitivity to the nuances of German sentiment.

Our model and fine-tuning are based on the sentiment analysis model XLM-T [https://arxiv.org/abs/2104.12250].

Model Details

Property	Details
Model Type	XLM-RoBERTa
Performance	87% weighted F1 score
Limitations	The model was trained and tested only on German; it can handle the other 7 supported languages, but with lower accuracy

🔧 Technical Details

The model is based on the XLM-RoBERTa architecture. It was fine-tuned on over 200,000 German-language sentiment analysis samples. XLM-RoBERTa was chosen for its multilingual capabilities and for the superior performance of RoBERTa over BERT on many benchmarks. The fine-tuning is based on the sentiment analysis model XLM-T [https://arxiv.org/abs/2104.12250].

📄 License

This model is released under the Apache-2.0 license.

Acknowledgments

This model was developed by Sary Nasser at HTW-Berlin under the supervision of Martin Steinicke.

References

Model's GitHub repository: [https://github.com/ssary/German-Sentiment-Analysis](https://github.com/ssary/German-Sentiment-Analysis)
Oliver Guhr Dataset paper: [Training a Broad-Coverage German Sentiment Classification Model for Dialog Systems](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.202.pdf)
Model architecture: [XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond](https://arxiv.org/abs/2104.12250)
