๐ German-Emotions
This model is a German translation of arpanghoshal/EmoRoBERTa. It enables the classification of 28 emotions in German transcripts, offering valuable insights for emotion analysis in German text.
๐ Quick Start
This model is a German adaptation of arpanghoshal/EmoRoBERTa. We utilized the go_emotions dataset, translated it into German, and fine - tuned the FacebookAI/xlm - roberta - base model. It can classify 28 emotions in German transcripts, including 'admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral'. For detailed information, refer to our published paper at the bottom.
โจ Features
- Capable of classifying 28 different emotions in German transcripts.
- Fine - tuned on a translated German version of the go_emotions dataset.
๐ฆ Installation
To use this model, you need to install the necessary libraries. You can use the following commands:
pip install transformers[torch]
pip install pandas, transformers, numpy, tqdm, openpyxl
๐ป Usage Examples
Basic Usage
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer
import numpy as np
from tqdm import tqdm
import time
import os
from transformers import DataCollatorWithPadding
import json
base_path = "/share/users/staff/c/clalk/Emotionen"
model_path = os.path.join(base_path, 'Modell')
file_path = os.path.join(base_path, 'Datensatz')
MODEL = "ChrisLalk/German-Emotions"
tokenizer = AutoTokenizer.from_pretrained(MODEL, do_lower_case=False)
model = AutoModelForSequenceClassification.from_pretrained(
model_path,
from_tf=False,
from_flax=False,
trust_remote_code=False,
num_labels=28,
ignore_mismatched_sizes=True
)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
os.chdir(file_path)
df_full = pd.read_excel("speech_turns_pat.xlsx", index_col=None)
if 'Unnamed: 0' in df_full.columns:
df_full = df_full.drop(columns=['Unnamed: 0'])
df_full.reset_index(drop=True, inplace=True)
def infer_texts(texts):
tokenized_texts = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
class SimpleDataset:
def __init__(self, tokenized_texts):
self.tokenized_texts = tokenized_texts
def __len__(self):
return len(self.tokenized_texts["input_ids"])
def __getitem__(self, idx):
return {k: v[idx] for k, v in self.tokenized_texts.items()}
test_dataset = SimpleDataset(tokenized_texts)
trainer = Trainer(model=model, data_collator=data_collator)
predictions = trainer.predict(test_dataset)
sigmoid = torch.nn.Sigmoid()
probs = sigmoid(torch.Tensor(predictions.predictions))
return np.round(np.array(probs), 3).tolist()
start_time = time.time()
df = df_full
results = []
for index, row in tqdm(df.iterrows(), total=df.shape[0]):
patient_texts = row['Patient']
prob_list = infer_texts(patient_texts)
results.append({
"File": row['Class']+"_"+row['session'],
"Class": row['Class'],
"session": row['session'],
"short_id": row["short_id"],
"long_id": row["long_id"],
"Sentence": patient_texts,
"Prediction": prob_list[0],
"hscl-11": row["Gesamtscore_hscl"],
"srs": row["srs_ges"],
})
df_results = pd.DataFrame(results)
df_results.to_json("emo_speech_turn_inference.json")
end_time = time.time()
elapsed_time = end_time - start_time
print(f"Elapsed time: {elapsed_time:.2f} seconds")
print(df_results)
emo_df = pd.DataFrame(df_results['Prediction'].tolist(), index=df_results["Class"].index)
col_names = ['admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']
emo_df.columns = col_names
print(emo_df)
๐ Documentation
Model Details
Property |
Details |
Model Type |
text - classification |
Language(s) (NLP) |
German |
License |
apache - 2.0 |
Finetuned from model |
FacebookAI/xlm - roberta - base |
Hyperparameters |
Epochs: 10, learning_rate: 3e - 5, weight_decay: 0.01 |
Metrics |
f1_macro: 0.45, accuracy: 0.41, kappa: 0.42 |
Classification Metrics
Emotion |
Sentiment |
F1 |
Cohenโs Kappa |
admiration |
positive |
0.64 |
0.601 |
amusement |
positive |
0.78 |
0.767 |
anger |
negative |
0.38 |
0.358 |
annoyance |
negative |
0.27 |
0.229 |
approval |
positive |
0.34 |
0.293 |
caring |
positive |
0.38 |
0.365 |
confusion |
negative |
0.40 |
0.378 |
curiosity |
positive |
0.51 |
0.486 |
desire |
positive |
0.39 |
0.387 |
disappointment |
negative |
0.19 |
0.170 |
disapproval |
negative |
0.32 |
0.286 |
disgust |
negative |
0.41 |
0.395 |
embarrassment |
negative |
0.37 |
0.367 |
excitement |
positive |
0.35 |
0.339 |
fear |
negative |
0.59 |
0.584 |
gratitude |
positive |
0.89 |
0.882 |
grief |
negative |
0.31 |
0.307 |
joy |
positive |
0.51 |
0.499 |
love |
positive |
0.73 |
0.721 |
nervousness |
negative |
0.28 |
0.276 |
optimism |
positive |
0.53 |
0.512 |
pride |
positive |
0.30 |
0.299 |
realization |
positive |
0.17 |
0.150 |
relief |
positive |
0.27 |
0.266 |
remorse |
negative |
0.55 |
0.545 |
sadness |
negative |
0.50 |
0.488 |
surprise |
neutral |
0.53 |
0.514 |
neutral |
neutral |
0.60 |
0.410 |
๐ License
This model is licensed under the apache - 2.0 license.
Citation
When using our model, please cite the associated peer - reviewed paper:
@article{Lalk2025EmotionDetection,
author = {Christopher Lalk and Kim Targan and Tobias Steinbrenner and Jana Schaffrath and Steffen Eberhardt and Brian Schwartz and Antonia Vehlen and Wolfgang Lutz and Julian Rubel},
title = {Employing large language models for emotion detection in psychotherapy transcripts},
journal = {Frontiers in Psychiatry},
volume = {16},
year = {2025},
doi = {10.3389/fpsyt.2025.1504306}
}