German-Emotions: an open-source German emotion classification model for 28 emotions


German Emotions

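Developed by ChrisLalk

A German emotion classification model based on XLM-RoBERTa that recognizes 28 emotions

Text classification

Transformers

German · License: Apache-2.0 · #GermanEmotionAnalysis #MultiEmotionClassification #PsychotherapySupport

Downloads: 299

Released: 7/15/2024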

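Model overview

This model is the German counterpart of arpanghoshal/EmoRoBERTa: XLM-RoBERTa-base fine-tuned on a German translation of the go_emotions dataset, for emotion classification of German text.

Model features

Multi-emotion classification

Recognizes 28 distinct emotional states in German text

Cross-lingual transfer

Built on the multilingual XLM-RoBERTa model; translated training data carries the English emotion classification task over to German

Clinical focus

Targets emotion recognition in clinical text such as psychotherapy transcripts; per-emotion F1 ranges from 0.17 to 0.89 (see the classification metrics)

Model capabilities

Emotion classification of German text

Multi-label emotion recognition

Analysis of psychotherapy text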

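Use cases

Mental health

Psychotherapy session analysis

Analyze how patients' emotions change over the course of psychotherapy

Can support monitoring of treatment progress and outcome evaluation

Emotional state monitoring

Track patterns in patients' emotional fluctuations over time

Support diagnosis and individualized treatment planning

Customer service

Customer feedback emotion analysis

Analyze the emotions expressed in German-language customer feedback

Identify dissatisfied customers and prioritize their cases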
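
As a rough illustration of the session-level monitoring use case above, the sketch below averages per-turn emotion probabilities within each session. The `session_means` helper, the session IDs, and the probability values are hypothetical toy inputs, not model output.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def session_means(
    turns: List[Tuple[str, Dict[str, float]]]
) -> Dict[str, Dict[str, float]]:
    """Average each emotion's probability over all speech turns of a session."""
    by_session: Dict[str, Dict[str, List[float]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for session, probs in turns:
        for emotion, p in probs.items():
            by_session[session][emotion].append(p)
    return {
        session: {emotion: round(mean(ps), 3) for emotion, ps in emotions.items()}
        for session, emotions in by_session.items()
    }


# Toy values for illustration only (one tuple per speech turn).
turns = [
    ("session_01", {"sadness": 0.70, "optimism": 0.10}),
    ("session_01", {"sadness": 0.50, "optimism": 0.20}),
    ("session_02", {"sadness": 0.30, "optimism": 0.60}),
]

for session, emotions in session_means(turns).items():
    print(session, emotions)
```

Comparing the per-session averages across sessions gives a coarse view of how a given emotion develops; any clinical interpretation would need validation beyond this sketch.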

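🚀 German emotion classification model

This model classifies emotions in German text. It is based on FacebookAI/xlm-roberta-base and fine-tuned on a German translation of the go_emotions dataset. It assigns 28 emotions to German transcripts.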

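🚀 Quick start

The model classifies 28 emotions in German transcripts. We translated the go_emotions dataset into German and fine-tuned FacebookAI/xlm-roberta-base on it. The 28 emotions are: admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise, and neutral. For details, see the published paper cited at the end.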
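
A minimal sketch of this classification through the generic `transformers` text-classification pipeline, assuming the Hub checkpoint loads with its default configuration. `classify` and `top_emotions` are illustrative helper names, not part of the model's API, and the label strings returned depend on the checkpoint's config.

```python
from typing import Any, Dict, List


def classify(
    texts: List[str], model_id: str = "ChrisLalk/German-Emotions"
) -> List[List[Dict[str, Any]]]:
    """Return one list of {"label", "score"} dicts per input text."""
    # Imported lazily so top_emotions() stays usable without transformers.
    from transformers import pipeline

    clf = pipeline(
        "text-classification",
        model=model_id,
        top_k=None,                   # return scores for all labels
        function_to_apply="sigmoid",  # independent per-label probabilities
    )
    return clf(texts, truncation=True)


def top_emotions(scores: List[Dict[str, Any]], k: int = 3) -> List[str]:
    """Pick the k highest-scoring labels from one text's score list."""
    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    return [s["label"] for s in ranked[:k]]
```

For example, `top_emotions(classify(["Ich bin so dankbar für deine Hilfe."])[0])` would return the three highest-scoring labels for that sentence. The first call downloads the checkpoint from the Hub.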

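✨ Key features

Language support: focused on German emotion classification, suited to German transcripts.
Broad emotion coverage: recognizes 28 distinct emotions.
Base model: fine-tuned from FacebookAI/xlm-roberta-base, a multilingual encoder pretrained on text in about 100 languages.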

📦 Installation

Install the dependencies before using the model:

pip install "transformers[torch]"
pip install pandas numpy tqdm openpyxl

💻 Usage example

Basic usage

# pip install "transformers[torch]"
# pip install pandas numpy tqdm openpyxl
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer
import numpy as np
from tqdm import tqdm
import time
import os
from transformers import DataCollatorWithPadding
import json

# create base path and input and output path for the model folder and the file folder
base_path = "/share/users/staff/c/clalk/Emotionen"
model_path = os.path.join(base_path, 'Modell')
file_path = os.path.join(base_path, 'Datensatz')

MODEL = "ChrisLalk/German-Emotions"
tokenizer = AutoTokenizer.from_pretrained(MODEL, do_lower_case=False)
# Load the weights from the same Hub ID as the tokenizer;
# pass model_path instead to use a local copy of the model.
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL,
    trust_remote_code=False,
    num_labels=28,
)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Path to the file
os.chdir(file_path)
df_full = pd.read_excel("speech_turns_pat.xlsx", index_col=None)

if 'Unnamed: 0' in df_full.columns:
    df_full = df_full.drop(columns=['Unnamed: 0'])

df_full.reset_index(drop=True, inplace=True)

# Tokenization and inference function
def infer_texts(texts):
    tokenized_texts = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    class SimpleDataset:
        def __init__(self, tokenized_texts):
            self.tokenized_texts = tokenized_texts
        def __len__(self):
            return len(self.tokenized_texts["input_ids"])
        def __getitem__(self, idx):
            return {k: v[idx] for k, v in self.tokenized_texts.items()}
    test_dataset = SimpleDataset(tokenized_texts)
    trainer = Trainer(model=model, data_collator=data_collator)
    predictions = trainer.predict(test_dataset)
    sigmoid = torch.nn.Sigmoid()
    probs = sigmoid(torch.Tensor(predictions.predictions))
    return np.round(np.array(probs), 3).tolist()

start_time = time.time()
df = df_full

# Collect one result dict per row. Besides the text, this df carries the study-specific
# columns Class, session, short_id, long_id, Gesamtscore_hscl, and srs_ges.
# Only the "Patient" column, which holds the text, is needed for inference.
results = []
for index, row in tqdm(df.iterrows(), total=df.shape[0]):
    patient_texts = row['Patient']
    prob_list = infer_texts(patient_texts)
    results.append({
        # f-string so that a numeric session value does not break the concatenation
        "File": f"{row['Class']}_{row['session']}",
        "Class": row['Class'],
        "session": row['session'],
        "short_id": row["short_id"],
        "long_id": row["long_id"],
        "Sentence": patient_texts,
        "Prediction": prob_list[0],
        "hscl-11": row["Gesamtscore_hscl"],
        "srs": row["srs_ges"],
    })

# Convert results to df
df_results = pd.DataFrame(results)
df_results.to_json("emo_speech_turn_inference.json")

end_time = time.time()
elapsed_time = end_time - start_time
print(f"Elapsed time: {elapsed_time:.2f} seconds")
print(df_results)

emo_df = pd.DataFrame(df_results['Prediction'].tolist(), index=df_results["Class"].index)
col_names = ['admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']
emo_df.columns = col_names
print(emo_df)
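
The script above yields 28 sigmoid probabilities per text, in the order of `col_names`. One possible way to turn such a vector into a set of emotions is a fixed cutoff. The 0.5 threshold and the `active_emotions` helper below are assumptions for illustration, not something the model card prescribes.

```python
from typing import Dict, Sequence

# Same order as col_names in the script above.
LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]


def active_emotions(
    probs: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Keep the emotions whose probability reaches the threshold."""
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    return {label: p for label, p in zip(LABELS, probs) if p >= threshold}


# Toy probability vector: only gratitude and joy are above the cutoff.
example = [0.0] * len(LABELS)
example[LABELS.index("gratitude")] = 0.7
example[LABELS.index("joy")] = 0.9
print(active_emotions(example))
```

Because the outputs are independent per label, a text can end up with several emotions or with none. A lower or per-emotion threshold may suit the weaker classes better and would have to be tuned on labeled data.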

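📚 Documentation

Model details

Property	Details
Model type	Text classification
Language (NLP)	German
License	apache-2.0
Fine-tuned from	FacebookAI/xlm-roberta-base
Hyperparameters	Epochs: 10; learning rate: 3e-5; weight decay: 0.01
Metrics	Macro F1: 0.45; accuracy: 0.41; kappa: 0.42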

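Classification metrics

Emotion	Sentiment	F1	Cohen's Kappa
admiration	positive	0.64	0.601
amusement	positive	0.78	0.767
anger	negative	0.38	0.358
annoyance	negative	0.27	0.229
approval	positive	0.34	0.293
caring	positive	0.38	0.365
confusion	negative	0.40	0.378
curiosity	positive	0.51	0.486
desire	positive	0.39	0.387
disappointment	negative	0.19	0.170
disapproval	negative	0.32	0.286
disgust	negative	0.41	0.395
embarrassment	negative	0.37	0.367
excitement	positive	0.35	0.339
fear	negative	0.59	0.584
gratitude	positive	0.89	0.882
grief	negative	0.31	0.307
joy	positive	0.51	0.499
love	positive	0.73	0.721
nervousness	negative	0.28	0.276
optimism	positive	0.53	0.512
pride	positive	0.30	0.299
realization	positive	0.17	0.150
relief	positive	0.27	0.266
remorse	negative	0.55	0.545
sadness	negative	0.50	0.488
surprise	neutral	0.53	0.514
neutral	neutral	0.60	0.410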
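
As a quick arithmetic check, the headline macro F1 of 0.45 matches the unweighted mean of the per-emotion F1 values in the table. The reported kappa of 0.42 likewise matches the mean of the per-emotion kappas, although the source does not state how the overall kappa was aggregated, so that reading is an assumption.

```python
# (emotion, F1, Cohen's kappa), copied from the classification metrics table.
ROWS = [
    ("admiration", 0.64, 0.601), ("amusement", 0.78, 0.767),
    ("anger", 0.38, 0.358), ("annoyance", 0.27, 0.229),
    ("approval", 0.34, 0.293), ("caring", 0.38, 0.365),
    ("confusion", 0.40, 0.378), ("curiosity", 0.51, 0.486),
    ("desire", 0.39, 0.387), ("disappointment", 0.19, 0.170),
    ("disapproval", 0.32, 0.286), ("disgust", 0.41, 0.395),
    ("embarrassment", 0.37, 0.367), ("excitement", 0.35, 0.339),
    ("fear", 0.59, 0.584), ("gratitude", 0.89, 0.882),
    ("grief", 0.31, 0.307), ("joy", 0.51, 0.499),
    ("love", 0.73, 0.721), ("nervousness", 0.28, 0.276),
    ("optimism", 0.53, 0.512), ("pride", 0.30, 0.299),
    ("realization", 0.17, 0.150), ("relief", 0.27, 0.266),
    ("remorse", 0.55, 0.545), ("sadness", 0.50, 0.488),
    ("surprise", 0.53, 0.514), ("neutral", 0.60, 0.410),
]

# Macro averaging weights every class equally, regardless of class frequency.
macro_f1 = sum(f1 for _, f1, _ in ROWS) / len(ROWS)
mean_kappa = sum(kappa for _, _, kappa in ROWS) / len(ROWS)

print(f"macro F1:   {macro_f1:.2f}")    # 0.45
print(f"mean kappa: {mean_kappa:.2f}")  # 0.42
```

Because every class counts equally, weak classes such as realization (0.17) and disappointment (0.19) pull the macro score well below the strongest classes such as gratitude (0.89).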

Citation

When using this model, please cite the associated peer-reviewed paper:

@article{Lalk2025EmotionDetection, 
  author = {Christopher Lalk and Kim Targan and Tobias Steinbrenner and Jana Schaffrath and Steffen Eberhardt and Brian Schwartz and Antonia Vehlen and Wolfgang Lutz and Julian Rubel}, 
  title = {Employing large language models for emotion detection in psychotherapy transcripts}, 
  journal = {Frontiers in Psychiatry}, 
  volume = {16}, 
  year = {2025}, 
  doi = {10.3389/fpsyt.2025.1504306}
}