German-Emotions: an open-source German emotion classification model for 28 emotions, free to use

German Emotions

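Developed by ChrisLalk

A German emotion classification model based on XLM-RoBERTa that distinguishes 28 emotions

Text classification

Transformers

Language: German · License: Apache-2.0 · #GermanEmotionAnalysis #MultiEmotionClassification #PsychotherapySupport

Downloads: 299

Released: 7/15/2024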

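Model overview

This model is a German counterpart of arpanghoshal/EmoRoBERTa. It fine-tunes XLM-RoBERTa-base on a German translation of the go_emotions dataset and is intended specifically for emotion classification of German text.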

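Model features

Multi-emotion classification

Recognizes 28 distinct emotional states in German text

Cross-lingual transfer

Builds on the multilingual XLM-RoBERTa model and uses translated training data to carry emotion classification from English over to German

Clinical focus

Developed with clinical text in mind, in particular psychotherapy transcripts; per-emotion performance is listed in the classification metrics further down

Model capabilities

Emotion classification of German text

Multi-label emotion recognition

Analysis of psychotherapy text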

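Use cases

Mental health

Psychotherapy session analysis

Analyze changes in patients' emotions over the course of psychotherapy

Can support monitoring of treatment progress and outcome evaluation

Emotional state monitoring

Track patterns in patients' emotional fluctuations over time

Support diagnosis and individualized treatment planning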
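
One way such tracking could be set up is to average an emotion's probability over all speech turns of a session and then compare sessions. The sketch below is illustrative only: the record layout with the keys `session` and `probs` is a hypothetical convention, not part of the model's API, and the numbers are made up.

```python
from collections import defaultdict

def mean_emotion_per_session(records, emotion):
    """Average one emotion's probability over all speech turns of each session."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        sums[rec["session"]] += rec["probs"][emotion]
        counts[rec["session"]] += 1
    # Sessions are returned in ascending order so the result reads as a trajectory
    return {s: sums[s] / counts[s] for s in sorted(sums)}

# Made-up probabilities, for illustration only
records = [
    {"session": 1, "probs": {"sadness": 0.5}},
    {"session": 1, "probs": {"sadness": 0.25}},
    {"session": 2, "probs": {"sadness": 0.125}},
]
trajectory = mean_emotion_per_session(records, "sadness")
print(trajectory)  # {1: 0.375, 2: 0.125}
```

Averaging is only one possible summary; a maximum or a count of turns above a threshold may suit other questions better.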

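Customer service

Customer feedback emotion analysis

Analyze the emotions expressed in German-language customer feedback

Identify dissatisfied customers and prioritize their cases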
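
As a rough sketch of how such prioritization might work, the snippet below scores each piece of feedback by its strongest negative emotion and queues the items that reach a cutoff. The set of negative emotions follows the sentiment column of the classification metrics table in this card; the cutoff of 0.5, the ticket ids, and the probabilities are assumptions chosen for illustration.

```python
# Emotions marked as negative in the classification metrics table of this card
NEGATIVE = {
    "anger", "annoyance", "confusion", "disappointment", "disapproval", "disgust",
    "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
}

def negativity(probs):
    """Highest probability among the negative emotions (0.0 if none are present)."""
    return max((p for label, p in probs.items() if label in NEGATIVE), default=0.0)

def prioritise(feedback, threshold=0.5):
    """Return ids of items whose negativity reaches the threshold, most negative first."""
    scored = [(negativity(probs), item_id) for item_id, probs in feedback.items()]
    flagged = [(score, item_id) for score, item_id in scored if score >= threshold]
    return [item_id for score, item_id in sorted(flagged, key=lambda t: t[0], reverse=True)]

# Hypothetical tickets with made-up model outputs
feedback = {
    "ticket-1": {"gratitude": 0.9, "annoyance": 0.1},
    "ticket-2": {"anger": 0.8, "disappointment": 0.6},
    "ticket-3": {"disappointment": 0.55, "neutral": 0.3},
}
queue = prioritise(feedback)
print(queue)  # ['ticket-2', 'ticket-3']
```

Given the modest F1 scores for several negative emotions, a cutoff like this would need to be tuned on real feedback before use.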

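🚀 German Emotion Classification Model

This model classifies emotions in German text. It is based on FacebookAI/xlm-roberta-base and was fine-tuned on a German translation of the go_emotions dataset. It distinguishes 28 emotions in German transcripts.

🚀 Quick Start

The model classifies 28 emotions in German transcripts. We translated the go_emotions dataset into German and fine-tuned FacebookAI/xlm-roberta-base on it. The 28 emotions are: "admiration", "amusement", "anger", "annoyance", "approval", "caring", "confusion", "curiosity", "desire", "disappointment", "disapproval", "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief", "joy", "love", "nervousness", "optimism", "pride", "realization", "relief", "remorse", "sadness", "surprise", and "neutral". For more details, see the paper cited at the end.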
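
For a quick check on a single sentence, a sketch along the following lines may be enough. It assumes the model loads from the Hugging Face Hub under the id `ChrisLalk/German-Emotions` and that the output order matches the `col_names` list in the usage example of this card; the helper names `sigmoid` and `classify` are illustrative, not part of any library.

```python
import math

MODEL_ID = "ChrisLalk/German-Emotions"

# Label order as given by col_names in the usage example of this card
LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring", "confusion",
    "curiosity", "desire", "disappointment", "disapproval", "disgust", "embarrassment",
    "excitement", "fear", "gratitude", "grief", "joy", "love", "nervousness", "optimism",
    "pride", "realization", "relief", "remorse", "sadness", "surprise", "neutral",
]

def sigmoid(x):
    """Numerically stable logistic function for a single logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def classify(text):
    """Return {label: probability} for one German text (downloads the model on first use)."""
    # Imported lazily so the helpers above stay usable without torch installed
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # Sigmoid per label: emotions are scored independently (multi-label)
    return {label: sigmoid(z) for label, z in zip(LABELS, logits)}
```

Calling `classify("Ich bin so dankbar für deine Hilfe.")` should return one probability per emotion. The function reloads the model on every call, which keeps the sketch short; for more than a handful of texts, load the tokenizer and model once and reuse them.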

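✨ Key Features

Language: focused on emotion classification in German, suited to German transcripts.
Emotion coverage: distinguishes 28 different emotions for fine-grained emotion analysis.
Base model: fine-tuned from the multilingual FacebookAI/xlm-roberta-base model.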

📦 Installation

Install the required dependencies before using the model:

pip install "transformers[torch]"
pip install pandas numpy tqdm openpyxl

💻 Usage Example

Basic usage

# pip install "transformers[torch]"
# pip install pandas numpy tqdm openpyxl
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer
import numpy as np
from tqdm import tqdm
import time
import os
from transformers import DataCollatorWithPadding
import json

# Base folder plus sub-folders for a local model copy and the input data; adjust to your environment
base_path = "/share/users/staff/c/clalk/Emotionen"
model_path = os.path.join(base_path, 'Modell')
file_path = os.path.join(base_path, 'Datensatz')

# Load tokenizer and fine-tuned model from the Hugging Face Hub
# (pass model_path instead of MODEL to load a local copy of the model)
MODEL = "ChrisLalk/German-Emotions"
tokenizer = AutoTokenizer.from_pretrained(MODEL, do_lower_case=False)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL,
    from_tf=False,
    from_flax=False,
    trust_remote_code=False,
    num_labels=28,
    ignore_mismatched_sizes=True
)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Path to the file
os.chdir(file_path)
df_full = pd.read_excel("speech_turns_pat.xlsx", index_col=None)

if 'Unnamed: 0' in df_full.columns:
    df_full = df_full.drop(columns=['Unnamed: 0'])

df_full.reset_index(drop=True, inplace=True)

# Minimal dataset wrapper so the Trainer can iterate over the tokenized inputs
class SimpleDataset:
    def __init__(self, tokenized_texts):
        self.tokenized_texts = tokenized_texts
    def __len__(self):
        return len(self.tokenized_texts["input_ids"])
    def __getitem__(self, idx):
        return {k: v[idx] for k, v in self.tokenized_texts.items()}

# Build the Trainer once and reuse it for every call
trainer = Trainer(model=model, data_collator=data_collator)

# Tokenization and inference function
def infer_texts(texts):
    """Return one list of 28 sigmoid probabilities per input text."""
    tokenized_texts = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    predictions = trainer.predict(SimpleDataset(tokenized_texts))
    # Sigmoid rather than softmax: each emotion is scored independently (multi-label)
    probs = torch.sigmoid(torch.tensor(predictions.predictions))
    return np.round(probs.numpy(), 3).tolist()

start_time = time.time()
df = df_full

# Collect one result dict per row. Here the input DataFrame also carries the columns Class, session,
# short_id, long_id, Gesamtscore_hscl, and srs_ges; only the "Patient" column with the text is needed for inference.
results = []
for index, row in tqdm(df.iterrows(), total=df.shape[0]):
    patient_texts = row['Patient']
    prob_list = infer_texts(patient_texts)
    results.append({
        "File": f"{row['Class']}_{row['session']}",  # f-string also works if these columns are numeric
        "Class": row['Class'],
        "session": row['session'],
        "short_id": row["short_id"],
        "long_id": row["long_id"],
        "Sentence": patient_texts,
        "Prediction": prob_list[0],
        "hscl-11": row["Gesamtscore_hscl"],
        "srs": row["srs_ges"],
    })

# Convert results to df
df_results = pd.DataFrame(results)
df_results.to_json("emo_speech_turn_inference.json")

end_time = time.time()
elapsed_time = end_time - start_time
print(f"Elapsed time: {elapsed_time:.2f} seconds")
print(df_results)

emo_df = pd.DataFrame(df_results['Prediction'].tolist(), index=df_results["Class"].index)
col_names = ['admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']
emo_df.columns = col_names
print(emo_df)
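
The script above stops at per-emotion probabilities. If discrete labels are needed, one simple option is to apply a cutoff to each probability and fall back to the single most likely emotion when nothing passes. The sketch below assumes a cutoff of 0.5, which the source does not specify, and uses a hypothetical helper name and a shortened label list with made-up values.

```python
def probs_to_labels(probs, labels, threshold=0.5):
    """Map one probability vector to the labels at or above the threshold."""
    picked = [label for label, p in zip(labels, probs) if p >= threshold]
    if not picked:
        # Nothing passed the cutoff: fall back to the single most likely label
        best = max(range(len(probs)), key=lambda i: probs[i])
        picked = [labels[best]]
    return picked

# Shortened label list and made-up probabilities, for illustration only
demo_labels = ["gratitude", "joy", "neutral"]
confident = probs_to_labels([0.812, 0.604, 0.05], demo_labels)
fallback = probs_to_labels([0.2, 0.1, 0.4], demo_labels)
print(confident)  # ['gratitude', 'joy']
print(fallback)   # ['neutral']
```

With the variables from the script, `probs_to_labels(row, col_names)` could be applied to each entry of `df_results['Prediction']`. A single global cutoff is a simplification; per-emotion cutoffs tuned on validation data are a common alternative.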

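📚 Documentation

Model details

Property	Details
Model type	Text classification
Language (NLP)	German
License	apache-2.0
Fine-tuned from	FacebookAI/xlm-roberta-base
Hyperparameters	Epochs: 10; learning rate: 3e-5; weight decay: 0.01
Metrics	Macro F1: 0.45; accuracy: 0.41; kappa: 0.42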
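
For readers who want to run a comparable fine-tuning setup, the three reported hyperparameters map onto `transformers.TrainingArguments` roughly as sketched below. Only epochs, learning rate, and weight decay come from the table; batch size, scheduler, and every other setting are not reported and stay at library defaults here, and `output_dir` is a placeholder.

```python
# The three values reported in the model details table
HYPERPARAMS = {
    "num_train_epochs": 10,
    "learning_rate": 3e-5,
    "weight_decay": 0.01,
}

def build_training_args(output_dir):
    """Create TrainingArguments with the reported values; all else is left at defaults."""
    # Imported lazily so the dictionary above can be inspected without transformers installed
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

This is not the authors' training script, only a sketch of where the reported numbers would go.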

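Classification metrics

Emotion	Sentiment	F1	Cohen's kappa
admiration	positive	0.64	0.601
amusement	positive	0.78	0.767
anger	negative	0.38	0.358
annoyance	negative	0.27	0.229
approval	positive	0.34	0.293
caring	positive	0.38	0.365
confusion	negative	0.40	0.378
curiosity	positive	0.51	0.486
desire	positive	0.39	0.387
disappointment	negative	0.19	0.170
disapproval	negative	0.32	0.286
disgust	negative	0.41	0.395
embarrassment	negative	0.37	0.367
excitement	positive	0.35	0.339
fear	negative	0.59	0.584
gratitude	positive	0.89	0.882
grief	negative	0.31	0.307
joy	positive	0.51	0.499
love	positive	0.73	0.721
nervousness	negative	0.28	0.276
optimism	positive	0.53	0.512
pride	positive	0.30	0.299
realization	positive	0.17	0.150
relief	positive	0.27	0.266
remorse	negative	0.55	0.545
sadness	negative	0.50	0.488
surprise	neutral	0.53	0.514
neutral	neutral	0.60	0.410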
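
Macro F1 is conventionally the unweighted mean of the per-class F1 scores. As a consistency check, the sketch below averages the 28 F1 values from the table; the result agrees with the reported macro F1 of 0.45, although the source does not state exactly how the overall figure was computed.

```python
# Per-emotion F1 scores copied from the classification metrics table
per_class_f1 = {
    "admiration": 0.64, "amusement": 0.78, "anger": 0.38, "annoyance": 0.27,
    "approval": 0.34, "caring": 0.38, "confusion": 0.40, "curiosity": 0.51,
    "desire": 0.39, "disappointment": 0.19, "disapproval": 0.32, "disgust": 0.41,
    "embarrassment": 0.37, "excitement": 0.35, "fear": 0.59, "gratitude": 0.89,
    "grief": 0.31, "joy": 0.51, "love": 0.73, "nervousness": 0.28,
    "optimism": 0.53, "pride": 0.30, "realization": 0.17, "relief": 0.27,
    "remorse": 0.55, "sadness": 0.50, "surprise": 0.53, "neutral": 0.60,
}

# Unweighted mean: every emotion counts equally, regardless of how often it occurs
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{macro_f1:.2f}")  # 0.45
```

Because rare and frequent emotions weigh the same, weak classes such as realization (0.17) and disappointment (0.19) pull the macro average well below the scores of the strongest classes.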

Citation

When using this model, please cite the associated peer-reviewed paper:

@article{Lalk2025EmotionDetection, 
  author = {Christopher Lalk and Kim Targan and Tobias Steinbrenner and Jana Schaffrath and Steffen Eberhardt and Brian Schwartz and Antonia Vehlen and Wolfgang Lutz and Julian Rubel}, 
  title = {Employing large language models for emotion detection in psychotherapy transcripts}, 
  journal = {Frontiers in Psychiatry}, 
  volume = {16}, 
  year = {2025}, 
  doi = {10.3389/fpsyt.2025.1504306}
}