distilbert-base-turkish-cased-emotion open-source model - Turkish emotion analysis

Distilbert Base Turkish Cased Emotion

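Developed by zafercavdar

A DistilBERT-based Turkish emotion classification model, fine-tuned on an emotion dataset translated into Turkish

Text Classification

Transformers

Other #Turkish emotion analysis #Lightweight BERT #Multi-class classification

Downloads: 231

Released: 4/19/2022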

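Model Overview

This model is a Turkish text emotion classifier based on the DistilBERT architecture. It classifies text into six basic emotions (sadness, joy, love, anger, fear, surprise).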

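Model Features

Efficient and lightweight

Built on the DistilBERT architecture, which reduces compute requirements while retaining relatively high accuracy

Multi-emotion classification

Distinguishes six emotion categories

Optimized for Turkish

Trained and optimized specifically for Turkish text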

Model Capabilities

Turkish text classification

Emotion analysis

Multi-class emotion recognition

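Use Cases

Social media analysis

Emotion analysis of Turkish tweets

Analyze the emotional tendencies of users in Turkish social media content

Accuracy 83.25%, F1 score 83.17 (measured on the Twitter emotion dataset translated into Turkish)

Customer feedback analysis

Emotion classification of product reviews

Automatically classify user emotions in Turkish product reviews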

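🚀 DistilBERT Model Fine-Tuned for Turkish Emotion Classification

DistilBERT is a lightweight BERT model. This project takes a DistilBERT model pretrained on Turkish and fine-tunes it on an emotion classification dataset for text emotion classification.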

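🚀 Quick Start

Install dependencies

Make sure the transformers library is installed. You can install it with the following command:

pip install transformers

Model inference

Use the following code to run emotion classification on a piece of text: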

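from transformers import pipeline

# return_all_scores=True returns a score for every label; recent transformers
# releases deprecate it in favor of top_k=None.
classifier = pipeline("text-classification",
                       model='zafercavdar/distilbert-base-turkish-cased-emotion',
                       return_all_scores=True)
# Turkish for: "I love this library, the best part is its ease of use."
prediction = classifier("Bu kütüphaneyi seviyorum, en iyi yanı kolay kullanımı.")
print(prediction)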

"""
Output:
[
  [
    {'label': 'sadness', 'score': 0.0026786490343511105},
    {'label': 'joy', 'score': 0.6600754261016846},
    {'label': 'love', 'score': 0.3203163146972656},
    {'label': 'anger', 'score': 0.004358913749456406},
    {'label': 'fear', 'score': 0.002354539930820465},
    {'label': 'surprise', 'score': 0.010216088965535164}
  ]
]

"""

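The pipeline returns one list of label/score dictionaries per input text. Below is a minimal sketch of picking the most likely emotion from that structure, using the sample output above as data. The helper name `top_emotion` is hypothetical and not part of transformers.

```python
# Sample output copied from the example above (one inner list per input text).
prediction = [
    [
        {'label': 'sadness', 'score': 0.0026786490343511105},
        {'label': 'joy', 'score': 0.6600754261016846},
        {'label': 'love', 'score': 0.3203163146972656},
        {'label': 'anger', 'score': 0.004358913749456406},
        {'label': 'fear', 'score': 0.002354539930820465},
        {'label': 'surprise', 'score': 0.010216088965535164},
    ]
]


def top_emotion(scores):
    """Return the (label, score) pair with the highest score for one text."""
    best = max(scores, key=lambda item: item['score'])
    return best['label'], best['score']


label, score = top_emotion(prediction[0])
print(f"{label}: {score:.3f}")  # joy: 0.660
```

The six scores sum to approximately 1, so they can be read as a probability distribution over the labels.
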
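✨ Key Features

Fine-tuned model: fine-tuned from the Distilbert-base-turkish-cased model on an emotion dataset.
Multi-emotion classification: recognizes six emotions in text: sadness, joy, love, anger, fear, and surprise.
Performance: 83.25% accuracy and an F1 score of 83.17 on the Twitter emotion dataset.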

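📚 Detailed Documentation

Model description

The Distilbert-base-turkish-cased model was fine-tuned on an emotion dataset (translated into Turkish via the Google Translate API) using the HuggingFace Trainer with the following hyperparameters: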

 learning rate: 2e-5
 batch size: 64
 num_train_epochs: 8
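
As a rough sketch, the hyperparameters above could be passed to the HuggingFace Trainer through `TrainingArguments`. The original training script is not shown, so two things here are assumptions: the output directory name, and the mapping of "batch size 64" to `per_device_train_batch_size`, which presumes a single device.

```python
# Hyperparameters listed above, keyed by TrainingArguments parameter names.
# Assumption: "batch size 64" is the per-device batch size on a single device.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 64,
    "num_train_epochs": 8,
}


def build_training_args(output_dir="distilbert-turkish-emotion"):
    """Build TrainingArguments; output_dir is a hypothetical placeholder."""
    # Imported lazily so the settings can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


for name, value in TRAINING_KWARGS.items():
    print(f"{name} = {value}")
```

All other training settings are left at the library defaults in this sketch, which may differ from what the author used.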

Model performance comparison

Model performance on the Twitter emotion dataset:

Model	Accuracy (%)	F1 score (%)	Test samples per second
Distilbert-base-turkish-cased-emotion	83.25	83.17	232.197

Dataset

The dataset used is Twitter-Sentiment-Analysis.

Evaluation results

{
 'eval_accuracy': 0.8325,
 'eval_f1': 0.8317301441160213,
 'eval_loss': 0.5021793842315674,
 'eval_runtime': 8.6167,
 'eval_samples_per_second': 232.108,
 'eval_steps_per_second': 3.714
}
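
The throughput figures can be cross-checked against the runtime. The small sketch below recovers the approximate test-set size, the number of evaluation batches, and the number of correct predictions from the values above. These derived counts are inferred from the reported numbers and are not stated in the original.

```python
eval_results = {
    'eval_accuracy': 0.8325,
    'eval_f1': 0.8317301441160213,
    'eval_loss': 0.5021793842315674,
    'eval_runtime': 8.6167,
    'eval_samples_per_second': 232.108,
    'eval_steps_per_second': 3.714,
}

runtime = eval_results['eval_runtime']
# samples/second * seconds ~= number of evaluated samples
n_samples = round(eval_results['eval_samples_per_second'] * runtime)
# steps/second * seconds ~= number of evaluation batches
n_steps = round(eval_results['eval_steps_per_second'] * runtime)
# accuracy * samples ~= number of correctly classified samples
n_correct = round(eval_results['eval_accuracy'] * n_samples)

print(n_samples, n_steps, n_correct)  # 2000 32 1665
```

About 2000 samples in 32 batches is consistent with an evaluation batch size of 64 (with a partial final batch), the same batch size reported for training.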

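🔧 Technical Details

This project uses HuggingFace's transformers library to fine-tune the model. The steps are:

Load the pretrained Distilbert-base-turkish-cased model.
Translate the emotion dataset into Turkish using the Google Translate API.
Fine-tune the model with the HuggingFace Trainer, using a learning rate of 2e-5, a batch size of 64, and 8 training epochs.
Evaluate the model on the test set and record accuracy, F1 score, and other metrics.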
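
The final evaluation step reports accuracy and F1. A minimal pure-Python sketch of both metrics follows. The original does not say which F1 averaging was used, so the support-weighted average here is an assumption, and the function names and toy labels are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * count
    return total / len(y_true)


# Toy example with three of the six emotion labels (made-up data).
y_true = ["joy", "joy", "love", "anger"]
y_pred = ["joy", "love", "love", "anger"]
acc = accuracy(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred)
print(acc, f1)
```

With the HuggingFace Trainer, a function like this would typically be wrapped in a `compute_metrics` callback that first converts model logits to predicted label ids.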

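📄 License

Please refer to the license information of the original project.

Property	Details
Model type	DistilBERT-based text classification model
Training data	Twitter emotion dataset (translated into Turkish)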