modernBERT-base-multilingual-sentiment: an open-source model for sentiment analysis in 16+ languages

Modernbert Base Multilingual Sentiment

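Developed by clapAI

A multilingual sentiment classification model fine-tuned from ModernBERT-base, supporting sentiment analysis in 16+ languages.

Text classification

Transformers

Multilingual | Open-source license: Apache-2.0 | #multilingual sentiment analysis #product review classification #ModernBERT architecture

Downloads: 2,220

Release date: 12/31/2024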

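Model overview

This multilingual sentiment classification model analyzes the sentiment of text in many languages. It is particularly suited to scenarios such as product reviews and place reviews.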

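Model features

Multilingual support

Supports sentiment analysis in 16+ languages, including major languages such as English, Chinese, and Arabic.

High performance

Built on the ModernBERT architecture, which targets efficient inference; the fine-tuned model reaches an F1 score of 80.16 on the clapAI/MultiLingualSentiment test set.

Broad applicability

Particularly suited to sentiment analysis in business scenarios such as product reviews and place reviews.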

Model capabilities

Text sentiment classification

Multilingual text analysis

Review sentiment detection

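Use cases

E-commerce

Product review analysis

Analyzes the sentiment of product reviews written by users in different languages.

Helps manufacturers understand how a product is received in different markets.

Social media monitoring

Multilingual public-opinion analysis

Monitors user sentiment across languages on social media.

Helps detect negative sentiment early so that a response can be prepared.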
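
As an illustration of the cross-market analysis described above, the following minimal sketch aggregates predicted labels by language and flags languages with a high share of negative predictions. The record format, the function names, and the threshold are assumptions made for this example and are not part of the model or its API; in practice the (language, label) pairs would come from running the classifier on real reviews.

```python
from collections import Counter, defaultdict


def sentiment_share_by_language(records):
    """Turn (language, predicted_label) pairs into per-language label shares."""
    counts = defaultdict(Counter)
    for language, label in records:
        counts[language][label] += 1

    shares = {}
    for language, counter in counts.items():
        total = sum(counter.values())
        shares[language] = {label: n / total for label, n in counter.items()}
    return shares


def flag_negative(shares, threshold=0.5):
    """Return languages whose negative share is at or above the threshold."""
    return sorted(
        language
        for language, dist in shares.items()
        if dist.get("negative", 0.0) >= threshold
    )


# Hypothetical predictions, used only to exercise the helpers.
records = [
    ("en", "positive"),
    ("en", "negative"),
    ("de", "negative"),
    ("de", "negative"),
    ("ja", "positive"),
]
shares = sentiment_share_by_language(records)
print(shares)
print(flag_negative(shares, threshold=0.6))
```

Working with shares rather than raw counts keeps markets with very different review volumes comparable.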

🚀 clapAI/modernBERT-base-multilingual-sentiment

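modernBERT-base-multilingual-sentiment is a multilingual sentiment classification model from the Multilingual-Sentiment collection. It supports sentiment classification in 16+ languages, including English, Vietnamese, Chinese, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, and Arabic.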

🚀 Quick start

Requirements

transformers supports the ModernBERT architecture starting with version 4.48.0.dev0. Install the required version with the following command:

pip install "git+https://github.com/huggingface/transformers.git@6e0515e99c39444caae39472ee1b2fd76ece32f1" --upgrade

To improve inference performance, install FlashAttention:

pip install flash-attn==2.7.2.post1
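
After installing, it can help to confirm that the environment meets the requirement stated above. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the check treats a `.dev0` build of 4.48.0 as sufficient because the text above says support begins at 4.48.0.dev0.

```python
import importlib.util
import re
from importlib import metadata

# Minimum transformers release with ModernBERT support, per the text above.
MIN_TRANSFORMERS = (4, 48, 0)


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release, e.g. '4.48.0.dev0' -> (4, 48, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as '4.48' so tuple comparison behaves as expected.
    return parts + (0,) * (3 - len(parts))


def supports_modernbert(transformers_version: str) -> bool:
    """True if the given transformers version is at least 4.48.0 (dev builds included)."""
    return release_tuple(transformers_version) >= MIN_TRANSFORMERS


def check_environment() -> dict:
    """Report the installed transformers version and whether flash_attn is importable."""
    report = {}
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        installed = None
    report["transformers"] = installed
    report["modernbert_ok"] = installed is not None and supports_modernbert(installed)
    report["flash_attn_installed"] = importlib.util.find_spec("flash_attn") is not None
    return report


print(check_environment())
```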

Code example

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Use half precision only on GPU; float16 is slow and not fully supported on CPU
dtype = torch.float16 if device.type == "cuda" else torch.float32

model_id = "clapAI/modernBERT-base-multilingual-sentiment"
# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, torch_dtype=dtype)

model.to(device)
model.eval()


# Get the label mapping from the model config
id2label = model.config.id2label

texts = [
    # English
    {
        "text": "I absolutely love the new design of this app!",
        "label": "positive"
    },
    {
        "text": "The customer service was disappointing.",
        "label": "negative"
    },
    # Arabic
    {
        "text": "هذا المنتج رائع للغاية!",
        "label": "positive"
    },
    {
        "text": "الخدمة كانت سيئة للغاية.",
        "label": "negative"
    },
    # German
    {
        "text": "Ich bin sehr zufrieden mit dem Kauf.",
        "label": "positive"
    },
    {
        "text": "Die Lieferung war eine Katastrophe.",
        "label": "negative"
    },
    # Spanish
    {
        "text": "Este es el mejor libro que he leído.",
        "label": "positive"
    },
    {
        "text": "El producto llegó roto y no funciona.",
        "label": "negative"
    },
    # French
    {
        "text": "J'adore ce restaurant, la nourriture est délicieuse!",
        "label": "positive"
    },
    {
        "text": "Le service était très lent et désagréable.",
        "label": "negative"
    },
    # Indonesian
    {
        "text": "Saya sangat senang dengan pelayanan ini.",
        "label": "positive"
    },
    {
        "text": "Makanannya benar-benar tidak enak.",
        "label": "negative"
    },
    # Japanese
    {
        "text": "この製品は本当に素晴らしいです！",
        "label": "positive"
    },
    {
        "text": "サービスがひどかったです。",
        "label": "negative"
    },
    # Korean
    {
        "text": "이 제품을 정말 좋아해요!",
        "label": "positive"
    },
    {
        "text": "고객 서비스가 정말 실망스러웠어요.",
        "label": "negative"
    },
    # Russian
    {
        "text": "Этот фильм просто потрясающий!",
        "label": "positive"
    },
    {
        "text": "Качество было ужасным.",
        "label": "negative"
    },
    # Vietnamese
    {
        "text": "Tôi thực sự yêu thích sản phẩm này!",
        "label": "positive"
    },
    {
        "text": "Dịch vụ khách hàng thật tệ.",
        "label": "negative"
    },
    # Chinese
    {
        "text": "我非常喜欢这款产品！",
        "label": "positive"
    },
    {
        "text": "质量真的很差。",
        "label": "negative"
    }
]

for item in texts:
    text = item["text"]
    label = item["label"]

    inputs = tokenizer(text, return_tensors="pt").to(device)

    # Run the forward pass in inference mode (no gradient tracking)
    with torch.inference_mode():
        outputs = model(**inputs)
        predictions = outputs.logits.argmax(dim=-1)
    print(f"Text: {text} | Label: {label} | Prediction: {id2label[predictions.item()]}")
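
The loop above reports only the top label. When a confidence value is also useful, the logits can be passed through a softmax first. The sketch below shows that step in plain Python so it runs without the model; `EXAMPLE_ID2LABEL` and the logit values are hypothetical stand-ins, and the real mapping should always be read from `model.config.id2label`.

```python
import math

# Hypothetical label mapping for illustration only; use model.config.id2label in practice.
EXAMPLE_ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits standing in for outputs.logits[0].tolist()
label, confidence = decode([0.1, 0.2, 3.0], EXAMPLE_ID2LABEL)
print(f"{label} ({confidence:.3f})")
```

With the real model, the equivalent tensor operation is `outputs.logits.softmax(dim=-1)`.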

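✨ Key features

Multilingual: sentiment classification in 16+ languages.
Accuracy: after fine-tuning, the model reaches an F1 score of 80.16 on the clapAI/MultiLingualSentiment test set.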

📦 Installation

Run the following commands to install the dependencies:

pip install "git+https://github.com/huggingface/transformers.git@6e0515e99c39444caae39472ee1b2fd76ece32f1" --upgrade
pip install flash-attn==2.7.2.post1

📚 Documentation

Model overview

This model is fine-tuned from answerdotai/ModernBERT-base on the multilingual sentiment dataset clapAI/MultiLingualSentiment.

Evaluation and performance

After fine-tuning, the best checkpoint was loaded and evaluated on the test split of clapAI/MultiLingualSentiment.

Model	Pretrained model	Parameters	F1 score
modernBERT-base-multilingual-sentiment	ModernBERT-base	150M	80.16
modernBERT-large-multilingual-sentiment	ModernBERT-large	396M	81.4
roberta-base-multilingual-sentiment	XLM-roberta-base	278M	81.8
roberta-large-multilingual-sentiment	XLM-roberta-large	560M	82.6
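
The source does not state how the F1 score is averaged across classes. As a sketch under the assumption of macro averaging (an assumption, not a confirmed detail of this evaluation), the metric can be computed from gold and predicted labels as follows. The label lists are made up for the example, and the result is on a 0 to 1 scale, whereas the table values appear to be on a 0 to 100 scale.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels for illustration only.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
print(macro_f1(y_true, y_pred))
```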

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 512
eval_batch_size: 512
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 2
total_train_batch_size: 2048
total_eval_batch_size: 1024
optimizer:
  type: adamw_torch_fused
  betas: [ 0.9, 0.999 ]
  epsilon: 1e-08
  optimizer_args: "No additional optimizer arguments"
lr_scheduler:
  type: cosine
  warmup_ratio: 0.01
num_epochs: 5.0
mixed_precision_training: Native AMP
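
Two of the values above follow from the others: the total batch sizes are products of the per-device batch size, the device count, and (for training) the gradient accumulation steps. The sketch below reproduces that arithmetic and also illustrates the general shape of a cosine schedule with linear warmup. The function names and the total step count are hypothetical (the source does not give the number of training steps), and the schedule is an approximation of the common warmup-plus-cosine form rather than a confirmed copy of the scheduler used.

```python
import math


def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Examples processed per optimizer step (or per evaluation step)."""
    return per_device * num_devices * grad_accum_steps


def warmup_steps(total_steps, warmup_ratio):
    """Number of warmup steps implied by a warmup ratio."""
    return math.ceil(total_steps * warmup_ratio)


def cosine_lr(step, total_steps, peak_lr=5e-5, warmup_ratio=0.01):
    """Linear warmup to peak_lr, then cosine decay toward zero."""
    warmup = warmup_steps(total_steps, warmup_ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(512, 2, 2))  # total_train_batch_size
print(effective_batch_size(512, 2))     # total_eval_batch_size

# Hypothetical total step count, chosen only to show the schedule's shape.
TOTAL_STEPS = 1000
for step in (0, 5, 10, 500, 1000):
    print(step, cosine_lr(step, TOTAL_STEPS))
```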

Framework versions

transformers==4.48.0.dev0
torch==2.4.0+cu121
datasets==3.2.0
tokenizers==0.21.0
flash-attn==2.7.2.post1

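🔧 Technical details

The model is built by fine-tuning the pretrained ModernBERT-base model on a multilingual sentiment dataset. Training ran on multiple GPUs (2 devices) with mixed-precision training (native AMP).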

📄 License

This project is released under the Apache-2.0 license.

Citation

If you find this project useful, please star the repository and cite it as follows:

@misc{modernBERT-base-multilingual-sentiment,
      title={modernBERT-base-multilingual-sentiment: A Multilingual Sentiment Classification Model},
      author={clapAI},
      howpublished={\url{https://huggingface.co/clapAI/modernBERT-base-multilingual-sentiment}},
      year={2025},
}