🚀 clapAI/modernBERT-base-multilingual-sentiment
modernBERT-base-multilingual-sentiment is a multilingual sentiment classification model from the Multilingual-Sentiment collection. It is fine-tuned from answerdotai/ModernBERT-base on the multilingual sentiment dataset clapAI/MultiLingualSentiment, and supports sentiment classification across 16+ languages, including English, Vietnamese, Chinese, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, and Arabic.
✨ Features
- Multilingual support: Covers 16+ languages including English, Chinese, Vietnamese, etc.
- Sentiment classification: Classifies the sentiment of input text (positive/negative, as in the examples below).
📦 Installation
Requirements
Since transformers only supports the ModernBERT architecture from version 4.48.0.dev0, use the following command to install the required version:
pip install "git+https://github.com/huggingface/transformers.git@6e0515e99c39444caae39472ee1b2fd76ece32f1" --upgrade
Install FlashAttention to accelerate inference:
pip install flash-attn==2.7.2.post1
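With flash-attn installed, the kernel can be requested when loading the model. The snippet below is a minimal sketch, not part of the original instructions: attn_implementation is a standard from_pretrained argument in recent transformers releases, and FlashAttention requires a CUDA GPU with fp16/bf16 weights.

import torch
from transformers import AutoModelForSequenceClassification

# Sketch (not from the original card): opt into the FlashAttention 2 kernel
# at load time. Requires a CUDA GPU and half-precision weights; omit
# attn_implementation to fall back to the default attention implementation.
model = AutoModelForSequenceClassification.from_pretrained(
    "clapAI/modernBERT-base-multilingual-sentiment",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")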
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_id = "clapAI/modernBERT-base-multilingual-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the classifier in half precision to reduce memory usage.
model = AutoModelForSequenceClassification.from_pretrained(model_id, torch_dtype=torch.float16)
model.to(device)
model.eval()

# Mapping from class indices to label names (e.g. "positive"/"negative").
id2label = model.config.id2label
id2label = model.config.id2label
texts = [
    {"text": "I absolutely love the new design of this app!", "label": "positive"},
    {"text": "The customer service was disappointing.", "label": "negative"},
    {"text": "هذا المنتج رائع للغاية!", "label": "positive"},
    {"text": "الخدمة كانت سيئة للغاية.", "label": "negative"},
    {"text": "Ich bin sehr zufrieden mit dem Kauf.", "label": "positive"},
    {"text": "Die Lieferung war eine Katastrophe.", "label": "negative"},
    {"text": "Este es el mejor libro que he leído.", "label": "positive"},
    {"text": "El producto llegó roto y no funciona.", "label": "negative"},
    {"text": "J'adore ce restaurant, la nourriture est délicieuse!", "label": "positive"},
    {"text": "Le service était très lent et désagréable.", "label": "negative"},
    {"text": "Saya sangat senang dengan pelayanan ini.", "label": "positive"},
    {"text": "Makanannya benar-benar tidak enak.", "label": "negative"},
    {"text": "この製品は本当に素晴らしいです!", "label": "positive"},
    {"text": "サービスがひどかったです。", "label": "negative"},
    {"text": "이 제품을 정말 좋아해요!", "label": "positive"},
    {"text": "고객 서비스가 정말 실망스러웠어요.", "label": "negative"},
    {"text": "Этот фильм просто потрясающий!", "label": "positive"},
    {"text": "Качество было ужасным.", "label": "negative"},
    {"text": "Tôi thực sự yêu thích sản phẩm này!", "label": "positive"},
    {"text": "Dịch vụ khách hàng thật tệ.", "label": "negative"},
    {"text": "我非常喜欢这款产品!", "label": "positive"},
    {"text": "质量真的很差。", "label": "negative"},
]
for item in texts:
    text = item["text"]
    label = item["label"]
    inputs = tokenizer(text, return_tensors="pt").to(device)
    # Forward pass without gradient tracking; pick the highest-scoring class.
    with torch.inference_mode():
        outputs = model(**inputs)
        predictions = outputs.logits.argmax(dim=-1)
    print(f"Text: {text} | Label: {label} | Prediction: {id2label[predictions.item()]}")
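As a lighter-weight alternative (not part of the original example), the same checkpoint can also be used through the high-level pipeline API:

from transformers import pipeline

# Minimal sketch using the pipeline API instead of the manual loop above.
classifier = pipeline(
    "text-classification",
    model="clapAI/modernBERT-base-multilingual-sentiment",
)

print(classifier("I absolutely love the new design of this app!"))
# Output is a list of dicts such as [{"label": ..., "score": ...}]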
📚 Documentation
Evaluation & Performance
After fine-tuning, the best model is loaded and evaluated on the test set of clapAI/MultiLingualSentiment.
| Model | Pretrained Model | Parameters | F1-score |
|-------|------------------|------------|----------|
| [modernBERT-base-multilingual-sentiment](https://huggingface.co/clapAI/modernBERT-base-multilingual-sentiment) | ModernBERT-base | 150M | 80.16 |
| [modernBERT-large-multilingual-sentiment](https://huggingface.co/clapAI/modernBERT-large-multilingual-sentiment) | ModernBERT-large | 396M | 81.4 |
| [roberta-base-multilingual-sentiment](https://huggingface.co/clapAI/roberta-base-multilingual-sentiment) | XLM-roberta-base | 278M | 81.8 |
| [roberta-large-multilingual-sentiment](https://huggingface.co/clapAI/roberta-large-multilingual-sentiment) | XLM-roberta-large | 560M | 82.6 |
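A comparable evaluation could be run with the snippet below. This is a sketch, not the authors' evaluation script: the "test" split name, the "text"/"label" column names, and macro averaging for the F1-score are all assumptions.

import torch
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Sketch of an evaluation loop; split and column names are assumptions.
dataset = load_dataset("clapAI/MultiLingualSentiment", split="test")

model_id = "clapAI/modernBERT-base-multilingual-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()
label2id = model.config.label2id

preds, refs = [], []
for batch in dataset.iter(batch_size=32):
    inputs = tokenizer(batch["text"], padding=True, truncation=True, return_tensors="pt")
    with torch.inference_mode():
        logits = model(**inputs).logits
    preds.extend(logits.argmax(dim=-1).tolist())
    # Assumes string labels that match the keys of model.config.label2id.
    refs.extend(label2id[l] for l in batch["label"])

# Macro-averaged F1; the card does not state which averaging was used.
print(f"F1: {f1_score(refs, preds, average='macro') * 100:.2f}")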
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 512
- eval_batch_size: 512
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 2048
- total_eval_batch_size: 1024
- optimizer:
  - type: adamw_torch_fused
  - betas: [0.9, 0.999]
  - epsilon: 1e-08
  - optimizer_args: "No additional optimizer arguments"
- lr_scheduler:
  - type: cosine
  - warmup_ratio: 0.01
- num_epochs: 5.0
- mixed_precision_training: Native AMP
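These values map directly onto transformers' TrainingArguments: 512 per device × 2 GPUs × 2 accumulation steps gives the total train batch size of 2048. The sketch below mirrors the listed hyperparameters; the output directory is illustrative and the original training script is not published here.

from transformers import TrainingArguments

# Sketch reconstructing the listed hyperparameters; not the authors' script.
training_args = TrainingArguments(
    output_dir="modernBERT-base-multilingual-sentiment",  # illustrative path
    learning_rate=5e-5,
    per_device_train_batch_size=512,  # x 2 GPUs x 2 accumulation steps = 2048
    per_device_eval_batch_size=512,
    gradient_accumulation_steps=2,
    num_train_epochs=5.0,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    seed=42,
    fp16=True,  # Native AMP mixed-precision training
)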
Framework versions
transformers==4.48.0.dev0
torch==2.4.0+cu121
datasets==3.2.0
tokenizers==0.21.0
flash-attn==2.7.2.post1
📄 License
This project is licensed under the Apache-2.0 license.
📖 Citation
If you find our project helpful, please star our repo and cite our work. Thanks!
@misc{modernBERT-base-multilingual-sentiment,
  title={modernBERT-base-multilingual-sentiment: A Multilingual Sentiment Classification Model},
  author={clapAI},
  howpublished={\url{https://huggingface.co/clapAI/modernBERT-base-multilingual-sentiment}},
  year={2025},
}