twitter-xlm-roberta-emotion-es Open-source Model - Free Deployment for Seven Emotion Classification of Spanish Tweets

Twitter Xlm Roberta Emotion Es

Developed by daveni

A model based on the XLM-roBERTa-base architecture, fine-tuned for emotion analysis of Spanish tweets. It classifies text into seven emotion classes.

Text Classification

Transformers

Spanish #Spanish Tweet Analysis #Seven Emotion Recognition #XLM-RoBERTa Fine-tuning

Downloads 5,638

Release Time: 3/2/2022

Model Overview

This model is designed for emotion analysis of Spanish tweets. It assigns each tweet to one of seven classes: anger, disgust, fear, joy, sadness, surprise, and others.

Model Features

Multi-emotion Classification

Capable of identifying seven different emotion categories, including basic emotions and an 'other' category

Spanish Language Optimization

Specifically fine-tuned for Spanish tweets, ideal for analyzing Spanish social media content

Competition-validated Performance

Achieved first place in the EmoEvalEs competition at IberLEF 2021, with a macro-average F1 score of 71.70%

Model Capabilities

Spanish Text Classification

Social Media Sentiment Analysis

Multi-category Emotion Recognition

Use Cases

Social Media Analysis

Tweet Sentiment Monitoring

Analyze emotional tendencies in Spanish user tweets

Can identify seven different emotional states

Brand Sentiment Analysis

Assess public sentiment towards brands or products in Spanish-speaking markets

Academic Research

Affective Computing Research

Used for research related to Spanish-language affective computing

Performance validated in the EmoEvalEs competition

🚀 twitter-XLM-roBERTa-base for Emotion Analysis

This is an XLM-roBERTa-base model trained on ~198M tweets and fine-tuned for emotion analysis in Spanish. It took first place in a competition on classifying Spanish tweets into seven emotion classes.

🚀 Quick Start

Note: This model and its model card are based on the fine-tuned XLM-T for Sentiment Analysis.

This is an XLM-roBERTa-base model trained on ~198M tweets and fine-tuned for emotion analysis in Spanish. The model was submitted to the EmoEvalEs competition, part of the IberLEF 2021 Conference, where the task was to classify Spanish tweets into seven classes: anger, disgust, fear, joy, sadness, surprise, and other. We achieved first place in the competition with a macro-averaged F1 score of 71.70%.
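To make the metric concrete, here is a minimal sketch of how a macro-averaged F1 score is computed: F1 is calculated per class and then averaged with equal weight, so rare classes count as much as frequent ones. The function name `macro_f1` and the toy labels are illustrative assumptions, not the competition's evaluation script or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy, made-up labels (not EmoEvalEs data): the classifier over-predicts "others".
y_true = ["others", "others", "others", "others", "joy", "joy", "anger", "sadness"]
y_pred = ["others", "others", "others", "others", "joy", "others", "others", "sadness"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.75
print(round(macro_f1(y_true, y_pred), 3))  # 0.617
```

In this toy case accuracy is 0.75 while macro-F1 is about 0.617, because the missed `anger` tweet contributes a per-class F1 of 0 that is averaged in at full weight.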

✨ Features

Trained on a large dataset of ~198M tweets.
Fine-tuned specifically for emotion analysis in Spanish.
Achieved first place in the EmoEvalEs competition with a macro-averaged F1 score of 71.70%.

💻 Usage Examples

Basic Usage

from transformers import pipeline
model_path = "daveni/twitter-xlm-roberta-emotion-es"
emotion_analysis = pipeline("text-classification", framework="pt", model=model_path, tokenizer=model_path)
emotion_analysis("Einstein dijo: Solo hay dos cosas infinitas, el universo y los pinches anuncios de bitcoin en Twitter. Paren ya carajo aaaaaaghhgggghhh me quiero murir")

Output:

[{'label': 'anger', 'score': 0.48307016491889954}]

Advanced Usage

import numpy as np
from scipy.special import softmax
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

# Preprocess text (username and link placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

model_path = "daveni/twitter-xlm-roberta-emotion-es"
tokenizer = AutoTokenizer.from_pretrained(model_path)
config = AutoConfig.from_pretrained(model_path)
# PyTorch model
model = AutoModelForSequenceClassification.from_pretrained(model_path)

text = "Se ha quedao bonito día para publicar vídeo, ¿no? Hoy del tema más diferente que hemos tocado en el canal."
text = preprocess(text)
print(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
# Logits for the single input, converted to probabilities
scores = output[0][0].detach().numpy()
scores = softmax(scores)

# Print labels and scores, highest probability first
ranking = np.argsort(scores)[::-1]
for i in range(scores.shape[0]):
    l = config.id2label[int(ranking[i])]
    s = scores[ranking[i]]
    print(f"{i+1}) {l} {np.round(float(s), 4)}")

Output:

Se ha quedao bonito día para publicar vídeo, ¿no? Hoy del tema más diferente que hemos tocado en el canal.
1) joy 0.7887
2) others 0.1679
3) surprise 0.0152
4) sadness 0.0145
5) anger 0.0077
6) disgust 0.0033
7) fear 0.0027
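The example tweet above contains no mentions or links, so the preprocessing step leaves it unchanged. The short sketch below repeats the `preprocess` function from the advanced example and applies it to a made-up tweet (the handle and URL are hypothetical) to show the placeholder substitution.

```python
# Same preprocessing as in the advanced example: mentions become '@user',
# links become 'http'. A lone '@' is left as-is because of the len(t) > 1 check.
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)


# Hypothetical tweet, invented for illustration.
sample = "@maria mira esto https://t.co/abc123 @ jajaja"
cleaned = preprocess(sample)
print(cleaned)  # @user mira esto http @ jajaja
```

Because the function splits on single spaces only, a mention or link attached to other characters (for example `(@maria)`) would not be replaced.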

🔧 Technical Details

Limitations and bias

The dataset we used for fine-tuning was unbalanced: almost half of the records belonged to the other class, so the model may be biased towards this class.
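One simple way to watch for this bias on your own data is to tally how often each label is predicted. The sketch below assumes predictions in the list-of-dicts shape shown in the basic usage example. The helper name `label_shares` and the sample predictions are hypothetical, not real model output.

```python
from collections import Counter


def label_shares(predictions):
    """Return the fraction of predictions assigned to each label, most frequent first."""
    counts = Counter(p["label"] for p in predictions)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


# Hypothetical predictions, invented for illustration.
sample_predictions = [
    {"label": "others", "score": 0.61},
    {"label": "others", "score": 0.55},
    {"label": "joy", "score": 0.79},
    {"label": "others", "score": 0.48},
    {"label": "anger", "score": 0.52},
]

shares = label_shares(sample_predictions)
print(shares)  # {'others': 0.6, 'joy': 0.2, 'anger': 0.2}
```

A large share for the catch-all class does not prove bias on its own, but it is a cheap signal for deciding which predictions to review by hand.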

Training data

Pretrained weights were left identical to the original model released by cardiffnlp. We used the EmoEvalEs Dataset for finetuning.

📚 Documentation

BibTeX entry and citation info

@inproceedings{vera2021gsi,
  title={GSI-UPM at IberLEF2021: Emotion Analysis of Spanish Tweets by Fine-tuning the XLM-RoBERTa Language Model},
  author={Vera, D and Araque, O and Iglesias, CA},
  booktitle={Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021). CEUR Workshop Proceedings, CEUR-WS, M{\'a}laga, Spain},
  year={2021}
}
