HebEMO - Emotion Recognition Model for Modern Hebrew
HebEMO is a tool that detects polarity and extracts emotions from modern Hebrew User-Generated Content (UGC). It was trained on a unique Covid-19 related dataset that we collected and annotated.
Quick Start
HebEMO performs polarity classification and emotion detection. Polarity classification achieved a weighted average F1-score of 0.96. Emotion detection reached F1-scores of 0.78-0.97 for every emotion except surprise (F1 = 0.41). These results exceed the best reported performance, even when compared with English-language models.
Features
- Polarity and Emotion Detection: HebEMO can accurately detect the polarity and emotions in modern Hebrew UGC.
- Trained on a Unique Dataset: Training used a Covid-19 related dataset that we collected and annotated.
Documentation
Emotion UGC Data Description
Our UGC data consists of comments on news articles from 3 major Israeli news sites, collected between January 2020 and August 2020. The data is about 150 MB, containing over 7 million words and 350K sentences.
Approximately 2,000 sentences were annotated by crowd members (3-10 annotators per sentence) for overall sentiment (polarity) and eight emotions: anger, disgust, anticipation, fear, joy, sadness, surprise, and trust.
| Property | Details |
|---|---|
| Model Type | Emotion recognition and polarity analysis model for modern Hebrew |
| Training Data | Comments on news articles from 3 major Israeli news sites (January 2020 - August 2020), about 150 MB, over 7 million words, 350K sentences |
The fraction of sentences in which each emotion appeared is shown in the following table:
| | anger | disgust | expectation | fear | happy | sadness | surprise | trust | sentiment |
|---|---|---|---|---|---|---|---|---|---|
| ratio | 0.78 | 0.83 | 0.58 | 0.45 | 0.12 | 0.59 | 0.17 | 0.11 | 0.25 |
Performance
Emotion Recognition
| emotion | f1-score | precision | recall |
|---|---|---|---|
| anger | 0.96 | 0.99 | 0.93 |
| disgust | 0.97 | 0.98 | 0.96 |
| anticipation | 0.82 | 0.80 | 0.87 |
| fear | 0.79 | 0.88 | 0.72 |
| joy | 0.90 | 0.97 | 0.84 |
| sadness | 0.90 | 0.86 | 0.94 |
| surprise | 0.40 | 0.44 | 0.37 |
| trust | 0.83 | 0.86 | 0.80 |
The above metrics are for the positive class (meaning, the emotion is reflected in the text).
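As a sanity check on the table above, the F1-score can be recomputed from precision and recall as their harmonic mean. The sketch below is illustrative only: the names `reported`, `f1_score`, and `recomputed` are ours, and the inputs are the rounded values from the table, so small rounding differences are expected.

```python
# (precision, recall, reported F1) for the positive class, copied from the table above.
reported = {
    "anger": (0.99, 0.93, 0.96),
    "disgust": (0.98, 0.96, 0.97),
    "anticipation": (0.80, 0.87, 0.82),
    "fear": (0.88, 0.72, 0.79),
    "joy": (0.97, 0.84, 0.90),
    "sadness": (0.86, 0.94, 0.90),
    "surprise": (0.44, 0.37, 0.40),
    "trust": (0.86, 0.80, 0.83),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute F1 from the rounded precision/recall values.
recomputed = {
    emotion: round(f1_score(p, r), 2) for emotion, (p, r, _) in reported.items()
}

for emotion, (_, _, f1) in reported.items():
    print(f"{emotion:<13} reported={f1:.2f} recomputed={recomputed[emotion]:.2f}")
```

The recomputed values agree with the reported ones to within 0.01, which is the precision available from two-decimal inputs.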
Sentiment (Polarity) Analysis
| | precision | recall | f1-score |
|---|---|---|---|
| neutral | 0.83 | 0.56 | 0.67 |
| positive | 0.96 | 0.92 | 0.94 |
| negative | 0.97 | 0.99 | 0.98 |
| accuracy | | | 0.97 |
| macro avg | 0.92 | 0.82 | 0.86 |
| weighted avg | 0.96 | 0.97 | 0.96 |
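The "macro avg" row in the polarity table is the unweighted mean of the three per-class values, so it can be reproduced from the table itself. The following is a minimal sketch; the names `per_class`, `macro_average`, and `macro` are illustrative and not part of HebEMO. The "weighted avg" row cannot be reproduced this way because it weights each class by its number of samples, which the table does not list.

```python
# Per-class polarity metrics, copied from the table above.
per_class = {
    "neutral": {"precision": 0.83, "recall": 0.56, "f1": 0.67},
    "positive": {"precision": 0.96, "recall": 0.92, "f1": 0.94},
    "negative": {"precision": 0.97, "recall": 0.99, "f1": 0.98},
}


def macro_average(rows: dict, metric: str) -> float:
    """Unweighted mean of one metric across all classes."""
    values = [row[metric] for row in rows.values()]
    return sum(values) / len(values)


# Every class counts equally, regardless of how many samples it has.
macro = {
    metric: round(macro_average(per_class, metric), 2)
    for metric in ("precision", "recall", "f1")
}
print(macro)
```

Because every class counts equally in the macro average, the weaker neutral-class recall (0.56) pulls the macro recall well below the weighted figure.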
The sentiment (polarity) analysis model is also available on AWS! For more information, visit AWS' git
Usage Examples
Basic Usage
An online demo of the model is available on Hugging Face Spaces or as a Colab notebook.
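# Clone the HeBERT repository (notebook syntax; in a terminal, drop the leading "!")
!git clone https://github.com/avichaychriqui/HeBERT.git
from HeBERT.src.HebEMO import *
HebEMO_model = HebEMO()
# Run HebEMO on the text in a file
HebEMO_model.hebemo(input_path='data/text_example.txt')
# Run HebEMO on a single string and plot the result
hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)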
Advanced Usage
For sentiment classification model (polarity ONLY):
from transformers import AutoTokenizer, AutoModel, pipeline
# Optional: load the tokenizer and the base model directly
tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis")
model = AutoModel.from_pretrained("avichr/heBERT_sentiment_analysis")
# Build a sentiment pipeline that returns a score for every label
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores=True
)
sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')
# [[{'label': 'neutral', 'score': 0.9978172183036804},
#   {'label': 'positive', 'score': 0.0014792329166084528},
#   {'label': 'negative', 'score': 0.0007035882445052266}]]
sentiment_analysis('קפה זה טעים')
# [[{'label': 'neutral', 'score': 0.00047328314394690096},
#   {'label': 'positive', 'score': 0.9994067549705505},
#   {'label': 'negative', 'score': 0.00011996887042187154}]]
sentiment_analysis('אני לא אוהב את העולם')
# [[{'label': 'neutral', 'score': 9.214012970915064e-05},
#   {'label': 'positive', 'score': 8.876807987689972e-05},
#   {'label': 'negative', 'score': 0.9998190999031067}]]
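The pipeline calls above return one inner list per input text, with one score per label. A small helper can reduce that structure to the single most likely label. This is a sketch that assumes the nested output shape shown above; `top_label` and `example_output` are illustrative names, and the sample scores are copied from the first call.

```python
from typing import Any, Dict, List


def top_label(scores: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the label/score entry with the highest score for one input text."""
    return max(scores, key=lambda item: item["score"])


# Output of the first sentiment_analysis call shown above (one inner list per text).
example_output = [[
    {"label": "neutral", "score": 0.9978172183036804},
    {"label": "positive", "score": 0.0014792329166084528},
    {"label": "negative", "score": 0.0007035882445052266},
]]

best = top_label(example_output[0])
print(best["label"], round(best["score"], 4))
```

For several input texts, the same helper can be applied to each inner list in turn, for example `[top_label(scores) for scores in output]`.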
License
No license information provided in the original document.
Contact us
Avichay Chriqui
Inbal Yahav
The Coller Semitic Languages AI Lab
Thank you, תודה, شكرا
Citation
If you used this model, please cite us as:
Chriqui, A., & Yahav, I. (2022). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. INFORMS Journal on Data Science, forthcoming.
@article{chriqui2021hebert,
title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
author={Chriqui, Avihay and Yahav, Inbal},
journal={INFORMS Journal on Data Science},
year={2022}
}