HebEMO - Emotion Recognition Model for Modern Hebrew
HebEMO detects polarity and extracts emotions from modern Hebrew User-Generated Content (UGC). It was trained on a unique Covid-19-related dataset that we collected and annotated. The model reaches a weighted-average F1-score of 0.96 for polarity classification. For emotion detection, F1-scores range from 0.78 to 0.97, except for surprise, where performance is lower (F1 = 0.41). These results exceed the best previously reported performance, even compared with English-language models.
Quick Start
Emotion Recognition Model
You can try the model online on Hugging Face Spaces or in a Colab notebook.
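# In a notebook (Colab/Jupyter), the leading "!" runs a shell command.
!git clone https://github.com/avichaychriqui/HeBERT.git
from HeBERT.src.HebEMO import HebEMO
HebEMO_model = HebEMO()
# Run on text read from a file
HebEMO_model.hebemo(input_path='data/text_example.txt')
# Run on a single string ("Life is beautiful and happy") and plot the result
hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)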

For the sentiment classification model (polarity only):
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis")
model = AutoModelForSequenceClassification.from_pretrained("avichr/heBERT_sentiment_analysis")
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer,
    return_all_scores=True  # newer transformers releases deprecate this in favor of top_k=None
)
# "I'm debating what to eat for lunch"
sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')
>>> [[{'label': 'neutral', 'score': 0.9978172183036804},
>>> {'label': 'positive', 'score': 0.0014792329166084528},
>>> {'label': 'negative', 'score': 0.0007035882445052266}]]
# "Coffee is tasty"
sentiment_analysis('קפה זה טעים')
>>> [[{'label': 'neutral', 'score': 0.00047328314394690096},
>>> {'label': 'positive', 'score': 0.9994067549705505},
>>> {'label': 'negative', 'score': 0.00011996887042187154}]]
# "I don't like the world"
sentiment_analysis('אני לא אוהב את העולם')
>>> [[{'label': 'neutral', 'score': 9.214012970915064e-05},
>>> {'label': 'positive', 'score': 8.876807987689972e-05},
>>> {'label': 'negative', 'score': 0.9998190999031067}]]
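The pipeline output shown above is a nested list: one inner list of label/score dictionaries per input text. A minimal sketch of reducing it to a single predicted label per text follows. `top_label` is a hypothetical helper name, not part of HeBERT or transformers, and the sample data is copied from the first example output above so the block runs without downloading the model.

```python
from typing import Any, Dict, List, Tuple


def top_label(scores: List[Dict[str, Any]]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score for one text."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


# Copied from the first example output above: one inner list per input text.
sample_output = [[
    {"label": "neutral", "score": 0.9978172183036804},
    {"label": "positive", "score": 0.0014792329166084528},
    {"label": "negative", "score": 0.0007035882445052266},
]]

predictions = [top_label(per_text) for per_text in sample_output]
print(predictions)  # [('neutral', 0.9978172183036804)]
```

With a live pipeline, the same helper could be applied to each element of the list returned by `sentiment_analysis(...)`.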
⨠Features
- Emotion Detection: HebEMO can accurately detect eight basic emotions (anger, disgust, anticipation, fear, joy, sadness, surprise, and trust) from modern Hebrew UGC.
- Polarity Classification: It can classify the sentiment (positive, negative, or neutral) of the input text.
- High Performance: Reaches a weighted-average F1-score of 0.96 for polarity classification; per-emotion scores are listed under Performance.
Installation
!git clone https://github.com/avichaychriqui/HeBERT.git
Usage Examples
Basic Usage
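from HeBERT.src.HebEMO import HebEMO
HebEMO_model = HebEMO()
# Run on text read from a file
HebEMO_model.hebemo(input_path='data/text_example.txt')
# Run on a single string ("Life is beautiful and happy") and plot the result
hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)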
Advanced Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis")
model = AutoModelForSequenceClassification.from_pretrained("avichr/heBERT_sentiment_analysis")
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer,
    return_all_scores=True  # newer transformers releases deprecate this in favor of top_k=None
)
# "I'm debating what to eat for lunch"
sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')
Documentation
Emotion UGC Data Description
Our UGC data consists of comments posted on news articles collected from 3 major Israeli news sites between January 2020 and August 2020. The total size of the data is approximately 150 MB, containing over 7 million words and 350K sentences.
Approximately 2,000 sentences were annotated by crowd members (3 to 10 annotators per sentence) for overall sentiment (polarity) and for eight emotions based on Robert Plutchik's wheel of emotions: anger, disgust, anticipation, fear, joy, sadness, surprise, and trust.
The table below shows the proportion of sentences in which each emotion appeared ("expectation" and "happy" correspond to anticipation and joy).
| | anger | disgust | expectation | fear | happy | sadness | surprise | trust | sentiment |
|---|---|---|---|---|---|---|---|---|---|
| ratio | 0.78 | 0.83 | 0.58 | 0.45 | 0.12 | 0.59 | 0.17 | 0.11 | 0.25 |
Performance
Emotion Recognition
| emotion | f1-score | precision | recall |
|---|---|---|---|
| anger | 0.96 | 0.99 | 0.93 |
| disgust | 0.97 | 0.98 | 0.96 |
| anticipation | 0.82 | 0.80 | 0.87 |
| fear | 0.79 | 0.88 | 0.72 |
| joy | 0.90 | 0.97 | 0.84 |
| sadness | 0.90 | 0.86 | 0.94 |
| surprise | 0.40 | 0.44 | 0.37 |
| trust | 0.83 | 0.86 | 0.80 |
The above metrics are for the positive class (that is, the emotion is present in the text).
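Each F1-score is the harmonic mean of the corresponding precision and recall, F1 = 2 * P * R / (P + R). The sketch below recomputes it from the rounded values in the table above as a sanity check. Because the inputs are rounded to two decimals, a recomputed value can differ from the reported one by about 0.01, as it does for anticipation. The names `f1_from` and `reported` are illustrative only.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (reported F1, precision, recall) copied from the emotion recognition table.
reported = {
    "anger": (0.96, 0.99, 0.93),
    "disgust": (0.97, 0.98, 0.96),
    "anticipation": (0.82, 0.80, 0.87),
    "fear": (0.79, 0.88, 0.72),
    "joy": (0.90, 0.97, 0.84),
    "sadness": (0.90, 0.86, 0.94),
    "surprise": (0.40, 0.44, 0.37),
    "trust": (0.83, 0.86, 0.80),
}

for emotion, (f1, p, r) in reported.items():
    recomputed = f1_from(p, r)
    print(f"{emotion:<13} reported={f1:.2f} recomputed={recomputed:.3f}")
```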
Sentiment (Polarity) Analysis
| | precision | recall | f1-score |
|---|---|---|---|
| neutral | 0.83 | 0.56 | 0.67 |
| positive | 0.96 | 0.92 | 0.94 |
| negative | 0.97 | 0.99 | 0.98 |
| accuracy | | | 0.97 |
| macro avg | 0.92 | 0.82 | 0.86 |
| weighted avg | 0.96 | 0.97 | 0.96 |
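The macro avg row is the unweighted mean of the three per-class values, which can be checked directly from the table above. The weighted avg row additionally weights each class by its number of test examples (its support). Those counts are not listed here, so this sketch covers only the macro average. Variable names are illustrative.

```python
from statistics import mean

# Per-class (precision, recall, f1-score) copied from the polarity table.
per_class = {
    "neutral": (0.83, 0.56, 0.67),
    "positive": (0.96, 0.92, 0.94),
    "negative": (0.97, 0.99, 0.98),
}

# zip(*...) regroups the rows into one tuple per metric column.
precision, recall, f1 = (mean(column) for column in zip(*per_class.values()))

macro_avg = {
    "precision": round(precision, 2),
    "recall": round(recall, 2),
    "f1-score": round(f1, 2),
}
print(macro_avg)  # {'precision': 0.92, 'recall': 0.82, 'f1-score': 0.86}
```

The gap between the macro F1 (0.86) and the weighted F1 (0.96) suggests that the weaker neutral class is also the smallest one in the test set.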
The sentiment (polarity) analysis model is also available on AWS. For more information, see the AWS GitHub repository.
License
No license information provided in the original README.
Citation
If you use this model, please cite us as follows:
Chriqui, A., & Yahav, I. (2021). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. arXiv preprint arXiv:2102.01909.
@article{chriqui2021hebert,
title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
author={Chriqui, Avihay and Yahav, Inbal},
journal={arXiv preprint arXiv:2102.01909},
year={2021}
}
Contact us