🤗 + neuraly - Italian BERT Sentiment model
This model performs sentiment analysis on Italian sentences, scoring each one as negative, neutral, or positive.
Quick Start
The model is built upon a pre-trained Italian BERT and fine-tuned for sentiment analysis. You can easily integrate it into your projects for Italian text sentiment assessment.
Features
- Performs sentiment analysis on Italian sentences.
- Trained from [bert-base-italian-cased](https://huggingface.co/dbmdz/bert-base-italian-cased) and fine-tuned on an Italian tweet dataset.
- Achieves 82% accuracy on the test set.
Usage Examples
Basic Usage
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("neuraly/bert-base-italian-cased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("neuraly/bert-base-italian-cased-sentiment")

sentence = 'Huggingface è un team fantastico!'
input_ids = tokenizer.encode(sentence, add_special_tokens=True)

# Create a tensor and add a batch dimension of size 1
tensor = torch.tensor(input_ids).long()
tensor = tensor.unsqueeze(0)

# Run the model; index 0 of the output holds the logits
# (this works whether the model returns a tuple or a ModelOutput)
logits = model(tensor)[0]
logits = logits.squeeze(0)

# Convert the logits to probabilities
proba = nn.functional.softmax(logits, dim=0)

# Unpack the three class probabilities
negative, neutral, positive = proba
Documentation
Intended uses & limitations
How to use
The Python code under Usage Examples shows how to load the model, tokenize an Italian sentence, and extract the sentiment probabilities.
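As a follow-up to the usage example, the sketch below shows one way to turn the three output scores into a single label. It is dependency-free and purely illustrative: the helper `to_label` and the example logits are hypothetical, and the label order (negative, neutral, positive) is taken from the unpacking in the usage code.

```python
import math

# Label order follows the unpacking in the usage example:
# negative, neutral, positive = proba
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label(logits):
    """Return (label, probability) for the highest-scoring class.

    `to_label` is a hypothetical helper, not part of the model's API.
    """
    proba = softmax(logits)
    best = max(range(len(proba)), key=proba.__getitem__)
    return LABELS[best], proba[best]


# Made-up logits for illustration only (not real model output)
label, confidence = to_label([-1.2, 0.3, 2.5])
print(label, round(confidence, 3))
```

With real model output, the logits tensor from the usage example could be converted with `logits.tolist()` before being passed to such a helper.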
Limitations and bias
A possible drawback (or bias) of this model is that it was trained on a dataset of tweets, with all the limitations that entails. The training domain is strongly tied to football players and teams, although the model works surprisingly well on other topics too.
Training data
We trained the model by combining the two tweet datasets taken from [Sentipolc EVALITA 2016](http://www.di.unito.it/~tutreeb/sentipolc-evalita16/data.html). Overall the dataset consists of 45K pre-processed tweets. The model weights come from a pre-trained instance of [bert-base-italian-cased](https://huggingface.co/dbmdz/bert-base-italian-cased).
Training procedure
Preprocessing
We tried to preserve as much information as possible, since BERT captures the semantics of complex text sequences extremely well. We removed only @mentions, URLs and emails from each tweet and kept pretty much everything else.
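The cleaning step described above could look roughly like the following. This is a minimal sketch, not the authors' actual preprocessing code: the regular expressions, the function name `clean_tweet`, and the final whitespace collapsing are all assumptions made for illustration.

```python
import re

# Illustrative patterns; the exact rules used for training are not given here.
# Emails are removed before mentions so that the "@domain" part of an
# address is not mistaken for a mention.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_tweet(text):
    """Remove emails, URLs and @mentions, keeping everything else."""
    for pattern in (EMAIL_RE, URL_RE, MENTION_RE):
        text = pattern.sub("", text)
    # Collapse the whitespace left behind by the removals (an assumption)
    return " ".join(text.split())


# Made-up tweet for illustration
tweet = "@mario Forza Juve!!! https://t.co/abc123 scrivimi a tifo@example.com #calcio"
print(clean_tweet(tweet))
```

Hashtags, punctuation, and casing are left untouched, in line with the goal of keeping as much of the original text as possible.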
Hardware
- GPU: NVIDIA GTX 1080 Ti
- CPU: AMD Ryzen 7 3700X (8 cores / 16 threads)
- RAM: 64GB DDR4
Hyperparameters
- Optimizer: AdamW with learning rate of 2e-5, epsilon of 1e-8
- Max epochs: 5
- Batch size: 32
- Early Stopping: enabled with patience = 1
Early stopping was triggered after 3 epochs.
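To make the early-stopping setting concrete, here is a small sketch of how patience = 1 interacts with a cap of 5 epochs. It assumes validation loss is the monitored metric, which this document does not state. The `EarlyStopping` class is a hypothetical helper, and the loss values are invented so that the loop happens to stop after 3 epochs; they are not the real training curve.

```python
class EarlyStopping:
    """Stop training when the monitored loss stops improving."""

    def __init__(self, patience=1):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


MAX_EPOCHS = 5
# Invented validation losses, for illustration only (not the real run)
fake_val_losses = [0.62, 0.51, 0.55, 0.50, 0.49]

stopper = EarlyStopping(patience=1)
epochs_run = 0
for epoch in range(MAX_EPOCHS):
    # A real loop would train for one epoch and evaluate here
    epochs_run = epoch + 1
    if stopper.should_stop(fake_val_losses[epoch]):
        break

print(f"stopped after {epochs_run} epochs")
```

With patience = 1, a single epoch without improvement ends training, so later epochs are never reached even if they would have improved again.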
Eval results
The model achieves an overall accuracy of 82% on the test set, which is a 20% split of the whole dataset.
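For readers who want to reproduce this kind of evaluation on their own data, the sketch below shows a plain 80/20 split and an accuracy computation. It is only an illustration: the document does not say how the split was drawn (random, stratified, seeded), so the shuffling, the seed, and the function names are assumptions, and the labels are toy values.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and split off `test_fraction` as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Placeholder IDs standing in for the roughly 45K tweets
train_ids, test_ids = train_test_split(range(45_000))
print(len(train_ids), len(test_ids))

# Toy labels, invented for illustration
print(accuracy(["pos", "neg", "neu", "pos"], ["pos", "neg", "pos", "pos"]))
```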
About us
Neuraly is a young and dynamic startup committed to designing AI-driven solutions and services through the most advanced Machine Learning and Data Science technologies. You can find out more about who we are and what we do on our website.
Acknowledgments
Thanks to the generous support of the Hugging Face team, it is possible to download the model from their S3 storage and test it live through their inference API 🤗.
License
This project is licensed under the MIT license.