🚀 Spanish Sentiment Analysis Classifier
This BERT-based text classifier detects sentiment in Spanish text. It was developed as a thesis project for the Computer Engineering degree at Universidad de Buenos Aires (UBA).
🚀 Quick Start
The model was fine-tuned from dccuchile/bert-base-spanish-wwm-uncased using the hyperparameters listed under Model Details.
It was trained on 11,500 Spanish tweets from various regions, each labeled positive or negative, drawn from a curated combination of TASS datasets.
✨ Features
- Sentiment Detection: Classifies Spanish text as positive or negative.
- Fine-Tuned Model: Fine-tuned from dccuchile/bert-base-spanish-wwm-uncased with the hyperparameters listed under Model Details.
- Trained on Diverse Data: Trained on 11,500 Spanish tweets from various regions, sourced from TASS datasets.
📦 Installation
You can install the required dependencies using pip:
pip install transformers torch
💻 Usage Examples
Basic Usage
from transformers import BertForSequenceClassification, BertTokenizer
model = BertForSequenceClassification.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")
tokenizer = BertTokenizer.from_pretrained("VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis")
Advanced Usage
import torch

def predict(model, tokenizer, text, threshold=0.5):
    # Tokenize the input, truncating to the max_length of 128 used in training
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    # Run inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()
    predicted_class = torch.argmax(logits, dim=1).item()
    # Fall back to class 0 when class 1 wins but its probability does not exceed the threshold
    if probabilities[predicted_class] <= threshold and predicted_class == 1:
        predicted_class = 0
    return bool(predicted_class), probabilities

text = "Your Spanish text here"
predicted_label, probabilities = predict(model, tokenizer, text)
print(f"Text: {text}")
print(f"Predicted Class: {predicted_label}")
print(f"Probabilities: {probabilities}")
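The decision rule inside `predict` can be studied separately from the model. The following is a minimal pure-Python sketch, not part of the original project: `softmax` and `decide` are hypothetical helpers that apply the same softmax and threshold logic to a pair of raw logits, so the effect of `threshold` can be checked without loading any weights. The logits are illustrative values, not real model output.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(logits, threshold=0.5):
    """Mirror the rule in predict(): keep class 1 only if its probability exceeds threshold."""
    probabilities = softmax(logits)
    # Index of the largest logit (the first one wins on ties)
    predicted_class = max(range(len(logits)), key=lambda i: logits[i])
    if predicted_class == 1 and probabilities[1] <= threshold:
        predicted_class = 0
    return bool(predicted_class), probabilities

# Illustrative logits, not real model output.
label_default, probs = decide([0.0, 0.2])            # class 1 wins with probability ~0.55
label_strict, _ = decide([0.0, 0.2], threshold=0.6)  # ~0.55 <= 0.6, so the rule falls back to class 0
print(label_default, label_strict, [round(p, 3) for p in probs])
```

With two classes, the winning class always has a probability above 0.5, so the default threshold has practically no effect. Raising `threshold` makes the rule more conservative about returning `True`.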
📚 Documentation
Team Members
Model Details
| Property | Details |
|---|---|
| Model Type | dccuchile/bert-base-spanish-wwm-uncased |
| Hyperparameters | dropout_rate = 0.1, num_classes = 2, max_length = 128, batch_size = 16, num_epochs = 5, learning_rate = 3e-5 |
| Training Data | 11,500 Spanish tweets (Positive and Negative) |
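
For reference, the hyperparameters listed above can be gathered into a small configuration object. This is only a sketch: `TrainingConfig` and its `batches_per_epoch` helper are hypothetical names, not the project's training code. The example assumes all 11,500 tweets are used for training, which the source does not state.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as listed under Model Details."""
    base_model: str = "dccuchile/bert-base-spanish-wwm-uncased"
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128
    batch_size: int = 16
    num_epochs: int = 5
    learning_rate: float = 3e-5

    def batches_per_epoch(self, num_examples: int) -> int:
        """Number of batches in one epoch, counting a final partial batch."""
        return math.ceil(num_examples / self.batch_size)

config = TrainingConfig()
# Assumption: the full 11,500-tweet dataset is the training set (the split is not documented).
steps = config.batches_per_epoch(11_500)
total_steps = steps * config.num_epochs
print(steps, total_steps)
```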
Metrics
The model's performance was evaluated using the following metrics:
- Accuracy = 86.47%
- F1-Score = 86.47%
- Precision = 86.46%
- Recall = 86.51%
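
These four metrics all derive from counts of correct and incorrect predictions. The sketch below shows one common way to compute them for a binary task. `binary_metrics` is a hypothetical helper and the labels are made-up toy data. The source does not say which averaging scheme (per-class, macro, or weighted) produced the reported figures, so this should not be read as the project's evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only; not the TASS evaluation data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```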
📄 License
🔗 Acknowledgments
Special thanks to DCC UChile for the base Spanish BERT model and to all contributors to the dataset used for training.