🚀 beto_sentiment_analysis_es Model
A fine-tuned model for sentiment analysis in Spanish, leveraging the BETO base model and trained on Spanish movie reviews.
🚀 Quick Start
This model is designed for sentiment analysis of Spanish text, especially movie reviews. It is fine-tuned from the BETO base model and was trained using Amazon SageMaker and Hugging Face's Deep Learning Container.
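For a quick check, the model can also be loaded through the transformers pipeline API. This is a minimal sketch; the label names it returns depend on the model's configuration and are not documented in this card.

from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub as a text-classification pipeline
classifier = pipeline("sentiment-analysis", model="edumunozsala/beto_sentiment_analysis_es")

# Example Spanish review; the returned label names come from the model config
print(classifier("Una película entretenida, con buenas actuaciones."))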
✨ Features
- Based on BETO: Built on BETO, a BERT-base model pre-trained on a Spanish corpus with the Whole Word Masking technique.
- Bilingual Dataset: Trained on a balanced dataset of about 50,000 Spanish movie reviews, with review texts and labels available in both English and Spanish.
- High Performance: Reaches roughly 0.91 accuracy, F1 score, precision, and recall on the test dataset (see Evaluation Results).
📦 Installation
No specific installation steps are provided. The usage examples below only require the transformers and torch libraries (e.g. pip install transformers torch).
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("edumunozsala/beto_sentiment_analysis_es")
model = AutoModelForSequenceClassification.from_pretrained("edumunozsala/beto_sentiment_analysis_es")

text = "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"

# Tokenize the text and add a batch dimension
input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)

# Run the model and take the class with the highest logit as the prediction
outputs = model(input_ids)
output = outputs.logits.argmax(1)
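To go from raw logits to class probabilities, here is a short follow-up sketch. Which index corresponds to the positive class is not stated in the card, so it is read from the model's id2label config rather than assumed.

import torch.nn.functional as F

# Convert the logits from the example above into class probabilities
probs = F.softmax(outputs.logits, dim=-1)
predicted_class = probs.argmax(dim=-1).item()

# The index-to-label mapping comes from the model config and is not documented in this card
print(model.config.id2label[predicted_class], probs.tolist())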
📚 Documentation
Model Information
Property | Details
-------- | -------
Model Type | beto_sentiment_analysis_es
Training Data | IMDbreviews_es (about 50,000 Spanish movie reviews)
Model Index
- Name: beto_sentiment_analysis_es
- Results:
  - Task:
    - Name: Sentiment Analysis
    - Type: sentiment-analysis
  - Dataset:
    - Name: IMDb Reviews in Spanish
    - Type: IMDbreviews_es
  - Metrics:
    - Name: Accuracy, Type: accuracy, Value: 0.9101333333333333
    - Name: F1 Score, Type: f1, Value: 0.9088450094671354
    - Name: Precision, Type: precision, Value: 0.9105691056910569
    - Name: Recall, Type: recall, Value: 0.9071274298056156
BETO Citation
Spanish Pre-Trained BERT Model and Evaluation Data
@inproceedings{CaneteCFP2020,
title={Spanish Pre-Trained BERT Model and Evaluation Data},
author={Cañete, José and Chaperon, Gabriel and Fuentes, Rodrigo and Ho, Jou-Hui and Kang, Hojin and Pérez, Jorge},
booktitle={PML4DC at ICLR 2020},
year={2020}
}
Dataset
The dataset consists of about 50,000 Spanish movie reviews. It is balanced and includes review texts and labels in both English and Spanish, split roughly 85% / 7.5% / 7.5% into train, validation, and test sets:
- Train dataset: 42,500
- Validation dataset: 3,750
- Test dataset: 3,750
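The card does not describe how these splits were produced. Below is a hypothetical sketch with the datasets library, only to illustrate proportions that reproduce those counts; the column names are made up for the example.

from datasets import Dataset

# Dummy data standing in for the real corpus; column names are illustrative only
dummy = Dataset.from_dict({"review_es": ["..."] * 50_000, "label": [0] * 50_000})

# 85% train, then split the remaining 15% in half for validation and test
split = dummy.train_test_split(test_size=0.15, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)

print(len(split["train"]), len(holdout["train"]), len(holdout["test"]))  # 42500 3750 3750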
Intended Uses & Limitations
This model is intended for sentiment analysis of Spanish text, especially movie reviews. It can also be applied to other types of reviews, although the reported metrics only cover the movie-review test set.
Hyperparameters
{
"epochs": "4",
"train_batch_size": "32",
"eval_batch_size": "8",
"fp16": "true",
"learning_rate": "3e-05",
"model_name": "\"dccuchile/bert-base-spanish-wwm-uncased\"",
"sagemaker_container_log_level": "20",
"sagemaker_program": "\"train.py\""
}
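These values are in the string form that SageMaker passes to the training script. Below is a hypothetical sketch of launching such a job with the SageMaker Hugging Face estimator; the instance type, IAM role, and container versions are illustrative assumptions, not details from the card.

from sagemaker.huggingface import HuggingFace

hyperparameters = {
    "epochs": 4,
    "train_batch_size": 32,
    "eval_batch_size": 8,
    "fp16": True,
    "learning_rate": 3e-05,
    "model_name": "dccuchile/bert-base-spanish-wwm-uncased",
}

estimator = HuggingFace(
    entry_point="train.py",            # matches sagemaker_program above
    instance_type="ml.p3.2xlarge",     # assumption, not stated in the card
    instance_count=1,
    role="<your-sagemaker-execution-role>",
    transformers_version="4.6",        # assumption: pick versions with a matching Hugging Face DLC
    pytorch_version="1.7",
    py_version="py36",
    hyperparameters=hyperparameters,
)

# estimator.fit({"train": "s3://<bucket>/train", "test": "s3://<bucket>/test"})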
Evaluation Results
- Accuracy: 0.9101333333333333
- F1 Score: 0.9088450094671354
- Precision: 0.9105691056910569
- Recall: 0.9071274298056156
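As a quick sanity check, the reported F1 score is the harmonic mean of the reported precision and recall:

# Reported precision and recall on the test set
precision = 0.9105691056910569
recall = 0.9071274298056156

# F1 = harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # ≈ 0.908845..., matching the reported F1 score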
📄 License
This model is released under the Apache 2.0 license.
Created by Eduardo Muñoz/@edumunozsala