🚀 beto_sentiment_analysis_es Model
A fine-tuned model for sentiment analysis in Spanish, leveraging the BETO base model and trained on Spanish movie reviews.
🚀 Quick Start
This model is designed for sentiment analysis of Spanish text, especially movie reviews. It is fine-tuned from the BETO base model and was trained using Amazon SageMaker and Hugging Face's Deep Learning Container.
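For a quick check, the model can also be loaded through the transformers pipeline API. This is a minimal sketch; the label names it returns depend on the model's configuration and are not documented in this card.

from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub as a text-classification pipeline
classifier = pipeline("sentiment-analysis", model="edumunozsala/beto_sentiment_analysis_es")

# Example Spanish review; the returned label names come from the model config
print(classifier("Una película entretenida, con buenas actuaciones."))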
✨ Features
- Based on BETO: Built on BETO, a BERT-base model pre-trained on a Spanish corpus with the Whole Word Masking technique.
- Bilingual Dataset: Trained on a balanced dataset of about 50,000 Spanish movie reviews, with review texts and labels available in both English and Spanish.
- High Performance: Reaches roughly 0.91 accuracy, F1 score, precision, and recall on the test dataset (see Evaluation Results).
📦 Installation
No specific installation steps are provided. The usage examples below only require the transformers and torch libraries (e.g. pip install transformers torch).
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("edumunozsala/beto_sentiment_analysis_es")
model = AutoModelForSequenceClassification.from_pretrained("edumunozsala/beto_sentiment_analysis_es")

text = "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"

# Tokenize the text and add a batch dimension
input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)

# Run the model and take the class with the highest logit as the prediction
outputs = model(input_ids)
output = outputs.logits.argmax(1)
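To go from raw logits to class probabilities, here is a short follow-up sketch. Which index corresponds to the positive class is not stated in the card, so it is read from the model's id2label config rather than assumed.

import torch.nn.functional as F

# Convert the logits from the example above into class probabilities
probs = F.softmax(outputs.logits, dim=-1)
predicted_class = probs.argmax(dim=-1).item()

# The index-to-label mapping comes from the model config and is not documented in this card
print(model.config.id2label[predicted_class], probs.tolist())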
📚 Documentation
Model Information
Property | Details
-------- | -------
Model Type | beto_sentiment_analysis_es
Training Data | IMDbreviews_es (about 50,000 Spanish movie reviews)
Model Index
- Name: beto_sentiment_analysis_es
- Results:
  - Task:
    - Name: Sentiment Analysis
    - Type: sentiment-analysis
  - Dataset:
    - Name: IMDb Reviews in Spanish
    - Type: IMDbreviews_es
  - Metrics:
    - Name: Accuracy, Type: accuracy, Value: 0.9101333333333333
    - Name: F1 Score, Type: f1, Value: 0.9088450094671354
    - Name: Precision, Type: precision, Value: 0.9105691056910569
    - Name: Recall, Type: recall, Value: 0.9071274298056156
BETO Citation
Spanish Pre-Trained BERT Model and Evaluation Data
@inproceedings{CaneteCFP2020,
title={Spanish Pre-Trained BERT Model and Evaluation Data},
author={Cañete, José and Chaperon, Gabriel and Fuentes, Rodrigo and Ho, Jou-Hui and Kang, Hojin and Pérez, Jorge},
booktitle={PML4DC at ICLR 2020},
year={2020}
}
Dataset
The dataset consists of about 50,000 Spanish movie reviews. It is balanced and includes review texts and labels in both English and Spanish, split roughly 85% / 7.5% / 7.5% into train, validation, and test sets:
- Train dataset: 42,500
- Validation dataset: 3,750
- Test dataset: 3,750
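The card does not describe how these splits were produced. Below is a hypothetical sketch with the datasets library, only to illustrate proportions that reproduce those counts; the column names are made up for the example.

from datasets import Dataset

# Dummy data standing in for the real corpus; column names are illustrative only
dummy = Dataset.from_dict({"review_es": ["..."] * 50_000, "label": [0] * 50_000})

# 85% train, then split the remaining 15% in half for validation and test
split = dummy.train_test_split(test_size=0.15, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)

print(len(split["train"]), len(holdout["train"]), len(holdout["test"]))  # 42500 3750 3750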
Intended Uses & Limitations
This model is intended for sentiment analysis of Spanish text, especially movie reviews. It can also be applied to other types of reviews, although the reported metrics only cover the movie-review test set.
Hyperparameters
{
"epochs": "4",
"train_batch_size": "32",
"eval_batch_size": "8",
"fp16": "true",
"learning_rate": "3e-05",
"model_name": "\"dccuchile/bert-base-spanish-wwm-uncased\"",
"sagemaker_container_log_level": "20",
"sagemaker_program": "\"train.py\""
}
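These values are in the string form that SageMaker passes to the training script. Below is a hypothetical sketch of launching such a job with the SageMaker Hugging Face estimator; the instance type, IAM role, and container versions are illustrative assumptions, not details from the card.

from sagemaker.huggingface import HuggingFace

hyperparameters = {
    "epochs": 4,
    "train_batch_size": 32,
    "eval_batch_size": 8,
    "fp16": True,
    "learning_rate": 3e-05,
    "model_name": "dccuchile/bert-base-spanish-wwm-uncased",
}

estimator = HuggingFace(
    entry_point="train.py",            # matches sagemaker_program above
    instance_type="ml.p3.2xlarge",     # assumption, not stated in the card
    instance_count=1,
    role="<your-sagemaker-execution-role>",
    transformers_version="4.6",        # assumption: pick versions with a matching Hugging Face DLC
    pytorch_version="1.7",
    py_version="py36",
    hyperparameters=hyperparameters,
)

# estimator.fit({"train": "s3://<bucket>/train", "test": "s3://<bucket>/test"})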
Evaluation Results
- Accuracy: 0.9101333333333333
- F1 Score: 0.9088450094671354
- Precision: 0.9105691056910569
- Recall: 0.9071274298056156
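As a quick sanity check, the reported F1 score is the harmonic mean of the reported precision and recall:

# Reported precision and recall on the test set
precision = 0.9105691056910569
recall = 0.9071274298056156

# F1 = harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # ≈ 0.908845..., matching the reported F1 score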
📄 License
This model is released under the Apache 2.0 license.
Created by Eduardo Muñoz/@edumunozsala