BERTIN RoBERTa Large Spanish
BERTIN is a series of Spanish language models based on BERT. This model follows the RoBERTa-large architecture, trained from scratch using the Flax framework, with data sourced from the Spanish portion of the mC4 corpus.
Downloads: 26
Release Time: 3/2/2022
Model Overview
This is a Spanish pre-trained model based on the RoBERTa-large architecture. It was trained with a masked language modeling objective and is suited to Spanish natural language processing applications.
Model Features
Trained from scratch
Pre-trained from scratch using the Flax framework rather than fine-tuned from an existing model.
Large-scale training data
Based on the Spanish portion of the mC4 corpus, approximately 416 million documents totaling around 235 billion words (see the streaming sketch after this list).
Community-driven development
Developed as part of the HuggingFace Flax/JAX Community Week, with TPU resources provided by Google Cloud.
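For orientation, here is a minimal sketch of streaming that corpus with the datasets library. The allenai/c4 repository and its "es" config are assumptions about where the mC4 Spanish split is hosted on the Hub, not something stated in this card:

```python
from datasets import load_dataset

# Stream the Spanish portion of mC4 instead of downloading ~235B words.
# "allenai/c4" with the "es" config is assumed to expose the mC4 Spanish split.
mc4_es = load_dataset("allenai/c4", "es", split="train", streaming=True)

# Peek at the first few records; each has "text", "timestamp", and "url".
for i, record in enumerate(mc4_es):
    print(record["text"][:80].replace("\n", " "))
    if i == 2:
        break
```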
Model Capabilities
Spanish text understanding
Masked token prediction
Contextual semantic analysis
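A minimal sketch of exercising these capabilities through the transformers fill-mask pipeline. The Hub ID flax-community/bertin-roberta-large-spanish is an assumption based on the model name; substitute the actual repository name if it differs:

```python
from transformers import pipeline

# Assumed Hub ID; replace with the actual repository name if it differs.
MODEL_ID = "flax-community/bertin-roberta-large-spanish"

fill_mask = pipeline("fill-mask", model=MODEL_ID)

# RoBERTa-style models use "<mask>" as the mask token.
for pred in fill_mask("Fui a la librería a comprar un <mask>."):
    print(f"{pred['token_str']:>12}  score={pred['score']:.3f}")
```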
Use Cases
Natural Language Processing
Text completion
Predicting masked words in sentences; see the sketch below.
Example: in 'Fui a la librería a comprar un <mask>.' ('I went to the bookstore to buy a <mask>.'), the mask could be filled with 'libro' ('book') or another plausible word.
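The same completion can be run against the raw masked-LM head when the full score distribution is needed rather than the pipeline's pre-ranked output. This sketch reuses the hypothetical Hub ID from above:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "flax-community/bertin-roberta-large-spanish"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

inputs = tokenizer("Fui a la librería a comprar un <mask>.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the masked position and rank the five most likely fillers.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
top5 = logits[0, mask_pos].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top5.tolist()))
```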
Semantic analysis
Understanding contextual meaning in Spanish text.
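As one illustration of contextual semantics, the encoder's hidden states can be mean-pooled into sentence vectors and compared. This is a generic sketch using the same assumed Hub ID, not a recipe taken from this model card:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "flax-community/bertin-roberta-large-spanish"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

sentences = [
    "El banco cerró sus puertas a las cinco.",  # "banco" as bank
    "Nos sentamos en el banco del parque.",     # "banco" as bench
]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, hidden)

# Mean-pool over real (non-padding) tokens to get one vector per sentence.
mask = inputs["attention_mask"].unsqueeze(-1)
emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

sim = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
print(f"cosine similarity: {sim:.3f}")
```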