Bertin Base Gaussian
This is a Spanish fill-mask model based on the RoBERTa-base architecture, trained from scratch.
Downloads 16
Release Time: 3/2/2022
Model Overview
This model is designed specifically for Spanish. It performs fill-mask tasks: given text with a <mask> placeholder, it predicts the missing word.
Model Features
High-Quality Training Data
The training set contains approximately 50 million samples drawn from mC4 documents. Sampling is weighted by a Gaussian function of perplexity, so documents with near-average perplexity are favored and documents at either extreme are less likely to be kept.
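The Gaussian weighting can be illustrated with a small sketch. This is not the project's actual sampling code: the function names, the `center` and `width` parameters, and the example perplexity values are all assumptions made for illustration, and the exact parameterization used to build the training set is not stated here.

```python
import math
import random

def gaussian_keep_probability(perplexity, center, width):
    """Probability of keeping a document, peaking at 1.0 when perplexity == center.

    `center` and `width` are hypothetical parameters (for example, the average
    perplexity of the corpus and a chosen spread).
    """
    z = (perplexity - center) / width
    return math.exp(-0.5 * z * z)

def sample_by_perplexity(docs, center, width, seed=0):
    """Keep each (text, perplexity) pair with its Gaussian keep probability."""
    rng = random.Random(seed)
    kept = []
    for text, perplexity in docs:
        if rng.random() < gaussian_keep_probability(perplexity, center, width):
            kept.append(text)
    return kept

# Illustrative values only: the perplexities below are made up.
docs = [("texto típico", 100.0), ("texto muy ruidoso", 1_000_000.0)]
kept = sample_by_perplexity(docs, center=100.0, width=50.0)
```

With these assumed parameters, a document whose perplexity equals `center` is always kept, and the keep probability falls off symmetrically as perplexity moves away from it in either direction.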
Trained from Scratch
The model was not fine-tuned from an existing pre-trained checkpoint. It was trained from scratch and optimized specifically for Spanish.
Community Support
This project is part of the JAX/Flax Community Week, organized by Hugging Face, with TPU computing resources provided by Google.
Model Capabilities
Spanish Text Understanding
Fill-Mask Prediction
Use Cases
Text Completion
Sentence Completion
Predicts the missing word in a sentence, as in the example 'Fui a la librería a comprar un <mask>.'
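A minimal sketch of how the example sentence might be run through the Hugging Face `transformers` fill-mask pipeline. The Hub identifier `bertin-project/bertin-base-gaussian` is an assumption (it is not stated on this page), and `load_fill_mask` and `top_fills` are hypothetical helper names introduced for illustration.

```python
# Assumed Hub identifier; verify it before use.
MODEL_ID = "bertin-project/bertin-base-gaussian"

def load_fill_mask(model_id=MODEL_ID):
    """Build a fill-mask pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; requires `transformers`
    return pipeline("fill-mask", model=model_id)

def top_fills(fill_mask, sentence, k=5):
    """Return the k most likely fillers for a single <mask> as (token, score) pairs."""
    if sentence.count("<mask>") != 1:
        raise ValueError("sentence must contain exactly one <mask> placeholder")
    predictions = fill_mask(sentence, top_k=k)
    # Each prediction is a dict with "token_str" and "score" keys, among others.
    return [(p["token_str"].strip(), p["score"]) for p in predictions]
```

To try it, call `top_fills(load_fill_mask(), "Fui a la librería a comprar un <mask>.")`. The helper accepts any callable with the same interface as the pipeline, so it can be exercised without downloading the model.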
Language Education
Spanish Learning Aid
The model can be used in Spanish learning tools to help learners understand sentence structure and vocabulary usage.