RoBERTa Base 100M 3
RoBERTa variants pre-trained on datasets ranging from 1M to 1B tokens, available in BASE and MED-SMALL model sizes, suited to natural language processing tasks in resource-constrained scenarios
Downloads: 18
Release Time: 3/2/2022
Model Overview
RoBERTa models pre-trained on corpora of varying scales (1M/10M/100M/1B tokens), optimized for small-data settings by adjusting model size and training hyperparameters
Model Features
Small-data optimization
Pre-training is tuned for small corpora (1M-1B tokens), making these checkpoints better suited to data-constrained scenarios than the original RoBERTa
Two model sizes
Available in two parameter scales, BASE (125M) and MED-SMALL (45M), balancing performance and efficiency
Rigorous validation
For each data scale, the three pre-training runs with the lowest validation perplexity are released to ensure checkpoint quality
Model Capabilities
Text representation learning
Downstream task fine-tuning
Masked word prediction (see the sketch below)
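A minimal sketch of masked word prediction with this checkpoint, using the Hugging Face transformers fill-mask pipeline. It assumes the model is hosted on the Hub under an ID like nyu-mll/roberta-base-100M-3; substitute the actual ID if it differs.

```python
# Masked word prediction sketch.
# Assumption: the checkpoint is published on the Hugging Face Hub as
# "nyu-mll/roberta-base-100M-3"; replace with the actual model ID if different.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="nyu-mll/roberta-base-100M-3")

# RoBERTa tokenizers use "<mask>" as the mask token.
for prediction in fill_mask("The capital of France is <mask>."):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")
```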
Use Cases
Education
Small-scale data fine-tuning
Serves as a pre-training base for educational text classification tasks with limited annotated data (see the fine-tuning sketch after this list)
Research
Pre-training strategy research
Investigates the impact of different data scales on pre-trained model performance
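A minimal fine-tuning sketch for the small-data text classification use case above, again assuming the Hub ID nyu-mll/roberta-base-100M-3. The tiny in-memory dataset is a hypothetical stand-in for a real annotated corpus; the label set, texts, and training settings are illustrative only.

```python
# Fine-tuning sketch for text classification on a small labeled dataset.
# Assumptions: Hub ID "nyu-mll/roberta-base-100M-3"; the toy dataset below
# stands in for real annotated data.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "nyu-mll/roberta-base-100M-3"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Hypothetical stand-in for a small annotated educational-text corpus.
data = Dataset.from_dict({
    "text": [
        "Photosynthesis converts light into chemical energy.",
        "The homework is due on Friday.",
    ],
    "label": [1, 0],
})

def tokenize(batch):
    # Pad to a fixed length so the default collator can batch examples directly.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="out",
        num_train_epochs=3,
        per_device_train_batch_size=8,
    ),
    train_dataset=data,
)
trainer.train()
```

With only a few labeled examples per class, shorter training runs and the smaller MED-SMALL variant may reduce overfitting; treat the hyperparameters above as a starting point rather than a recommendation.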