RoBERTa Base 10M 1
A series of RoBERTa models pretrained on datasets of varying scale (1M-1B tokens), released in BASE and MED-SMALL specifications
Downloads: 13
Release Time: 3/2/2022
Model Overview
A RoBERTa variant pretrained on a smaller-scale dataset, released to study how pretraining data scale affects language model performance
Model Features
Multi-scale pretraining
Provides pretrained models at data scales ranging from 1M to 1B tokens for studying the effect of pretraining data scale
Two model specifications
Includes the standard BASE architecture (125M parameters) and a streamlined MED-SMALL architecture (45M parameters); see the parameter-count sketch after this list
Rigorous selection
For each data scale, the three checkpoints with the lowest validation perplexity across multiple pretraining runs are released
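To illustrate the difference between the two specifications, a rough parameter-count check is sketched below. The repository ids are assumptions based on the naming used on this page and may need adjusting.

```python
# Rough check of the 125M (BASE) vs 45M (MED-SMALL) parameter counts mentioned above.
# The repo ids below are assumed from the naming on this page, not confirmed by it.
from transformers import AutoModelForMaskedLM

for name in ("nyu-mll/roberta-base-10M-1", "nyu-mll/roberta-med-small-1M-1"):
    model = AutoModelForMaskedLM.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
```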
Model Capabilities
Text representation learning (see the usage sketch after this list)
Downstream task fine-tuning
Language model pretraining research
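A minimal sketch of extracting sentence representations with this checkpoint, assuming the Hugging Face identifier nyu-mll/roberta-base-10M-1 (not stated on this page). For downstream fine-tuning, the same identifier could instead be loaded with a task head such as AutoModelForSequenceClassification.

```python
# Minimal sketch: mean-pooled sentence embeddings from the final hidden states.
# The repo id is an assumption, not confirmed by this page.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "nyu-mll/roberta-base-10M-1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

sentences = ["The cat sat on the mat.", "Pretraining data scale matters."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Mean-pool the final hidden states over non-padding tokens, one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # e.g. torch.Size([2, hidden_size])
```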
Use Cases
Language model research
Data scale impact study
Investigating the impact of different pretraining data scales on language model performance
Provides comparable models at four orders of magnitude of training data: 1M/10M/100M/1B tokens
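A rough sketch of such a comparison: querying checkpoints from several data scales with the same masked-token probe and inspecting their top predictions. The repo ids follow an assumed nyu-mll/roberta-base-{scale}-1 naming pattern and may need adjusting to whichever checkpoints are actually available.

```python
# Compare checkpoints across pretraining data scales on a single <mask> probe.
# Repo ids are assumptions based on this page's naming, not confirmed by it.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoints = [
    "nyu-mll/roberta-base-10M-1",   # assumed ids
    "nyu-mll/roberta-base-100M-1",
    "nyu-mll/roberta-base-1B-1",
]
sentence = "The children went outside to play in the <mask>."

for name in checkpoints:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name).eval()

    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Report the most likely filler for the masked position.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    top_id = logits[0, mask_pos].argmax(-1).item()
    print(name, "->", tokenizer.decode([top_id]).strip())
```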
Educational applications
Lightweight language model teaching
Using small-scale models for NLP teaching demonstrations
The MED-SMALL specification, with only 45M parameters, is well suited to teaching environments
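A minimal classroom demo using the fill-mask pipeline; the repo id nyu-mll/roberta-med-small-1M-1 is an assumption based on the naming above, not confirmed by this page.

```python
# Simple fill-mask demo with the (assumed) 45M-parameter MED-SMALL checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="nyu-mll/roberta-med-small-1M-1")  # assumed repo id

for prediction in fill_mask("The capital of France is <mask>."):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")
```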