RoBERTa Base 10M 1
A series of RoBERTa models pretrained on datasets of varying scale (1M-1B tokens), released in BASE and MED-SMALL specifications
Downloads: 13
Release Time: 3/2/2022
Model Overview
A RoBERTa variant pretrained on a smaller-scale dataset, released to study how pretraining data scale affects language model performance
Model Features
Multi-scale pretraining
Provides pretrained models at data scales ranging from 1M to 1B tokens for studying the effect of pretraining data scale
Two model specifications
Includes the standard BASE architecture (125M parameters) and a streamlined MED-SMALL architecture (45M parameters); see the parameter-count sketch after this list
Rigorous selection
For each data scale, the three checkpoints with the lowest validation perplexity across multiple pretraining runs are released
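To illustrate the difference between the two specifications, a rough parameter-count check is sketched below. The repository ids are assumptions based on the naming used on this page and may need adjusting.

```python
# Rough check of the 125M (BASE) vs 45M (MED-SMALL) parameter counts mentioned above.
# The repo ids below are assumed from the naming on this page, not confirmed by it.
from transformers import AutoModelForMaskedLM

for name in ("nyu-mll/roberta-base-10M-1", "nyu-mll/roberta-med-small-1M-1"):
    model = AutoModelForMaskedLM.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
```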
Model Capabilities
Text representation learning (see the usage sketch after this list)
Downstream task fine-tuning
Language model pretraining research
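A minimal sketch of extracting sentence representations with this checkpoint, assuming the Hugging Face identifier nyu-mll/roberta-base-10M-1 (not stated on this page). For downstream fine-tuning, the same identifier could instead be loaded with a task head such as AutoModelForSequenceClassification.

```python
# Minimal sketch: mean-pooled sentence embeddings from the final hidden states.
# The repo id is an assumption, not confirmed by this page.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "nyu-mll/roberta-base-10M-1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

sentences = ["The cat sat on the mat.", "Pretraining data scale matters."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Mean-pool the final hidden states over non-padding tokens, one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # e.g. torch.Size([2, hidden_size])
```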
Use Cases
Language model research
Data scale impact study
Investigating the impact of different pretraining data scales on language model performance
Provides comparable models at four orders of magnitude of training data: 1M/10M/100M/1B tokens
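A rough sketch of such a comparison: querying checkpoints from several data scales with the same masked-token probe and inspecting their top predictions. The repo ids follow an assumed nyu-mll/roberta-base-{scale}-1 naming pattern and may need adjusting to whichever checkpoints are actually available.

```python
# Compare checkpoints across pretraining data scales on a single <mask> probe.
# Repo ids are assumptions based on this page's naming, not confirmed by it.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoints = [
    "nyu-mll/roberta-base-10M-1",   # assumed ids
    "nyu-mll/roberta-base-100M-1",
    "nyu-mll/roberta-base-1B-1",
]
sentence = "The children went outside to play in the <mask>."

for name in checkpoints:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name).eval()

    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Report the most likely filler for the masked position.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    top_id = logits[0, mask_pos].argmax(-1).item()
    print(name, "->", tokenizer.decode([top_id]).strip())
```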
Educational applications
Lightweight language model teaching
Using small-scale models for NLP teaching demonstrations
The MED-SMALL specification, with only 45M parameters, is well suited to teaching environments
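A minimal classroom demo using the fill-mask pipeline; the repo id nyu-mll/roberta-med-small-1M-1 is an assumption based on the naming above, not confirmed by this page.

```python
# Simple fill-mask demo with the (assumed) 45M-parameter MED-SMALL checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="nyu-mll/roberta-med-small-1M-1")  # assumed repo id

for prediction in fill_mask("The capital of France is <mask>."):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")
```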