# RobBERTje
RobBERTje is a collection of distilled Dutch BERT-based models, offering multiple options with different sizes and training settings for various use cases.
## Quick Start
RobBERTje provides a range of distilled models based on RobBERT. You can choose from multiple models with different sizes and training settings according to your specific use case. We are constantly working on releasing models with better performance; keep an eye on the repository for updates.
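As a minimal sketch, any of these checkpoints can be loaded like a regular RoBERTa-style masked language model with the Hugging Face `transformers` library. The id below is the non-shuffled model listed further down in this card; swap in whichever variant fits your use case:

```python
# Sketch: load a RobBERTje checkpoint with Hugging Face transformers
# (assumes the `transformers` package is installed).
MODEL_ID = "DTAI-KULeuven/robbertje-1-gb-non-shuffled"

def load_robbertje(model_id: str = MODEL_ID):
    # Deferred import so the constant above is usable without transformers.
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return tokenizer, model
```

The returned pair can then be used with a standard fill-mask workflow, exactly as with the RobBERT teacher model.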
## Documentation
### News
- February 21, 2022: Our paper about RobBERTje has been published in volume 11 of the CLIN Journal!
- July 2, 2021: Publicly released 4 RobBERTje models.
- May 12, 2021: RobBERTje was accepted at CLIN31 for an oral presentation!
### The models
| Model | Description | Parameters | Training size | Huggingface id |
|---|---|---|---|---|
| Non-shuffled | Trained on the non-shuffled variant of the OSCAR corpus, without any operations to preserve this order during training and distillation. | 74 M | 1 GB | [DTAI-KULeuven/robbertje-1-gb-non-shuffled](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-non-shuffled) |
| Shuffled | Trained on the publicly available and shuffled OSCAR corpus. | 74 M | 1 GB | this model |
| Merged (p = 0.5) | Same as the non-shuffled variant, but sequential sentences of the same document are merged with a probability of 50%. | 74 M | 1 GB | [DTAI-KULeuven/robbertje-1-gb-merged](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-merged) |
| BORT | A smaller version with 8 attention heads instead of 12 and 4 layers instead of 6 (and 12 for RobBERT). | 46 M | 1 GB | [DTAI-KULeuven/robbertje-1-gb-bort](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-bort) |
### Results
#### Intrinsic results
We calculated the pseudo-perplexity (PPPL), a built-in metric in our distillation library. This metric gives an indication of how well the model captures the input distribution.
| Model | PPPL |
|---|---|
| RobBERT (teacher) | 7.76 |
| Non-shuffled | 12.95 |
| Shuffled | 18.74 |
| Merged (p = 0.5) | 17.10 |
| BORT | 26.44 |
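For intuition: pseudo-perplexity is typically computed by masking each token of a sentence in turn, scoring the masked token with the model, and exponentiating the negative mean of those log-probabilities. A minimal sketch of that final step (the per-token log-probabilities are assumed to come from the masked language model):

```python
import math

def pseudo_perplexity(token_log_probs):
    """Exponentiated negative mean of per-token masked-LM log-probabilities.

    `token_log_probs` holds log P(token_i | rest of sentence) for each
    position, obtained by masking one position at a time.
    """
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# If the model assigns probability 0.5 to every masked token, PPPL ≈ 2.
print(pseudo_perplexity([math.log(0.5)] * 4))
```

Lower is better: a PPPL of 7.76 means the teacher is, on average, about as uncertain as choosing uniformly among roughly eight tokens at each masked position.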
#### Extrinsic results
We also evaluated our models on several downstream tasks, just like the teacher model RobBERT. Since that evaluation, a Dutch natural language inference task named SICK-NL has been released, so we evaluated our models on it as well.
| Model | DBRD | DIE-DAT | NER | POS | SICK-NL |
|---|---|---|---|---|---|
| RobBERT (teacher) | 94.4 | 99.2 | 89.1 | 96.4 | 84.2 |
| Non-shuffled | 90.2 | 98.4 | 82.9 | 95.5 | 83.4 |
| Shuffled | 92.5 | 98.2 | 82.7 | 95.6 | 83.4 |
| Merged (p = 0.5) | 92.9 | 96.5 | 81.8 | 95.2 | 82.8 |
| BORT | 89.6 | 92.2 | 79.7 | 94.3 | 81.0 |
## License
This project is licensed under the MIT license.