RobBERTje: A collection of distilled Dutch BERT-based models
RobBERTje is a collection of distilled models based on RobBERT. It offers multiple models with different sizes and training settings for various use cases. The team is also constantly working on releasing better-performing models.
Quick Start
This README introduces RobBERTje: its news, the available models, and their performance results. Choose the model that fits your use case and refer to the performance tables below for comparison.
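As a minimal usage sketch (not part of the original card), the checkpoints can be loaded through the Hugging Face `transformers` library. The snippet below assumes `transformers` is installed and uses the shuffled 1 GB variant listed in the model table further down.

```python
from transformers import pipeline

# Load a RobBERTje checkpoint from the Hugging Face Hub; any id from the
# model table below can be substituted here.
unmasker = pipeline("fill-mask", model="DTAI-KULeuven/robbertje-1-gb-shuffled")

# RobBERT(je) is RoBERTa-based, so the mask token is <mask>.
print(unmasker("Er staat een <mask> in mijn tuin."))
```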
Features
- A collection of distilled Dutch BERT-based models.
- Multiple models with different sizes and training settings.
- Continuous improvement and release of better-performing models.
Documentation
About RobBERTje
RobBERTje is a collection of distilled models based on RobBERT. There are multiple models with different sizes and different training settings, which you can choose for your use case.
We are also continuously working on releasing better-performing models, so watch the repository for updates.
News
- February 21, 2022: Our paper about RobBERTje has been published in volume 11 of the CLIN Journal!
- July 2, 2021: Publicly released four RobBERTje models.
- May 12, 2021: RobBERTje was accepted at CLIN31 for an oral presentation!
The models

| Model | Description | Parameters | Training size | Huggingface id |
|---|---|---|---|---|
| Non-shuffled | Trained on the non-shuffled variant of the OSCAR corpus, without any operations to preserve this order during training and distillation. | 74 M | 1 GB | [DTAI-KULeuven/robbertje-1-gb-non-shuffled](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-non-shuffled) |
| Shuffled | Trained on the publicly available and shuffled OSCAR corpus. | 74 M | 1 GB | [DTAI-KULeuven/robbertje-1-gb-shuffled](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-shuffled) |
| Merged (p = 0.5) | Same as the non-shuffled variant, but sequential sentences of the same document are merged with a probability of 50%. | 74 M | 1 GB | [DTAI-KULeuven/robbertje-1-gb-merged](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-merged) |
| BORT | A smaller version with 8 attention heads instead of 12 and 4 layers instead of 6 (and 12 for RobBERT). | 46 M | 1 GB | this model |
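As a quick sanity check on the table above (an illustrative sketch, not from the original card), a checkpoint can be loaded with `AutoModel` and its parameter count compared against the Parameters column:

```python
from transformers import AutoModel, AutoTokenizer

# Any Huggingface id from the table above can be substituted here.
model_id = "DTAI-KULeuven/robbertje-1-gb-non-shuffled"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# The count should roughly match the table (~74 M for the 1 GB variants, ~46 M for BORT).
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```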
Results
Intrinsic results
We calculated the pseudo perplexity (PPPL), a built-in metric in our distillation library. This metric gives an indication of how well a model captures the input distribution; a sketch of how such a score can be computed follows the table below.
| Model | PPPL |
|---|---|
| RobBERT (teacher) | 7.76 |
| Non-shuffled | 12.95 |
| Shuffled | 18.74 |
| Merged (p = 0.5) | 17.10 |
| BORT | 26.44 |
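The sketch below illustrates how a pseudo-perplexity score of this kind can be computed with `transformers`: each token is masked in turn and the original token is scored under the masked-LM head. This is a simplified illustration of the idea behind PPPL, not the exact implementation used in the distillation library, and the example sentence is arbitrary.

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "DTAI-KULeuven/robbertje-1-gb-shuffled"  # any id from the models table
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).eval()

def pseudo_perplexity(sentence: str) -> float:
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    # Skip the special tokens at the first and last position (<s> and </s>).
    positions = range(1, len(input_ids) - 1)
    total_log_prob = 0.0
    for pos in positions:
        masked = input_ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, pos]
        log_probs = torch.log_softmax(logits, dim=-1)
        total_log_prob += log_probs[input_ids[pos]].item()
    # PPPL = exp(-1/|W| * sum_i log P(w_i | W \ {w_i}))
    return math.exp(-total_log_prob / len(positions))

print(pseudo_perplexity("Vandaag is het mooi weer in Leuven."))
```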
Extrinsic results
We also evaluated our models on several downstream tasks, just like the teacher model RobBERT. Since that evaluation, a Dutch natural language inference task named SICK-NL was released, and we evaluated our models on it as well.
| Model | DBRD | DIE-DAT | NER | POS | SICK-NL |
|---|---|---|---|---|---|
| RobBERT (teacher) | 94.4 | 99.2 | 89.1 | 96.4 | 84.2 |
| Non-shuffled | 90.2 | 98.4 | 82.9 | 95.5 | 83.4 |
| Shuffled | 92.5 | 98.2 | 82.7 | 95.6 | 83.4 |
| Merged (p = 0.5) | 92.9 | 96.5 | 81.8 | 95.2 | 82.8 |
| BORT | 89.6 | 92.2 | 79.7 | 94.3 | 81.0 |
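These downstream scores come from fine-tuning each model on the respective task. The snippet below is a hypothetical, minimal sketch of attaching a sequence-classification head to a RobBERTje checkpoint (as one would for a task such as DBRD sentiment classification) and running a forward pass on a toy batch; it is not the evaluation code that produced the table.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "DTAI-KULeuven/robbertje-1-gb-merged"  # any id from the models table
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The classification head is newly initialized and still needs fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

batch = tokenizer(
    ["Wat een prachtig boek!", "Dit boek viel erg tegen."],
    padding=True,
    return_tensors="pt",
)
labels = torch.tensor([1, 0])  # toy labels: 1 = positive, 0 = negative

outputs = model(**batch, labels=labels)
print(outputs.loss, outputs.logits)
# From here, a standard training loop (or the transformers Trainer) can be used
# to fine-tune the model on a downstream task such as those in the table above.
```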
License
This project is licensed under the MIT license.