🚀 DistilCamemBERT
We present DistilCamemBERT, a distilled version of the well-known CamemBERT, a French RoBERTa model. The goal of distillation is to significantly reduce the model's complexity while maintaining its performance. The proof of concept is detailed in the DistilBERT paper, and the training code is inspired by DistilBERT.
🚀 Quick Start
DistilCamemBERT is a distilled version of the French RoBERTa model CamemBERT. It aims to reduce model complexity while preserving performance.
✨ Features
- Model Distillation: Drastically reduces the complexity of the model while preserving performance.
- Custom Loss Function: The training loss function is a combination of DistilLoss, CosineLoss, and MLMLoss.
- Same Dataset: Trained on the same dataset (OSCAR) as the original CamemBERT to limit bias.
🔧 Technical Details
Loss function
The distilled model (student model) is trained to stay as close as possible to the original model (teacher model). The loss function consists of three parts:
- DistilLoss: A distillation loss that measures the similarity between the output probabilities of the student and teacher models using cross-entropy loss on the MLM task.
- CosineLoss: A cosine embedding loss applied to the last hidden layers of the student and teacher models to ensure collinearity.
- MLMLoss: A Masked Language Modeling (MLM) task loss to train the student model on the original task of the teacher model.
The final loss function is a combination of these three losses with the following weighting:
$$Loss = 0.5 \times DistilLoss + 0.3 \times CosineLoss + 0.2 \times MLMLoss$$
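For illustration, here is a minimal PyTorch sketch of this weighted objective. The function names, the softmax temperature, and the tensor shapes are assumptions made for readability; this is not the actual training code.

```python
import torch
import torch.nn.functional as F

def distil_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the teacher's softened output distribution and the student's.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

def cosine_loss(student_hidden, teacher_hidden):
    # Encourage the student's last hidden states to be collinear with the teacher's.
    flat_student = student_hidden.flatten(0, 1)  # (batch * seq_len, dim)
    flat_teacher = teacher_hidden.flatten(0, 1)
    target = torch.ones(flat_student.size(0), device=flat_student.device)
    return F.cosine_embedding_loss(flat_student, flat_teacher, target)

def mlm_loss(student_logits, labels):
    # Standard masked-language-modeling cross-entropy (labels are token ids, -100 is ignored).
    return F.cross_entropy(student_logits.view(-1, student_logits.size(-1)), labels.view(-1))

def total_loss(student_logits, teacher_logits, student_hidden, teacher_hidden, labels):
    # Weighted combination from the formula above.
    return (
        0.5 * distil_loss(student_logits, teacher_logits)
        + 0.3 * cosine_loss(student_hidden, teacher_hidden)
        + 0.2 * mlm_loss(student_logits, labels)
    )
```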
Dataset
To limit the bias between the student and teacher models, the dataset used for DistilCamemBERT training is the same as that of camembert-base: OSCAR. The French part of this dataset takes up approximately 140 GB of disk space.
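For reference, the French portion of OSCAR can be streamed with the 🤗 Datasets library instead of being downloaded in full. The configuration name below is an assumption, and this snippet is not the preprocessing pipeline used for training.

```python
from datasets import load_dataset

# Stream the French OSCAR subset rather than downloading ~140 GB to disk.
oscar_fr = load_dataset("oscar", "unshuffled_deduplicated_fr", split="train", streaming=True)

# Peek at a few raw documents.
for sample in oscar_fr.take(3):
    print(sample["text"][:80])
```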
Training
The model was pre-trained on an NVIDIA Titan RTX for 18 days.
📚 Documentation
Evaluation results
| Dataset name | f1-score |
| :-: | :-: |
| FLUE CLS | 83% |
| FLUE PAWS-X | 77% |
| FLUE XNLI | 77% |
| [wikiner_fr](https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr) NER | 98% |
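Scores of this kind are typically obtained by fine-tuning the checkpoint on each downstream task. The sketch below is purely illustrative (toy data, assumed hyperparameters), not the exact evaluation setup behind the numbers above.

```python
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments

checkpoint = "cmarkea/distilcamembert-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy data standing in for a real downstream split (e.g. FLUE CLS).
raw = Dataset.from_dict({
    "text": ["Ce film est excellent.", "Ce film est ennuyeux."],
    "labels": [1, 0],
})
encoded = raw.map(lambda x: tokenizer(x["text"], truncation=True, padding="max_length", max_length=32))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilcamembert-cls", num_train_epochs=1, report_to="none"),
    train_dataset=encoded,
)
trainer.train()
```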
How to use DistilCamemBERT
Basic Usage
```python
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and the distilled encoder from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("cmarkea/distilcamembert-base")
model = AutoModel.from_pretrained("cmarkea/distilcamembert-base")

# Put the model in evaluation mode for inference.
model.eval()
...
```
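As the snippet above is truncated, one possible continuation (purely illustrative) encodes a sentence and inspects the encoder's last hidden states:

```python
import torch

# Encode a sentence and run it through the distilled encoder.
inputs = tokenizer("J'aime le camembert !", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings: (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```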
Advanced Usage
```python
from transformers import pipeline

model_fill_mask = pipeline("fill-mask", model="cmarkea/distilcamembert-base", tokenizer="cmarkea/distilcamembert-base")
results = model_fill_mask("Le camembert est <mask> :)")

results
[{'sequence': '<s> Le camembert est délicieux :)</s>', 'score': 0.3878222405910492, 'token': 7200},
 {'sequence': '<s> Le camembert est excellent :)</s>', 'score': 0.06469205021858215, 'token': 2183},
 {'sequence': '<s> Le camembert est parfait :)</s>', 'score': 0.04534877464175224, 'token': 1654},
 {'sequence': '<s> Le camembert est succulent :)</s>', 'score': 0.04128391295671463, 'token': 26202},
 {'sequence': '<s> Le camembert est magnifique :)</s>', 'score': 0.02425697259604931, 'token': 1509}]
```
📄 License
This project is licensed under the MIT license.
📖 Citation
```bibtex
@inproceedings{delestre:hal-03674695,
  TITLE = {{DistilCamemBERT : une distillation du mod{\`e}le fran{\c c}ais CamemBERT}},
  AUTHOR = {Delestre, Cyrile and Amar, Abibatou},
  URL = {https://hal.archives-ouvertes.fr/hal-03674695},
  BOOKTITLE = {{CAp (Conf{\'e}rence sur l'Apprentissage automatique)}},
  ADDRESS = {Vannes, France},
  YEAR = {2022},
  MONTH = Jul,
  KEYWORDS = {NLP ; Transformers ; CamemBERT ; Distillation},
  PDF = {https://hal.archives-ouvertes.fr/hal-03674695/file/cap2022.pdf},
  HAL_ID = {hal-03674695},
  HAL_VERSION = {v1},
}
```
📦 Additional Information
| Property | Details |
| :-: | :-: |
| Model Type | Distilled French RoBERTa model |
| Training Data | OSCAR |