đ Fietje 2 Chat
Fietje 2 Chat is an open and efficient large language model (LLM) tailored for Dutch text generation. It's a DPO - tuned continuation of the instruct version, offering high - performance with relatively fewer parameters.
⨠Features
- Open and Efficient: Based on [microsoft/phi - 2](https://huggingface.co/microsoft/phi - 2), adapted for Dutch with 2.7 billion parameters, performing almost as well as larger Dutch LLMs.
- Multiple Versions: Comes in base, instruct, chat, and GGUF chat versions, providing flexibility for different use - cases.
- Dutch - Specific: Trained on Dutch datasets, making it well - suited for Dutch text generation.
đĻ Model Information
đ Quick Start
You can interact with Fietje 2 Chat on the [Hugging Face Space](https://huggingface.co/spaces/BramVanroy/fietje - 2b).
Fietje 2 Chat
An open and efficient LLM for Dutch
đąââī¸ Base version -
đ¤ Instruct version -
đŦ Chat version (this one) -
đ GGUF of Chat
Chat with Fietje here!
đ Documentation
A thorough description of the creation and evaluation of Fietje as well as usage examples are available in this Github repository.
đ§ Technical Details
Training and evaluation data
Fietje 2 Chat was finetuned from [the instruct model](https://huggingface.co/BramVanroy/fietje - 2 - instruct) on the following datasets. The number of training samples per dataset is given in brackets, with a total of 18,653 samples.
A lot of different learning rates, beta, and batch sizes were investigated to find a converging combination. You can find them all in [the W&B runs](https://wandb.ai/bramvanroy/dpo - fietje - 2b).
Training procedure
Thanks to the Flemish Supercomputer Center (VSC) for providing the computational power. Training a single run took around nine hours on one A100 80GB, accounting for job waiting time.
Training was done with the [alignment - handbook](https://github.com/huggingface/alignment - handbook), using DeepSpeed as a back - end. Exact training recipes and SLURM script are given in the Github repository.
Training hyperparameters
The following hyperparameters were used during training:
- beta: 0.2
- learning_rate: 2e - 06
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi - GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.98) and epsilon = 1e - 07
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
Training results
Training Loss |
Epoch |
Step |
Validation Loss |
Rewards/chosen |
Rewards/rejected |
Rewards/accuracies |
Rewards/margins |
Logps/rejected |
Logps/chosen |
Logits/rejected |
Logits/chosen |
0.2515 |
1.0 |
1166 |
0.2842 |
-1.1549 |
-3.6363 |
0.8867 |
2.4815 |
-657.6813 |
-451.3364 |
-1.2868 |
-1.3528 |
Framework versions
- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
đ License
This project is licensed under the MIT license.
đ Citation
If you use Fietje or the CulturaX + Wikipedia filtered subset in your work, please cite the following paper:
@misc{vanroy2024fietjeopenefficientllm,
title={Fietje: An open, efficient LLM for Dutch},
author={Bram Vanroy},
year={2024},
eprint={2412.15450},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.15450},
}
â ī¸ Important Note
The same limitations as [phi - 2](https://huggingface.co/microsoft/phi - 2#limitations - of - phi - 2), and LLMs in general, apply here. LLMs may hallucinate and make mistakes, so use at your own risk!