🚀 Bielik-4.5B-v3-Instruct
Bielik-4.5B-v3-Instruct is a generative text model with 4.6 billion parameters, an instruct fine-tuned version of Bielik-4.5B-v3. The model is the result of a collaboration between the open-science/open-source project SpeakLeash and the High Performance Computing (HPC) center ACK Cyfronet AGH. It was developed and trained on Polish text corpora processed by the SpeakLeash team, using the Polish large-scale computing infrastructure of the PLGrid environment at the ACK Cyfronet AGH HPC center, supported by computational grants PLG/2024/017214 and PLG/2025/018338 on the Athena and Helios supercomputers. The model understands and processes Polish well, offering accurate responses and performing a variety of linguistic tasks with precision.
📚 Technical report: https://arxiv.org/abs/2505.02550
✨ Features
- High-Quality Training: Trained on over 19 million instructions with more than 12 billion tokens, including manually verified and synthetic instructions.
- Advanced Alignment Techniques: Aligned with user preferences using the DPO-Positive method, which introduces multi-turn conversations.
- Polish Language Proficiency: Exceptionally capable of understanding and processing the Polish language.
📦 Installation
The usage examples below require only PyTorch and the Hugging Face Transformers library (for example, pip install torch transformers), plus a CUDA-capable GPU for the device="cuda" setting used in the code.
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_name = "speakleash/Bielik-4.5B-v3-Instruct"

# Load the tokenizer and the model weights in bfloat16
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Multi-turn conversation; system prompt: "Answer briefly, precisely, and only in Polish."
messages = [
    {"role": "system", "content": "Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim."},
    {"role": "user", "content": "Jakie mamy pory roku w Polsce?"},  # "What seasons do we have in Poland?"
    {"role": "assistant", "content": "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima."},
    {"role": "user", "content": "Która jest najcieplejsza?"},  # "Which one is the warmest?"
]

# Render the conversation with the model's chat template and move everything to the GPU
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = input_ids.to(device)
model.to(device)

# Sample a continuation and decode it back to text
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
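
The decoded output above contains the full prompt together with special tokens. If only the newly generated reply is needed, a minimal variation (reusing model_inputs and generated_ids from the example above) is to slice off the prompt and skip special tokens when decoding:

# Keep only the tokens generated after the prompt and drop special tokens
new_tokens = generated_ids[:, model_inputs.shape[1]:]
reply = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(reply)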
Advanced Usage
The chat template already handles multi-turn conversations, as the basic example shows; multi-turn data is also the main novelty introduced with the DPO-Positive alignment described under Technical Details. A conversation can be continued by appending the model's reply and the next user turn and re-applying the template, as in the sketch below.
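A minimal sketch of such a continuation, reusing tokenizer, model, device, messages, and the decoded reply from the examples above (the follow-up question and the add_generation_prompt flag are illustrative choices, not part of the original example):

# Extend the conversation with the model's reply and a new user turn
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "A która jest najzimniejsza?"})  # "And which one is the coldest?"

# Re-render the whole conversation; add_generation_prompt appends the opening assistant tag
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

generated_ids = model.generate(input_ids, max_new_tokens=200, do_sample=True)
print(tokenizer.batch_decode(generated_ids[:, input_ids.shape[1]:], skip_special_tokens=True)[0])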
📚 Documentation
Model
The SpeakLeash team is continuously expanding and refining a set of Polish instructions. A manually verified portion of these instructions was used for training, along with synthetic instructions generated by Bielik 11B v2.3. The training dataset had over 19 million instructions with more than 12 billion tokens.
To align the model with user preferences, multiple techniques were tested, and the DPO-Positive method was chosen. It uses both generated and manually corrected examples scored by a metamodel. A dataset of over 111,000 examples of different lengths was filtered and evaluated by the reward model.
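For orientation, a rough sketch of the DPO-Positive objective as it is usually formulated in the literature (this is not the SpeakLeash training code; the beta and lambda values and the per-sequence log-probability inputs are illustrative assumptions):

import torch
import torch.nn.functional as F

def dpo_positive_loss(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      beta=0.1, lambda_=5.0):
    # Standard DPO margin between the chosen and rejected responses
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # DPO-Positive penalty: punish the policy for falling below the
    # reference model's likelihood on the chosen (positive) response
    penalty = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0.0)
    logits = beta * (chosen_logratio - rejected_logratio - lambda_ * penalty)
    return -F.logsigmoid(logits).mean()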
Bielik instruct models are trained using the open-source ALLaMo framework implemented by Krzysztof Ociepa.
Model description:
- Developed by: SpeakLeash and ACK Cyfronet AGH
- Language: Polish
- Finetuned from: Bielik-4.5B-v3
- License: Apache 2.0 and Terms of Use
Chat template
Bielik-4.5B-v3-Instruct uses ChatML as the prompt format.
E.g.
prompt = "<s><|im_start|> user\nJakie mamy pory roku?<|im_end|> \n<|im_start|> assistant\n"
completion = "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.<|im_end|> \n"
This format is available as a chat template via the apply_chat_template() method.
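To inspect the rendered prompt as text rather than token IDs, the same template can be applied with tokenize=False (a small sketch reusing the messages list from the usage example; add_generation_prompt appends the opening assistant tag so the model answers as the assistant):

# Render the conversation to a ChatML-formatted string instead of token IDs
prompt_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt_text)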
🔧 Technical Details
The model is developed through a unique collaboration between SpeakLeash and ACK Cyfronet AGH. It is trained on Polish text corpora processed by the SpeakLeash team, using the Polish large-scale computing infrastructure in the PLGrid environment. The training is supported by computational grants PLG/2024/017214 and PLG/2025/018338 on the Athena and Helios supercomputers.
For alignment, the DPO-Positive method is used. A dataset of over 111,000 examples is filtered and evaluated by the reward model to select instructions with the right level of difference between chosen and rejected responses. The novelty in DPO-P is the introduction of multi-turn conversations.
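A hypothetical illustration of that selection step (the field names, score scale, and margin thresholds are assumptions made for the example, not the SpeakLeash pipeline):

# Keep preference pairs whose reward-model score margin is informative:
# drop near-ties as well as trivially easy comparisons
def select_pairs(pairs, min_margin=0.05, max_margin=0.9):
    selected = []
    for pair in pairs:
        margin = pair["chosen_score"] - pair["rejected_score"]
        if min_margin <= margin <= max_margin:
            selected.append(pair)
    return selected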
📄 License
The model is released under the Apache 2.0 license and the accompanying Terms of Use.
Limitations and Biases
Bielik-4.5B-v3-Instruct is a quick demonstration that the base model can be easily fine-tuned. It does not have any moderation mechanisms. Because it was trained on various public datasets, the model may produce factually incorrect, lewd, false, biased, or otherwise offensive outputs.
Citation
Please cite this model using the following format:
@misc{ociepa2025bielikv3smalltechnical,
      title={Bielik v3 Small: Technical Report},
      author={Krzysztof Ociepa and Łukasz Flis and Remigiusz Kinas and Krzysztof Wróbel and Adrian Gwoździej},
      year={2025},
      eprint={2505.02550},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.02550},
}
@misc{Bielik45Bv3i,
      title = {Bielik-4.5B-v3-Instruct model card},
      author = {Ociepa, Krzysztof and Flis, Łukasz and Kinas, Remigiusz and Gwoździej, Adrian and Wróbel, Krzysztof and {SpeakLeash Team} and {Cyfronet Team}},
      year = {2025},
      url = {https://huggingface.co/speakleash/Bielik-4.5B-v3-Instruct},
      note = {Accessed: 2025-05-06}, % change this date
      urldate = {2025-05-06} % change this date
}
Responsible for training the model
- Krzysztof Ociepa (SpeakLeash): Team leadership, conceptualizing, data preparation, process optimization, and oversight of training.
- Łukasz Flis (Cyfronet AGH): Coordinating and supervising the training.
- Remigiusz Kinas (SpeakLeash): Conceptualizing, coordinating RL training, data preparation, benchmarking, and quantizations.
- Adrian Gwoździej (SpeakLeash): Data preparation and ensuring data quality.
- Krzysztof Wróbel (SpeakLeash): Benchmarks.
Many individuals from the SpeakLeash team and ACK Cyfronet AGH team also contributed to the model creation.
We thank the Polish high-performance computing infrastructure PLGrid (HPC Center: ACK Cyfronet AGH) for providing computer facilities and support through computational grants PLG/2024/017214 and PLG/2025/018338.
Contact Us
If you have any questions or suggestions, please use the discussion tab. To contact us directly, join our SpeakLeash Discord.