🚀 Bielik-7B-Instruct-v0.1
Bielik-7B-Instruct-v0.1 is an instruct fine-tuned model based on Bielik-7B-v0.1. It is developed to understand and process the Polish language accurately, enabling high-precision linguistic tasks.
🚀 Quick Start
Bielik-7B-Instruct-v0.1 is an instruct fine-tuned version of Bielik-7B-v0.1. It is the result of a collaboration between the SpeakLeash project and ACK Cyfronet AGH. The model was trained on Polish text corpora processed by the SpeakLeash team, using the PLGrid environment and computational grant PLG/2024/016951 on the Athena and Helios supercomputers, and it can understand and process the Polish language effectively.
Quantized versions of the model and an MLX-format version are also available.
🎥 Demo: https://huggingface.co/spaces/speakleash/Bielik-7B-Instruct-v0.1
🗣️ Chat Arena*: https://arena.speakleash.org.pl/
*Chat Arena is a platform for testing and comparing different AI language models, allowing users to evaluate their performance and quality.
✨ Features
- Exceptional ability to understand and process the Polish language.
- Fine-tuned using a combination of Polish and English instruction datasets.
- Uses advanced training strategies such as weighted token-level loss and an adaptive learning rate.
📦 Installation
No installation steps are provided in the original README.
💻 Usage Examples
Basic Usage
To leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should start with the beginning-of-sentence token (<s>). The generated completion will end with the end-of-sentence token (</s>).
E.g.
prompt = "<s>[INST] Jakie mamy pory roku? [/INST]"
completion = "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.</s>"
Advanced Usage
This format is available as a chat template via the apply_chat_template() method:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto
model_name = "speakleash/Bielik-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

messages = [
    {"role": "system", "content": "Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim."},
    {"role": "user", "content": "Jakie mamy pory roku w Polsce?"},
    {"role": "assistant", "content": "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima."},
    {"role": "user", "content": "Która jest najcieplejsza?"}
]

# Render the conversation with the model's chat template and tokenize it
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = input_ids.to(device)
model.to(device)

# Sample up to 1000 new tokens and decode the full sequence (prompt + completion)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
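The decoded string contains the whole conversation, prompt included. A minimal sketch for keeping only the newest reply, assuming the decoded text follows the [INST] ... [/INST] format described earlier; `extract_reply` is a hypothetical helper, not part of transformers:

```python
def extract_reply(decoded_text: str) -> str:
    """Return the text after the last [/INST] marker, without the </s> token."""
    # rpartition splits on the LAST occurrence, so earlier turns are discarded.
    _, _, reply = decoded_text.rpartition("[/INST]")
    return reply.replace("</s>", "").strip()


example = "<s>[INST] Jakie mamy pory roku? [/INST] W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.</s>"
print(extract_reply(example))
```

With the generation code above, this would be called as extract_reply(decoded[0]).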
If for some reason you are unable to use tokenizer.apply_chat_template, the following code will enable you to generate a correct prompt:
def chat_template(message, history, system_prompt):
    prompt_builder = ["<s>[INST] "]
    if system_prompt:
        prompt_builder.append(f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n")
    for human, assistant in history:
        prompt_builder.append(f"{human} [/INST] {assistant}</s>[INST] ")
    prompt_builder.append(f"{message} [/INST]")
    return ''.join(prompt_builder)

system_prompt = "Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim."
history = [
    ("Jakie mamy pory roku w Polsce?", "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.")
]
message = "Która jest najcieplejsza?"

prompt = chat_template(message, history, system_prompt)
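If your conversation is already stored as a list of role/content dictionaries (the shape passed to apply_chat_template earlier), it can be converted into the arguments chat_template expects. The sketch below is illustrative only: `split_messages` is a hypothetical helper, it assumes at most one system message and strictly alternating user/assistant turns ending with a user message, and chat_template is repeated so the block runs on its own.

```python
def chat_template(message, history, system_prompt):
    # Same logic as the manual template helper, repeated for self-containment.
    prompt_builder = ["<s>[INST] "]
    if system_prompt:
        prompt_builder.append(f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n")
    for human, assistant in history:
        prompt_builder.append(f"{human} [/INST] {assistant}</s>[INST] ")
    prompt_builder.append(f"{message} [/INST]")
    return "".join(prompt_builder)


def split_messages(messages):
    """Split a role/content list into (message, history, system_prompt)."""
    system_prompt = ""
    turns = []
    for entry in messages:
        if entry["role"] == "system":
            system_prompt = entry["content"]
        else:
            turns.append(entry)
    if not turns or turns[-1]["role"] != "user":
        raise ValueError("the conversation must end with a user message")
    past, last = turns[:-1], turns[-1]
    if len(past) % 2 != 0:
        raise ValueError("earlier turns must come in (user, assistant) pairs")
    # Pair each earlier user message with the assistant answer that follows it.
    history = [
        (past[i]["content"], past[i + 1]["content"])
        for i in range(0, len(past), 2)
    ]
    return last["content"], history, system_prompt


messages = [
    {"role": "system", "content": "Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim."},
    {"role": "user", "content": "Jakie mamy pory roku w Polsce?"},
    {"role": "assistant", "content": "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima."},
    {"role": "user", "content": "Która jest najcieplejsza?"},
]

prompt = chat_template(*split_messages(messages))
print(prompt)
```

The resulting string already starts with <s>, so when tokenizing it yourself you may want to check that your tokenizer does not prepend a second beginning-of-sentence token.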
📚 Documentation
Model
The SpeakLeash team is working on its own set of instructions in Polish, which is continuously being expanded and refined by annotators. A portion of these instructions, manually verified and corrected, was used for training. Moreover, due to the limited availability of high-quality instructions in Polish, publicly accessible collections of instructions in English were used: OpenHermes-2.5 and orca-math-word-problems-200k, which accounted for half of the instructions used in training. These instructions varied in quality, which degraded the model's performance. To counteract this while still making use of the aforementioned datasets, several improvements were introduced:
- Weighted token-level loss - a strategy inspired by offline reinforcement learning and C-RLFT
- Adaptive learning rate inspired by the study on Learning Rates as a Function of Batch Size
- Masked user instructions
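To make the first and last items more concrete, here is a minimal pure-Python sketch of a weighted token-level loss with masked user tokens. It is not the ALLaMo implementation: the function name, the per-token weights, and the choice to normalize by the sum of weights are all assumptions made for illustration.

```python
import math


def weighted_masked_nll(token_logprobs, token_weights, is_assistant_token):
    """Weighted mean negative log-likelihood over assistant tokens only.

    token_logprobs     : log-probability the model gave to each target token
    token_weights      : per-token weight (e.g. lower for lower-quality samples)
    is_assistant_token : False for user/system tokens, which are masked out
    """
    total = 0.0
    weight_sum = 0.0
    for logprob, weight, is_assistant in zip(
        token_logprobs, token_weights, is_assistant_token
    ):
        if not is_assistant:
            continue  # masked instruction token: contributes nothing to the loss
        total += -logprob * weight
        weight_sum += weight
    # Assumption: normalize by the total weight of the unmasked tokens.
    return total / weight_sum if weight_sum > 0 else 0.0


# Hypothetical three-token sequence: one user token, two assistant tokens.
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5)]
weights = [1.0, 1.0, 0.5]
mask = [False, True, True]
loss = weighted_masked_nll(logprobs, weights, mask)
print(loss)
```

In this toy example the first token is ignored entirely, and the third token counts half as much as the second.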
Bielik-7B-Instruct-v0.1 was trained using ALLaMo, an original open-source framework implemented by Krzysztof Ociepa. This framework allows users to train language models with an architecture similar to LLaMA and Mistral in a fast and efficient way.
Model description:
Property | Details |
---|---|
Developed by | SpeakLeash |
Language | Polish |
Model Type | causal decoder-only |
Finetuned from | Bielik-7B-v0.1 |
License | CC BY NC 4.0 (non-commercial use) |
Model ref | speakleash:e38140bea0d48f1218540800bbc67e89 |
Training
Training hyperparameters:
Hyperparameter | Value |
---|---|
Context length | 4096 |
Micro Batch Size | 1 |
Batch Size | up to 4194304 |
Learning Rate (cosine, adaptive) | 7e-6 -> 6e-7 |
Warmup Iterations | 50 |
All Iterations | 55440 |
Optimizer | AdamW |
β1, β2 | 0.9, 0.95 |
Adam_eps | 1e-8 |
Weight Decay | 0.05 |
Grad Clip | 1.0 |
Precision | bfloat16 (mixed) |
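As a rough illustration of the learning-rate row, the sketch below implements a plain cosine decay with linear warmup using the values from the table (7e-6 down to 6e-7, 50 warmup iterations, 55440 iterations in total). The linear warmup shape and the function name `lr_at` are assumptions, and the "adaptive" adjustment mentioned in the table is not modeled here.

```python
import math

MAX_LR = 7e-6         # peak learning rate from the table
MIN_LR = 6e-7         # final learning rate from the table
WARMUP_ITERS = 50     # warmup iterations from the table
TOTAL_ITERS = 55440   # all iterations from the table


def lr_at(step: int) -> float:
    """Learning rate at a given iteration (cosine decay after linear warmup)."""
    if step < WARMUP_ITERS:
        # Assumption: linear ramp that reaches MAX_LR on the last warmup step.
        return MAX_LR * (step + 1) / WARMUP_ITERS
    progress = (step - WARMUP_ITERS) / (TOTAL_ITERS - WARMUP_ITERS)
    progress = min(progress, 1.0)
    # Cosine factor goes from 1.0 (start of decay) to 0.0 (end of training).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (MAX_LR - MIN_LR) * cosine


for step in (0, 49, 50, 27745, 55440):
    print(step, lr_at(step))
```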
Quant and MLX versions:
We know that some people want to explore smaller models or don't have the resources to run a full model. Therefore, we have prepared quantized versions of the Bielik-7B-Instruct-v0.1 model. We are also mindful of Apple Silicon.
Quantized versions (for non-GPU / weaker GPU):
- https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-GGUF
- https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-GPTQ
- https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-AWQ
- https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-EXL2
- https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-3bit-HQQ
For Apple Silicon:
- https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-MLX
Evaluation
Models have been evaluated on the Open PL LLM Leaderboard in a 5-shot setting. The benchmark evaluates models on NLP tasks such as sentiment analysis, categorization, and text classification, but does not test chat skills. The following metrics are reported:
- Average - average score among all tasks normalized by baseline scores
- Reranking - reranking task, commonly used in RAG
- Reader (Generator) - open book question answering task, commonly used in RAG
- Perplexity (lower is better) - as a bonus, does not correlate with other scores and should not be used for model comparison
As of April 3, 2024, the following table showcases the current scores of pretrained and continuously pretrained models according to the Open PL LLM Leaderboard, evaluated in a 5-shot setting:
Model | Average | RAG Reranking | RAG Reader | Perplexity |
---|---|---|---|---|
7B parameters models: | ||||
Baseline (majority class) | 0.00 | 53.36 | - | - |
Voicelab/trurl-2-7b | 18.85 | 60.67 | 77.19 | 1098.88 |
meta-llama/Llama-2-7b-chat-hf | 21.04 | 54.65 | 72.93 | 4018.74 |
mistralai/Mistral-7B-Instruct-v0.1 | 26.42 | 56.35 | 73.68 | 6909.94 |
szymonrucinski/Curie-7B-v1 | 26.72 | 55.58 | 85.19 | 389.17 |
HuggingFaceH4/zephyr-7b-beta | 33.15 | 71.65 | 71.27 | 3613.14 |
HuggingFaceH4/zephyr-7b-alpha | 33.97 | 71.47 | 73.35 | 4464.45 |
internlm/internlm2-chat-7b-sft | 36.97 | 73.22 | 69.96 | 4269.63 |
internlm/internlm2-chat-7b | 37.64 | 72.29 | 71.17 | 3892.50 |
Bielik-7B-Instruct-v0.1 | 39.28 | 61.89 | 86.00 | 277.92 |
mistralai/Mistral-7B-Instruct-v0.2 | 40.29 | 72.58 | 79.39 | 2088.08 |
teknium/OpenHermes-2.5-Mistral-7B | 42.64 | 70.63 | 80.25 | 1463.00 |
openchat/openchat-3.5-1210 | 44.17 | 71.76 | 82.15 | 1923.83 |
speakleash/mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e1_70c70cc6 (experimental) | 45.44 | 71.27 | 91.50 | 279.24 |
Nexusflow/Starling-LM-7B-beta | 45.69 | 74.58 | 81.22 | 1161.54 |
openchat/openchat-3.5-0106 | 47.32 | 74.71 | 83.60 | 1106.56 |
berkeley-nest/Starling-LM-7B-alpha | 47.46 | 75.73 | 82.86 | 1438.04 |
Models with different sizes: | ||||
Azurro/APT3-1B-Instruct-v1 (1B) | -13.80 | 52.11 | 12.23 | 739.09 |
Voicelab/trurl-2-13b-academic (13B) | 29.45 | 68.19 | 79.88 | 733.91 |
upstage/SOLAR-10.7B-Instruct-v1.0 (10.7B) | 46.07 | 76.93 | 82.86 | 789.58 |
7B parameters pretrained and continuously pretrained models: | ||||
OPI-PG/Qra-7b | 11.13 |
📄 License
The model is licensed under CC BY NC 4.0 (non-commercial use).

