Bielik-7B-Instruct-v0.1 Open-Source Polish Large Model - Precise Handling of Polish Language Understanding Tasks

Bielik 7B Instruct V0.1

Developed by speakleash

Bielik-7B-Instruct-v0.1 is a Polish large language model fine-tuned for instructions, based on Bielik-7B-v0.1, developed by the SpeakLeash team in collaboration with ACK Cyfronet AGH, specializing in Polish language understanding and processing tasks.

Large Language Model

Transformers

Other#Polish instruction fine-tuning #Multi-turn dialogue optimization #High-precision text generation

Release Time : 3/30/2024

Model Overview

This model is the result of an open-source research project by SpeakLeash in collaboration with the high-performance computing center ACK Cyfronet AGH. Trained on a carefully selected corpus of Polish texts, it demonstrates exceptional Polish language understanding and processing capabilities.

Model Features

Polish language optimization

Specially trained and optimized for Polish, excelling in Polish language tasks

Instruction fine-tuning

Fine-tuned with extensive Polish and English instruction data for better understanding and execution of user instructions

High-performance computing training

Trained using supercomputing resources from the Polish PLGrid environment to ensure model quality

Model Capabilities

Polish text generation

Question answering

Instruction following

Multi-turn dialogue

Use Cases

Education

Polish language learning assistant

Helps students learn and practice Polish

Customer service

Polish customer service bot

Provides automated customer support for Polish-speaking users

🚀 Bielik-7B-Instruct-v0.1

Bielik-7B-Instruct-v0.1 is an instruct fine-tuned model based on Bielik-7B-v0.1. It is developed to understand and process the Polish language accurately, enabling high-precision linguistic tasks.

🚀 Quick Start

Bielik-7B-Instruct-v0.1 is an instruction fine-tuned version of Bielik-7B-v0.1. It is the result of a collaboration between the open-science/open-source project SpeakLeash and the high-performance computing center ACK Cyfronet AGH. The model was developed and trained on Polish text corpora curated and processed by the SpeakLeash team, using the Polish PLGrid large-scale computing infrastructure (computational grant PLG/2024/016951) on the Athena and Helios supercomputers.

We have also prepared quantized versions of the model, as well as a version in MLX format.

🎥 Demo: https://huggingface.co/spaces/speakleash/Bielik-7B-Instruct-v0.1

🗣️ Chat Arena*: https://arena.speakleash.org.pl/

*Chat Arena is a platform for testing and comparing different AI language models, allowing users to evaluate their performance and quality.

✨ Features

Exceptional ability to understand and process the Polish language.
Fine-tuned using a combination of Polish and English instruction datasets.
Uses training strategies such as weighted token-level loss and an adaptive learning rate.

📦 Installation

No installation steps are provided in the original README.

💻 Usage Examples

Basic Usage

To leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should start with the beginning-of-sentence token (<s>). The generated completion will end with the end-of-sentence token (</s>).

For example:

prompt = "<s>[INST] Jakie mamy pory roku? [/INST]"
completion = "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.</s>"
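The single-turn format above can be sketched as two small helpers. This is only an illustrative sketch: the names build_prompt and extract_completion are hypothetical and are not part of the model's tooling. It assumes the decoded output still contains the [/INST] marker and, possibly, the trailing </s> token.

```python
BOS = "<s>"    # beginning-of-sentence token
EOS = "</s>"   # end-of-sentence token


def build_prompt(instruction: str) -> str:
    """Wrap one user instruction in the [INST] ... [/INST] format."""
    return f"{BOS}[INST] {instruction.strip()} [/INST]"


def extract_completion(decoded: str) -> str:
    """Return the text after the last [/INST], without the trailing EOS token."""
    answer = decoded.rsplit("[/INST]", 1)[-1].strip()
    if answer.endswith(EOS):
        answer = answer[: -len(EOS)]
    return answer.strip()


prompt = build_prompt("Jakie mamy pory roku?")
decoded = prompt + " W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima." + EOS
print(prompt)
print(extract_completion(decoded))
```

Splitting on the last [/INST] rather than on the exact prompt string keeps the helper usable even if decoding changes the whitespace around the special tokens.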

Advanced Usage

This format is available as a chat template via the apply_chat_template() method:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" # the device to load the model onto

model_name = "speakleash/Bielik-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

messages = [
    {"role": "system", "content": "Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim."},
    {"role": "user", "content": "Jakie mamy pory roku w Polsce?"},
    {"role": "assistant", "content": "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima."},
    {"role": "user", "content": "Która jest najcieplejsza?"}
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = input_ids.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

If for some reason you are unable to use tokenizer.apply_chat_template, the following code will enable you to generate a correct prompt:

def chat_template(message, history, system_prompt):
    prompt_builder = ["<s>[INST] "]
    if system_prompt:
        prompt_builder.append(f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n")
    for human, assistant in history:
        prompt_builder.append(f"{human} [/INST] {assistant}</s>[INST] ")
    prompt_builder.append(f"{message} [/INST]")
    return ''.join(prompt_builder)

system_prompt = "Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim."
history = [
    ("Jakie mamy pory roku w Polsce?", "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.")
]
message = "Która jest najcieplejsza?"

prompt = chat_template(message, history, system_prompt)

📚 Documentation

Model

The SpeakLeash team is building its own set of Polish instructions, which annotators continuously expand and refine. A portion of these instructions, manually verified and corrected, was used for training. Because high-quality instructions in Polish are scarce, publicly accessible English instruction collections were also used: OpenHermes-2.5 and orca-math-word-problems-200k, which accounted for half of the instructions used in training. The instructions varied in quality, which degraded the model's performance. To counteract this while still making use of the aforementioned datasets, several improvements were introduced:

Weighted token-level loss - a strategy inspired by offline reinforcement learning and C-RLFT
Adaptive learning rate inspired by the study on Learning Rates as a Function of Batch Size
Masked user instructions
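To make the first and third items more concrete, here is a minimal pure-Python sketch of how a quality-weighted, token-level loss with masked user tokens could be computed. It is not the ALLaMo implementation. The function name, the per-example quality weights, and the choice to normalize by the number of unmasked tokens are all assumptions made for illustration.

```python
def weighted_token_loss(batch):
    """Weighted mean negative log-likelihood over assistant tokens.

    Each example in `batch` is a dict with:
      "logprobs": log-probability the model gave to each target token
      "is_assistant": True for tokens that count toward the loss;
                      user-instruction tokens are False (masked out)
      "weight": quality weight of the example (hypothetical values)
    """
    weighted_nll = 0.0
    kept_tokens = 0
    for example in batch:
        for logprob, keep in zip(example["logprobs"], example["is_assistant"]):
            if keep:
                # Each kept token contributes its NLL scaled by the example weight.
                weighted_nll += example["weight"] * (-logprob)
                kept_tokens += 1
    # Assumption: normalize by the count of unmasked tokens in the batch.
    return weighted_nll / kept_tokens if kept_tokens else 0.0


batch = [
    # Higher-quality example: full weight.
    {"logprobs": [-1.0, -2.0, -3.0], "is_assistant": [False, True, True], "weight": 1.0},
    # Lower-quality example: down-weighted.
    {"logprobs": [-4.0, -2.0], "is_assistant": [False, True], "weight": 0.5},
]
print(weighted_token_loss(batch))
```

In this sketch, tokens from lower-quality examples still contribute to training, but each one moves the loss less than a token from a verified example, and user-instruction tokens contribute nothing.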

Bielik-7B-Instruct-v0.1 was trained with ALLaMo, an original open-source framework implemented by Krzysztof Ociepa. The framework allows users to train language models with architectures similar to LLaMA and Mistral in a fast and efficient way.

Model description:

Property	Details
Developed by	SpeakLeash
Language	Polish
Model Type	causal decoder-only
Finetuned from	Bielik-7B-v0.1
License	CC BY NC 4.0 (non-commercial use)
Model ref	speakleash:e38140bea0d48f1218540800bbc67e89

Training

Framework: ALLaMo
Visualizations: W&B

Training hyperparameters:

Hyperparameter	Value
Context length	4096
Micro Batch Size	1
Batch Size	up to 4194304
Learning Rate (cosine, adaptive)	7e-6 -> 6e-7
Warmup Iterations	50
All Iterations	55440
Optimizer	AdamW
β1, β2	0.9, 0.95
Adam_eps	1e-8
Weight Decay	0.05
Grad Clip	1.0
Precision	bfloat16 (mixed)
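As a rough illustration of the learning-rate row above, the following sketch implements a cosine decay from 7e-6 to 6e-7 over 55440 iterations with 50 warmup iterations. The linear shape of the warmup is an assumption, and the batch-size-dependent "adaptive" adjustment is not modeled because the table does not specify it. The function name learning_rate is hypothetical.

```python
import math


def learning_rate(iteration, max_lr=7e-6, min_lr=6e-7,
                  warmup_iters=50, total_iters=55440):
    """Cosine schedule with linear warmup (warmup shape is an assumption)."""
    if iteration < warmup_iters:
        # Ramp linearly up to max_lr during warmup.
        return max_lr * (iteration + 1) / warmup_iters
    if iteration >= total_iters:
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (iteration - warmup_iters) / (total_iters - warmup_iters)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


for it in (0, 50, 27745, 55440):
    print(it, learning_rate(it))
```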

Quant and MLX versions:

We know that some people want to explore smaller models or don't have the resources to run a full model. Therefore, we have prepared quantized versions of the Bielik-7B-Instruct-v0.1 model. We are also mindful of Apple Silicon.

Quantized versions (for non-GPU / weaker GPU):

https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-GGUF
https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-GPTQ
https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-AWQ
https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-EXL2
https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-3bit-HQQ

For Apple Silicon:

https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1-MLX

Evaluation

Models have been evaluated on the Open PL LLM Leaderboard in a 5-shot setting. The benchmark evaluates models on NLP tasks such as sentiment analysis, categorization, and text classification, but does not test chat skills. The following metrics are reported:

Average - average score among all tasks normalized by baseline scores
Reranking - reranking task, commonly used in RAG
Reader (Generator) - open book question answering task, commonly used in RAG
Perplexity (lower is better) - reported as a bonus; it does not correlate with the other scores and should not be used for model comparison

As of April 3, 2024, the following table showcases the current scores of pretrained and continuously pretrained models according to the Open PL LLM Leaderboard, evaluated in a 5-shot setting:

Model	Average	RAG Reranking	RAG Reader	Perplexity
7B parameters models:
Baseline (majority class)	0.00	53.36	-	-
Voicelab/trurl-2-7b	18.85	60.67	77.19	1098.88
meta-llama/Llama-2-7b-chat-hf	21.04	54.65	72.93	4018.74
mistralai/Mistral-7B-Instruct-v0.1	26.42	56.35	73.68	6909.94
szymonrucinski/Curie-7B-v1	26.72	55.58	85.19	389.17
HuggingFaceH4/zephyr-7b-beta	33.15	71.65	71.27	3613.14
HuggingFaceH4/zephyr-7b-alpha	33.97	71.47	73.35	4464.45
internlm/internlm2-chat-7b-sft	36.97	73.22	69.96	4269.63
internlm/internlm2-chat-7b	37.64	72.29	71.17	3892.50
Bielik-7B-Instruct-v0.1	39.28	61.89	86.00	277.92
mistralai/Mistral-7B-Instruct-v0.2	40.29	72.58	79.39	2088.08
teknium/OpenHermes-2.5-Mistral-7B	42.64	70.63	80.25	1463.00
openchat/openchat-3.5-1210	44.17	71.76	82.15	1923.83
speakleash/mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e1_70c70cc6 (experimental)	45.44	71.27	91.50	279.24
Nexusflow/Starling-LM-7B-beta	45.69	74.58	81.22	1161.54
openchat/openchat-3.5-0106	47.32	74.71	83.60	1106.56
berkeley-nest/Starling-LM-7B-alpha	47.46	75.73	82.86	1438.04

Models with different sizes:
Azurro/APT3-1B-Instruct-v1 (1B)	-13.80	52.11	12.23	739.09
Voicelab/trurl-2-13b-academic (13B)	29.45	68.19	79.88	733.91
upstage/SOLAR-10.7B-Instruct-v1.0 (10.7B)	46.07	76.93	82.86	789.58

7B parameters pretrained and continuously pretrained models:
OPI-PG/Qra-7b	11.13

📄 License

The model is licensed under CC BY NC 4.0 (non-commercial use).
