🚀 Bielik-11B-v2.3-Instruct
Bielik-11B-v2.3-Instruct is a generative text model with 11 billion parameters. It addresses the need for high-performance Polish language processing by merging multiple fine-tuned models. The model is the result of a unique collaboration that leverages Polish computing infrastructure and large-scale text corpora, enabling accurate language understanding and task execution in Polish.
🚀 Quick Start
The model uses ChatML as the prompt format. Here is a basic example of how to use it:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_name = "speakleash/Bielik-11B-v2.3-Instruct"

# Load the tokenizer and the model in bfloat16, then move the model to the GPU
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.to(device)

# A Polish conversation using the ChatML roles: system, user, assistant
messages = [
    {"role": "system", "content": "Odpowiadaj krótko, precyzyjnie i wyczerpująco w języku polskim."},
    {"role": "user", "content": "Jakie mamy pory roku w Polsce?"},
    {"role": "assistant", "content": "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima."},
    {"role": "user", "content": "Która jest najcieplejsza?"}
]

# Render the conversation with the model's chat template and generate a reply
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = input_ids.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)

decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
✨ Features
- Multi-model Merge: A linear merge of Bielik-11B-v2.0-Instruct, Bielik-11B-v2.1-Instruct, and Bielik-11B-v2.2-Instruct, which are instruct fine-tuned versions of Bielik-11B-v2.
- Polish Language Focus: Developed and trained on Polish text corpora, enabling excellent performance in Polish-language tasks.
- Advanced Training Techniques: Uses weighted token-level loss, adaptive learning rate, and masked prompt tokens to improve performance.
- Multiple Quantized Versions: Available in various quantized versions, including GGUF, GPTQ, and FP8, to suit different resource requirements.
📦 Installation
No setup is required beyond the standard Hugging Face stack. Install the libraries used in the quick-start example:
pip install transformers torch
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"
model_name = "speakleash/Bielik-11B-v2.3-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
messages = [
{"role": "user", "content": "Jakie mamy pory roku?"}
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = input_ids.to(device)
model.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens=100, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
Advanced Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"
model_name = "speakleash/Bielik-11B-v2.3-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
messages = [
{"role": "system", "content": "Odpowiadaj krótko, precyzyjnie i wyczerpująco w języku polskim."},
{"role": "user", "content": "Jakie mamy pory roku w Polsce?"},
{"role": "assistant", "content": "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima."},
{"role": "user", "content": "Która jest najcieplejsza?"}
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = input_ids.to(device)
model.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens=500, temperature=0.7, top_k=50, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
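The decoded output still contains the ChatML special tokens (e.g. <|im_start|>, <|im_end|>). To keep only the generated text, decode with skip_special_tokens=True:

# Drop special tokens from the decoded output
decoded = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(decoded[0])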
📚 Documentation
Model
The SpeakLeash team developed custom Polish instructions. Because high-quality Polish instructions are scarce, synthetic instructions were generated with Mixtral 8x22B and used in training. The training dataset contained over 20 million instructions comprising more than 10 billion tokens. To improve performance, several techniques were introduced, including weighted token-level loss, adaptive learning rate, and masked prompt tokens.
The DPO-Positive method was used to align the model with user preferences. The model was merged using mergekit by Remigiusz Kinas.
Quantized models (see the loading sketch below):
- GGUF - Q4_K_M, Q5_K_M, Q6_K, Q8_0
- GPTQ - 4bit
- FP8 (for vLLM and SGLang; optimized for Ada Lovelace and Hopper GPUs)
- GGUF (experimental) - IQ imatrix IQ1_M, IQ2_XXS, IQ3_XXS, IQ4_XS and calibrated Q4_K_M, Q5_K_M, Q6_K, Q8_0
⚠️ Important Note
Quantized models may offer lower quality of generated answers compared to full-sized variants.
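To run one of the GGUF quantizations locally, llama-cpp-python is a lightweight option. The snippet below is a minimal sketch; the repository name (speakleash/Bielik-11B-v2.3-Instruct-GGUF) and the quant filename pattern are assumptions, so check the actual Hugging Face repositories for the exact names.

# Minimal sketch: running a GGUF quantization with llama-cpp-python.
# The repo_id and filename below are assumptions - verify them on Hugging Face.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="speakleash/Bielik-11B-v2.3-Instruct-GGUF",  # assumed companion GGUF repo
    filename="*Q4_K_M.gguf",                             # assumed quant filename pattern
    n_ctx=4096,
    n_gpu_layers=-1,                                     # offload all layers to GPU if available
)

# llama-cpp-python accepts chat-style messages and applies the model's chat template
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Jakie mamy pory roku w Polsce?"}],
    max_tokens=200,
)
print(response["choices"][0]["message"]["content"])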
Chat template
Bielik-11B-v2.3-Instruct uses ChatML as the prompt format. For example:
prompt = "<s><|im_start|> user\nJakie mamy pory roku?<|im_end|> \n<|im_start|> assistant\n"
completion = "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.<|im_end|> \n"
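Rather than writing this string by hand, the same prompt can be rendered from the tokenizer's built-in chat template (standard transformers API):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("speakleash/Bielik-11B-v2.3-Instruct")

messages = [{"role": "user", "content": "Jakie mamy pory roku?"}]

# tokenize=False returns the rendered prompt string instead of token ids;
# add_generation_prompt=True appends the opening tag of the assistant turn
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)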
Evaluation
Bielik-11B-v2.3-Instruct has been evaluated on several benchmarks:
- Open PL LLM Leaderboard
- Open LLM Leaderboard
- Polish MT-Bench
- Polish EQ-Bench (Emotional Intelligence Benchmark)
- MixEval
Open PL LLM Leaderboard
Models were evaluated on the Open PL LLM Leaderboard in a 5-shot setting. The benchmark assesses NLP tasks such as sentiment analysis, categorization, and text classification.
| Model | Parameters (B) | Average |
|-------|----------------|---------|
| Meta-Llama-3.1-405B-Instruct-FP8,API | 405 | 69.44 |
| Mistral-Large-Instruct-2407 | 123 | 69.11 |
| Qwen2-72B-Instruct | 72 | 65.87 |
| Bielik-11B-v2.3-Instruct | 11 | 65.71 |
| Bielik-11B-v2.2-Instruct | 11 | 65.57 |
| Meta-Llama-3.1-70B-Instruct | 70 | 65.49 |
| Bielik-11B-v2.1-Instruct | 11 | 65.45 |
| Mixtral-8x22B-Instruct-v0.1 | 141 | 65.23 |
| Bielik-11B-v2.0-Instruct | 11 | 64.98 |
| Meta-Llama-3-70B-Instruct | 70 | 64.45 |
| Athene-70B | 70 | 63.65 |
| WizardLM-2-8x22B | 141 | 62.35 |
| Qwen1.5-72B-Chat | 72 | 58.67 |
| Qwen2-57B-A14B-Instruct | 57 | 56.89 |
| glm-4-9b-chat | 9 | 56.61 |
| aya-23-35B | 35 | 56.37 |
| Phi-3.5-MoE-instruct | 41.9 | 56.34 |
| openchat-3.5-0106-gemma | 7 | 55.69 |
| Mistral-Nemo-Instruct-2407 | 12 | 55.27 |
| SOLAR-10.7B-Instruct-v1.0 | 10.7 | 55.24 |
| Mixtral-8x7B-Instruct-v0.1 | 46.7 | 55.07 |
| Bielik-7B-Instruct-v0.1 | 7 | 44.70 |
| trurl-2-13b-academic | 13 | 36.28 |
| trurl-2-7b | 7 | 26.93 |
The results show that Bielik-11B-v2.3-Instruct:
- Outperforms all other models with less than 70B parameters.
- Performs on par with models in the 70B parameter range.
- Shows a marked improvement over its predecessor, Bielik-7B-Instruct-v0.1.
- Stands out as a leader among Polish language models.
Open PL LLM Leaderboard - Generative Tasks Performance
| Model | Parameters (B) | Average (generative tasks) |
|-------|----------------|----------------------------|
| Bielik-11B-v2.3-Instruct | 11 | 67.47 |
| Bielik-11B-v2.1-Instruct | 11 | 66.58 |
| Bielik-11B-v2.2-Instruct | 11 | 66.11 |
| Bielik-11B-v2.0-Instruct | 11 | 65.58 |
| gpt-3.5-turbo-instruct | Unavailable | N/A |
🔧 Technical Details
- Training Dataset: Over 20 million instructions comprising more than 10 billion tokens.
- Training Techniques: Weighted token-level loss, adaptive learning rate, masked prompt tokens (illustrated in the sketch below), and DPO-Positive for preference alignment.
- Merge Method: Linear merge of multiple models using mergekit.
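As an illustration of prompt-token masking (a generic sketch, not the SpeakLeash training code), a common implementation sets the label of every prompt token to -100 so that the cross-entropy loss is computed only on the response tokens:

import torch

# Minimal sketch of prompt-token masking for supervised fine-tuning.
# This is a generic illustration, not the actual Bielik training code.
IGNORE_INDEX = -100  # labels with this value are ignored by PyTorch's cross-entropy loss

def build_labels(prompt_ids: list[int], response_ids: list[int]) -> dict:
    """Concatenate prompt and response ids; mask the prompt part in the labels."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids  # loss only on the response
    return {
        "input_ids": torch.tensor(input_ids),
        "labels": torch.tensor(labels),
    }

# Example with placeholder ids; in practice they come from the tokenizer
example = build_labels(prompt_ids=[1, 523, 781], response_ids=[942, 17, 2])
print(example["labels"])  # tensor([-100, -100, -100, 942, 17, 2])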
📄 License
The model is licensed under Apache 2.0 and Terms of Use.
⚠️ Important Note
If you want to learn more about how you can use the model, please refer to our Terms of Use.