🚀 INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0
INSAIT presents BgGPT-Gemma-2-27B-IT-v1.0, a cutting-edge Bulgarian language model based on Google's Gemma 2. It's free to use and performs well in both Bulgarian and English.
🚀 Quick Start
Installation
First, install the latest version of the transformers library:
pip install -U 'transformers[torch]'
Loading the Model
Then load the model with the transformers library:
import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0",
torch_dtype=torch.bfloat16,
attn_implementation="eager",
device_map="auto",
)
✨ Features
- Multilingual Proficiency: BgGPT-Gemma-2-27B-IT-v1.0 achieves outstanding performance in both Bulgarian and English.
- Free to Use: The model is free to use under the Gemma Terms of Use.
- State-of-the-Art Performance: It outperforms much larger models on Bulgarian benchmarks while retaining the excellent English performance inherited from the original Google Gemma 2 models.
📦 Installation
The installation steps are as follows:
pip install -U 'transformers[torch]'
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0",
torch_dtype=torch.bfloat16,
attn_implementation="eager",
device_map="auto",
)
Advanced Usage
Recommended Parameters
from transformers import GenerationConfig
generation_params = GenerationConfig(
max_new_tokens=2048,
temperature=0.1,
top_k=25,
top_p=1,
repetition_penalty=1.1,
eos_token_id=[1,107],
do_sample=True
)
Instruction Format
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(
"INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0",
use_default_system_prompt=False,
)
messages = [
{"role": "user", "content": "Кога е основан Софийският университет?"},
]
input_ids = tokenizer.apply_chat_template(
messages,
return_tensors="pt",
add_generation_prompt=True,
return_dict=True
)
outputs = model.generate(
**input_ids,
generation_config=generation_params
)
print(tokenizer.decode(outputs[0]))
Use with vLLM
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(
"INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0",
use_default_system_prompt=False,
)
sampling_params = SamplingParams(
max_tokens=2048,
temperature=0.1,
top_k=25,
top_p=1,
repetition_penalty=1.1,
stop_token_ids=[1, 107],
)
llm = LLM(
model="INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0",
dtype="bfloat16",
enforce_eager=True
)
messages = [
{"role": "user", "content": "Кога е основан Софийският университет?"},
]
formatted_prompt = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
input_ids = tokenizer(
formatted_prompt,
add_special_tokens=False
).input_ids
prompt = TokensPrompt(prompt_token_ids=input_ids)
output = llm.generate(
prompt,
sampling_params
)
generated_text = output[0].outputs[0].text
print(generated_text)
📚 Documentation
Model Description
The model is built on top of Google's Gemma 2 27B open models. It was continuously pre-trained on around 100 billion tokens (85 billion in Bulgarian) using the Branch-and-Merge strategy INSAIT presented at EMNLP'24, which allows the model to gain outstanding Bulgarian cultural and linguistic capabilities while retaining its English performance. During pre-training, various datasets were used, including Bulgarian web crawl data, freely available datasets such as Wikipedia, a range of specialized Bulgarian datasets sourced by the INSAIT Institute, and machine translations of popular English datasets. The model was then instruction-fine-tuned on a newly constructed Bulgarian instruction dataset created using real-world conversations. For more information, see the blog post.
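To give a sense of the merge step in a branch-and-merge style pipeline, below is a minimal conceptual sketch in which separately trained branch checkpoints are combined by weight averaging. This is an illustration only, not INSAIT's actual training code; the real branching schedule, data splits, and merge weights follow the EMNLP'24 paper.
import torch

def merge_branches(branch_state_dicts, weights=None):
    # Average parameter tensors from several branch checkpoints.
    # Illustrative only: the published method defines how branches are
    # trained and how often they are merged back into a single model.
    if weights is None:
        weights = [1.0 / len(branch_state_dicts)] * len(branch_state_dicts)
    merged = {}
    for name in branch_state_dicts[0]:
        merged[name] = sum(
            w * sd[name].float() for w, sd in zip(weights, branch_state_dicts)
        )
    return merged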
Benchmarks and Results

The model is evaluated on a set of standard English benchmarks, their Bulgarian translations, and Bulgarian-specific benchmarks, all provided at https://github.com/insait-institute/lm-evaluation-harness-bg. These benchmarks test logical reasoning, mathematics, knowledge, language understanding, and other skills. The results show the excellent abilities of both the 9B and 27B models in Bulgarian, allowing them to outperform much larger models, including Alibaba's Qwen 2.5 72B and Meta's Llama 3.1 70B. Both BgGPT 9B and BgGPT 27B significantly improve upon the previous version of BgGPT based on Mistral-7B, and they retain the excellent English performance inherited from the original Google Gemma 2 models.
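If you want to reproduce the evaluation, a minimal sketch using the harness's Python entry point is shown below, assuming the Bulgarian fork keeps the upstream lm_eval API; the task names are placeholders, so use the task list documented in the repository.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0,dtype=bfloat16",
    tasks=["hellaswag_bg", "winogrande_bg"],  # placeholder task names
)
print(results["results"])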
Chat Preference

The BgGPT 27B model's chat performance is evaluated on thousands of real-world Bulgarian conversations spanning around 100 topics. The results show that the model significantly surpasses the smaller variants of commercial models in Bulgarian chat quality and is on par with the best commercial models, as judged by GPT-4o itself.
Use with GGML / llama.cpp
The model and instructions for usage in GGUF format are available at INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0-GGUF.
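As a starting point, here is a hedged sketch using llama-cpp-python; the quantization filename pattern is an assumption, so check the GGUF repository above for the actual files and recommended settings.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quantization; pick a file from the repo
    n_ctx=4096,
)
response = llm.create_chat_completion(
    # "When was Sofia University founded?"
    messages=[{"role": "user", "content": "Кога е основан Софийският университет?"}],
    temperature=0.1,
    max_tokens=2048,
)
print(response["choices"][0]["message"]["content"])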
Community Feedback
The community's feedback is welcome to help improve BgGPT. You can share your experience using the model through Hugging Face's community discussion feature or contact the team at bggpt@insait.ai.
🔧 Technical Details
The model uses the Branch-and-Merge strategy presented at EMNLP'24 during pre-training. It is pre-trained on around 100 billion tokens (85 billion in Bulgarian) and then instruction-fine-tuned on a Bulgarian instruction dataset created from real-world conversations.
📄 License
BgGPT is distributed under the Gemma Terms of Use.
⚠️ Important Note
Models based on Gemma 2, such as BgGPT-Gemma-2-27B-IT-v1.0, do not support flash attention; using it results in degraded performance. Use attn_implementation="eager", as shown in the examples above.
💡 Usage Tip
For optimal results, use the recommended generation parameters above; they have been extensively tested by the developers.