🚀 INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1
INSAIT presents MamayLM-Gemma-2-9B-IT-v0.1, a high-performing Ukrainian language model based on Google's Gemma 2 models, free to use and licensed under the Gemma terms.
🚀 Quick Start
Installation
First, install the latest version of the transformers library:
pip install -U 'transformers[torch]'
Loading the Model
Then, load the model in transformers:
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package; omit on unsupported hardware
    device_map="auto",
)
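Once the model is loaded, you can run a quick sanity check. This is a minimal sketch; the prompt is illustrative, and the recommended generation parameters are covered in the usage examples below.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1",
    use_default_system_prompt=False,
)

# Format a single user turn with the Gemma 2 chat template and generate.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Хто такий Козак Мамай?"}],  # "Who is Cossack Mamay?"
    return_tensors="pt",
    add_generation_prompt=True,
    return_dict=True,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))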
✨ Features
- Multilingual Capability: The model supports both Ukrainian and English, achieving excellent performance in both languages.
- Outstanding Performance: It outperforms much larger models such as Alibaba's Qwen 2.5 72B and Meta's Llama 3.1 70B on Ukrainian benchmarks.
- Instruction Fine-Tuning: Leveraging instruction fine-tuning, it can better understand and follow user instructions.
📦 Installation
As described in the Quick Start section, you first need to install the transformers library:
pip install -U 'transformers[torch]'
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
Advanced Usage
from transformers import GenerationConfig

generation_params = GenerationConfig(
    max_new_tokens=2048,
    temperature=0.1,
    top_k=25,
    top_p=1,
    repetition_penalty=1.1,
    eos_token_id=[1, 107],  # <eos> and <end_of_turn> in the Gemma 2 vocabulary
    do_sample=True,
)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1",
    use_default_system_prompt=False,
)

messages = [
    {"role": "user", "content": "Хто такий Козак Мамай?"},  # "Who is Cossack Mamay?"
]

# The chat template already adds <bos>; move the inputs to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
    return_dict=True,
).to(model.device)

outputs = model.generate(
    **input_ids,
    generation_config=generation_params,
)
print(tokenizer.decode(outputs[0]))
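For interactive use, output can also be streamed token by token. A small sketch using transformers' TextStreamer, reusing the model, tokenizer, input_ids, and generation_params defined above:

from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated,
# omitting the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **input_ids,
    generation_config=generation_params,
    streamer=streamer,
)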
📚 Documentation
Model Description
The model is built on top of Google's Gemma 2 9B open models. It was continuously pre-trained on a large pre-filtered dataset (75B tokens of Ukrainian and English data in total) using data mixing and model merging, which allows it to gain outstanding Ukrainian cultural and linguistic capabilities while retaining its English performance.
During pre-training, various datasets were used, including Ukrainian web crawl data (FineWeb2), freely available datasets such as Wikipedia, specialized Ukrainian datasets, and machine translations of popular English datasets. The model was then instruction-fine-tuned on a newly constructed Ukrainian instruction dataset created from machine translations of the current best English datasets and from specialized Ukrainian datasets prepared by the Ukrainian community.
For more information, see our blog post (English, Ukrainian).
Benchmarks and Results

The model is evaluated on a set of standard English benchmarks, their Ukrainian translations, and Ukrainian-specific benchmarks:
- Winogrande challenge: testing world knowledge and understanding
- Hellaswag: testing sentence completion
- ARC Easy/Challenge: testing logical reasoning
- TriviaQA: testing trivia knowledge
- GSM-8K: solving grade-school mathematics word problems
- MMLU: testing knowledge on a multitude of topics
- IFEval: testing instruction-following skills
- ZNO: testing knowledge of the Ukrainian high school curriculum in Ukrainian language & literature, history, mathematics and geography
The results show that the model can outperform much larger models in Ukrainian benchmarks and retains excellent English performance.
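The English results above can in principle be reproduced with EleutherAI's lm-evaluation-harness. A minimal sketch, using the harness's standard English task names; the Ukrainian-translated benchmarks and ZNO are not assumed to be available in the public harness:

# Evaluation sketch with lm-evaluation-harness (pip install lm-eval).
# Only standard English tasks are shown; the Ukrainian-translated
# benchmarks and ZNO are not part of the public harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1,dtype=bfloat16",
    tasks=["winogrande", "hellaswag", "arc_easy", "arc_challenge", "gsm8k", "mmlu"],
    batch_size=8,
)
print(results["results"])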
Recommended Parameters
For optimal performance, we recommend the following parameters for text generation:
from transformers import GenerationConfig

generation_params = GenerationConfig(
    max_new_tokens=2048,
    temperature=0.1,
    top_k=25,
    top_p=1,
    repetition_penalty=1.1,
    eos_token_id=[1, 107],  # <eos> and <end_of_turn> in the Gemma 2 vocabulary
    do_sample=True,
)
In principle, higher temperatures should also work adequately.
Instruction Format
To leverage instruction fine-tuning, your prompt should begin with a beginning-of-sequence token <bos> and be formatted in the Gemma 2 chat template; <bos> should only be the first token in a chat sequence. For example:
<bos><start_of_turn>user
Хто такий Козак Мамай?<end_of_turn>
<start_of_turn>model
This format is also available as a chat template via the apply_chat_template() method.
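A quick way to confirm the formatting is to render the template to a string rather than token ids. This sketch reuses the tokenizer loaded in the usage examples above and should reproduce the format shown:

# Render the chat template to a plain string instead of token ids.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Хто такий Козак Мамай?"}],  # "Who is Cossack Mamay?"
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)  # should start with <bos><start_of_turn>user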
Use with vLLM
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1",
    use_default_system_prompt=False,
)

sampling_params = SamplingParams(
    max_tokens=2048,
    temperature=0.1,
    top_k=25,
    top_p=1,
    repetition_penalty=1.1,
    stop_token_ids=[1, 107],  # <eos> and <end_of_turn>
)

llm = LLM(
    model="INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1",
    dtype="bfloat16",
    enforce_eager=True,
)

messages = [
    {"role": "user", "content": "Хто такий Козак Мамай?"},  # "Who is Cossack Mamay?"
]

# Apply the Gemma 2 chat template, then tokenize without adding
# special tokens again (the template already includes <bos>).
formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
input_ids = tokenizer(
    formatted_prompt,
    add_special_tokens=False,
).input_ids

prompt = TokensPrompt(prompt_token_ids=input_ids)
output = llm.generate(prompt, sampling_params)
generated_text = output[0].outputs[0].text
print(generated_text)
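vLLM can also batch several prompts in a single call. A brief sketch, reusing the tokenizer, llm, and sampling_params defined above; the second question is illustrative:

# Batch several chats in one call; vLLM schedules them together.
questions = [
    "Хто такий Козак Мамай?",  # "Who is Cossack Mamay?"
    "Що таке писанка?",        # "What is a pysanka?"
]
prompts = []
for q in questions:
    text = tokenizer.apply_chat_template(
        [{"role": "user", "content": q}],
        tokenize=False,
        add_generation_prompt=True,
    )
    token_ids = tokenizer(text, add_special_tokens=False).input_ids
    prompts.append(TokensPrompt(prompt_token_ids=token_ids))
for out in llm.generate(prompts, sampling_params):
    print(out.outputs[0].text)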
Use with GGML / llama.cpp
The model and instructions for usage in GGUF format are available at INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1-GGUF.
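As one option for local inference, the GGUF files can be loaded with the llama-cpp-python bindings. A minimal sketch; the quantization filename pattern is an assumption, so check the GGUF repository for the files actually published:

# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The filename pattern below is a hypothetical quantization choice.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1-GGUF",
    filename="*Q4_K_M.gguf",  # hypothetical; pick a file from the repo
    n_ctx=4096,
)
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Хто такий Козак Мамай?"}],
    max_tokens=2048,
    temperature=0.1,
)
print(response["choices"][0]["message"]["content"])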
Community Feedback
We welcome feedback from the community to help improve MamayLM. If you have suggestions, encounter any issues, or have ideas for improvements, please:
- Share your experience using the model through Hugging Face's community discussion feature, or
- Contact us at contact@insait.ai
Your real-world usage and insights are valuable in helping us optimize the model's performance and behaviour for various use cases.
📄 License
MamayLM is distributed under the Gemma Terms of Use.