# Ko-Gemma-2-9B-IT: Korean Conversational Model
A Korean-language conversational model in the Gemma family, designed for various text generation tasks.
## 🚀 Quick Start
To get started with Ko-Gemma-2-9B-IT, install the dependencies and follow the usage examples below.
## ✨ Features
- Korean Language Support: Specifically fine-tuned for Korean language tasks, making it suitable for Korean users.
- High-Quality Training: Trained on carefully curated, high-quality datasets using Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO) on human-feedback data.
- Lightweight and Deployable: Can be deployed in environments with limited resources, such as laptops, desktops, or your own cloud infrastructure.
## 📦 Installation

You must install `transformers >= 4.42.3` for Gemma 2 models.

```bash
pip install transformers==4.42.3 accelerate
```
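Before loading the model, it can help to confirm the version requirement is met. A minimal sanity-check sketch (this check is our addition, not part of the official instructions; `packaging` ships as a dependency of `transformers`):

```python
# Hypothetical sanity check: confirm the installed transformers version
# meets the Gemma 2 requirement before loading the model.
import transformers
from packaging import version

assert version.parse(transformers.__version__) >= version.parse("4.42.3"), (
    f"transformers {transformers.__version__} is too old for Gemma 2; "
    "install >= 4.42.3"
)
```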
## 💻 Usage Examples
### Basic Usage
```python
import transformers
import torch

model_id = "rtzr/ko-gemma-2-9b-it"

# Load the model in bfloat16 and shard it across available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
pipeline.model.eval()

# "Can you create a famous sightseeing course for Seoul?"
instruction = "서울의 유명한 관광 코스를 만들어줄래?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

# Render the conversation with the Gemma 2 chat template.
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Stop on either the EOS token or the end-of-turn marker.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Print only the newly generated text (strip the prompt prefix).
print(outputs[0]["generated_text"][len(prompt):])
```
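Gemma 2 marks the end of each chat turn with a dedicated `<end_of_turn>` token, which is why it is added to `terminators` alongside the EOS token: generation stops cleanly as soon as the model finishes its turn. If you want to inspect the exact prompt the chat template produces, a quick check (continuing from the variables above; the expected layout is our reading of the Gemma 2 template):

```python
# Inspect the rendered prompt. For Gemma 2 it should look roughly like:
#   <bos><start_of_turn>user
#   서울의 유명한 관광 코스를 만들어줄래?<end_of_turn>
#   <start_of_turn>model
print(prompt)
```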
### Advanced Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "rtzr/ko-gemma-2-9b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

# "Can you create a famous sightseeing course for Seoul?"
instruction = "서울의 유명한 관광 코스를 만들어줄래?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

# Tokenize the conversation with the chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the EOS token or the end-of-turn marker.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens (skip the prompt).
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
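For interactive use you may want tokens printed as they are generated rather than all at once. One option (a sketch using transformers' built-in `TextStreamer`, not something the original card shows) is to reuse the objects defined above:

```python
# Stream tokens to stdout as they are generated, reusing the model,
# tokenizer, input_ids, and terminators from the example above.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    streamer=streamer,
)
```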
### Quantized Versions through bitsandbytes
```python
# pip install bitsandbytes
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "rtzr/ko-gemma-2-9b-it"

# 8-bit quantization; swap in the 4-bit config to roughly halve memory again.
quantization_config_8bit = BitsAndBytesConfig(load_in_8bit=True)
# quantization_config_4bit = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config_8bit,
    # quantization_config=quantization_config_4bit,
    low_cpu_mem_usage=True,
)
model.eval()

# "Can you create a famous sightseeing course for Seoul?"
instruction = "서울의 유명한 관광 코스를 만들어줄래?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the EOS token or the end-of-turn marker.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
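If you take the 4-bit path, quality typically holds up better with NF4 quantization and a bfloat16 compute dtype. A minimal sketch of such a config (these particular settings are our suggestion, not from the original card):

```python
# Hedged example: NF4 4-bit quantization with bfloat16 compute,
# a common configuration for running a 9B model on smaller GPUs.
import torch
from transformers import BitsAndBytesConfig

quantization_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16
)
```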
### vLLM Usage
With `vllm==0.5.1`, the Gemma 2 architecture cannot be loaded yet (a known loading issue in that release). We therefore recommend using the `vllm/vllm-openai:latest` Docker image or `vllm==0.5.0.post1`.
```bash
#!/bin/bash

# Use the FlashInfer attention backend inside the container.
VLLM_ATTENTION_BACKEND=FLASHINFER
MODEL_NAME="rtzr/ko-gemma-2-9b-it"
MODEL_PATH="YOUR_PATH/${MODEL_NAME}"

docker run --rm --gpus all \
    -p 8000:8000 \
    --shm-size=12gb --ulimit memlock=-1 --ulimit stack=67108864 \
    -e VLLM_ATTENTION_BACKEND=${VLLM_ATTENTION_BACKEND} \
    -v $MODEL_PATH:/vllm-workspace/${MODEL_NAME} \
    vllm/vllm-openai:latest \
    --model ${MODEL_NAME} --dtype auto \
    --gpu-memory-utilization 0.8
```
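Once the container is up, it serves vLLM's OpenAI-compatible API on port 8000. A hedged example request (the prompt is the same sample question used above; generation parameters mirror the earlier examples):

```python
# Query the OpenAI-compatible chat endpoint served by vLLM.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "rtzr/ko-gemma-2-9b-it",
        "messages": [
            # "Can you create a famous sightseeing course for Seoul?"
            {"role": "user", "content": "서울의 유명한 관광 코스를 만들어줄래?"},
        ],
        "max_tokens": 1024,
        "temperature": 0.6,
        "top_p": 0.9,
    },
)
print(response.json()["choices"][0]["message"]["content"])
```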
## 📚 Documentation
### Model Details
#### Ko-Gemma-2-9B-IT
Ko-Gemma-2-9B-IT is a Korean-language conversational model in the Gemma family. It is a text-to-text, decoder-only large language model, available in Korean. We fine-tuned it on a carefully curated, high-quality dataset using Supervised Fine-Tuning (SFT), followed by Direct Preference Optimization (DPO) on human-feedback data. The datasets include:
Some of these datasets were partially used and translated for training. Because the translation process introduced a lot of repeated text, we applied N-gram-based preprocessing to filter it out.
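The card does not publish the exact preprocessing code, but the idea is straightforward: drop or trim samples in which the same n-gram recurs too often. A minimal sketch of such a filter (function name, n, and threshold are our own, purely illustrative):

```python
# Illustrative n-gram repetition filter; the actual preprocessing used
# for Ko-Gemma-2-9B-IT is not published, so the names and thresholds
# here are assumptions.
from collections import Counter

def has_ngram_repetition(text: str, n: int = 4, max_repeats: int = 3) -> bool:
    """Return True if any token n-gram appears more than max_repeats times."""
    tokens = text.split()
    ngrams = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return any(count > max_repeats for count in ngrams.values())

# Keep only samples without heavy n-gram repetition:
# clean = [s for s in samples if not has_ngram_repetition(s)]
```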
#### Inputs and outputs
- Input: A text string, such as a question, a prompt, or a document to be summarized.
- Output: Generated Korean-language text in response to the input, such as an answer to a question or a summary of a document.
#### Google Gemma 2
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained and instruction-tuned variants. Gemma models are well suited to a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources, such as a laptop, desktop, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.
### Benchmark Scores
We evaluated the model internally using the LogicKor code. While the public LogicKor leaderboard uses GPT-4 as the judge, our internal evaluation used GPT-4o; public scores will be added as they are released. The scores below are 0-shot evaluations only, and each category cell shows single-turn / multi-turn scores.
| Model | Math | Reasoning | Writing | Coding | Understanding | Grammar | Single ALL | Multi ALL | Overall |
|---|---|---|---|---|---|---|---|---|---|
| rtzr/ko-gemma-2-9b-it | 8.71 / 8.00 | 9.14 / 8.00 | 9.43 / 9.29 | 9.00 / 9.43 | 9.57 / 9.86 | 7.14 / 5.00 | 8.83 | 8.26 | 8.55 |
| google/gemma-2-9b-it | 8.57 / 7.71 | 8.86 / 7.00 | 9.29 / 9.29 | 9.29 / 9.57 | 8.57 / 8.29 | 6.86 / 3.86 | 8.57 | 7.62 | 8.10 |
| MLP-KTLim/llama-3-Korean-Bllossom-8B | 6.43 / 5.71 | 6.86 / 5.14 | 9.14 / 8.57 | 8.29 / 8.14 | 8.43 / 9.29 | 5.71 / 5.29 | 7.48 | 7.02 | 7.25 |
| yanolja/EEVE-Korean-Instruct-10.8B-v1.0 | 5.57 / 4.29 | 8.14 / 5.14 | 8.29 / 6.29 | 6.43 / 7.86 | 9.29 / 8.57 | 6.57 / 3.71 | 7.38 | 5.98 | 6.68 |
| allganize/Llama-3-Alpha-Ko-8B-Instruct | 4.57 / 3.00 | 6.86 / 6.43 | 7.43 / 6.71 | 8.43 / 8.43 | 7.71 / 8.71 | 6.71 / 4.43 | 6.95 | 6.29 | 6.62 |
## 📄 License
Gemma 2 License: https://ai.google.dev/gemma/terms
## 📝 Citation
```bibtex
@article{RTZR,
  title={ko-gemma-2-9b-it},
  author={Return Zero Team},
  year={2024},
  url={https://huggingface.co/rtzr/ko-gemma-2-9b-it}
}

@article{gemma_2024,
  title={Gemma},
  url={https://www.kaggle.com/m/3301},
  DOI={10.34740/KAGGLE/M/3301},
  publisher={Kaggle},
  author={Gemma Team},
  year={2024}
}
```

