# Ko-Gemma-2-9B-IT: Korean Conversational Model
A Korean-language conversational model in the Gemma family, designed for various text generation tasks.
## 🚀 Quick Start
To get started with Ko-Gemma-2-9B-IT, install the dependencies and follow the usage examples below.
## ✨ Features
- Korean Language Support: Specifically fine-tuned for Korean language tasks, making it suitable for Korean users.
- High-Quality Training: Trained on carefully curated, high-quality datasets using Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO) on human-feedback data.
- Lightweight and Deployable: Can be deployed in environments with limited resources, such as laptops, desktops, or your own cloud infrastructure.
## 📦 Installation

You must install `transformers >= 4.42.3` for Gemma 2 models.

```bash
pip install transformers==4.42.3 accelerate
```
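Before loading the model, it can help to confirm the version requirement is met. A minimal sanity-check sketch (this check is our addition, not part of the official instructions; `packaging` ships as a dependency of `transformers`):

```python
# Hypothetical sanity check: confirm the installed transformers version
# meets the Gemma 2 requirement before loading the model.
import transformers
from packaging import version

assert version.parse(transformers.__version__) >= version.parse("4.42.3"), (
    f"transformers {transformers.__version__} is too old for Gemma 2; "
    "install >= 4.42.3"
)
```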
## 💻 Usage Examples
### Basic Usage
```python
import transformers
import torch

model_id = "rtzr/ko-gemma-2-9b-it"

# Load the model in bfloat16 and shard it across available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
pipeline.model.eval()

# "Can you create a famous sightseeing course for Seoul?"
instruction = "서울의 유명한 관광 코스를 만들어줄래?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

# Render the conversation with the Gemma 2 chat template.
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Stop on either the EOS token or the end-of-turn marker.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Print only the newly generated text (strip the prompt prefix).
print(outputs[0]["generated_text"][len(prompt):])
```
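Gemma 2 marks the end of each chat turn with a dedicated `<end_of_turn>` token, which is why it is added to `terminators` alongside the EOS token: generation stops cleanly as soon as the model finishes its turn. If you want to inspect the exact prompt the chat template produces, a quick check (continuing from the variables above; the expected layout is our reading of the Gemma 2 template):

```python
# Inspect the rendered prompt. For Gemma 2 it should look roughly like:
#   <bos><start_of_turn>user
#   서울의 유명한 관광 코스를 만들어줄래?<end_of_turn>
#   <start_of_turn>model
print(prompt)
```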
### Advanced Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "rtzr/ko-gemma-2-9b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

# "Can you create a famous sightseeing course for Seoul?"
instruction = "서울의 유명한 관광 코스를 만들어줄래?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

# Tokenize the conversation with the chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the EOS token or the end-of-turn marker.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens (skip the prompt).
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
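For interactive use you may want tokens printed as they are generated rather than all at once. One option (a sketch using transformers' built-in `TextStreamer`, not something the original card shows) is to reuse the objects defined above:

```python
# Stream tokens to stdout as they are generated, reusing the model,
# tokenizer, input_ids, and terminators from the example above.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    streamer=streamer,
)
```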
### Quantized Versions through bitsandbytes
```python
# pip install bitsandbytes
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "rtzr/ko-gemma-2-9b-it"

# 8-bit quantization; swap in the 4-bit config to roughly halve memory again.
quantization_config_8bit = BitsAndBytesConfig(load_in_8bit=True)
# quantization_config_4bit = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config_8bit,
    # quantization_config=quantization_config_4bit,
    low_cpu_mem_usage=True,
)
model.eval()

# "Can you create a famous sightseeing course for Seoul?"
instruction = "서울의 유명한 관광 코스를 만들어줄래?"

messages = [
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the EOS token or the end-of-turn marker.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
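If you take the 4-bit path, quality typically holds up better with NF4 quantization and a bfloat16 compute dtype. A minimal sketch of such a config (these particular settings are our suggestion, not from the original card):

```python
# Hedged example: NF4 4-bit quantization with bfloat16 compute,
# a common configuration for running a 9B model on smaller GPUs.
import torch
from transformers import BitsAndBytesConfig

quantization_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16
)
```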
### vLLM Usage
With `vllm==0.5.1`, the Gemma 2 architecture cannot be loaded yet (a known loading issue in that release). We therefore recommend using the `vllm/vllm-openai:latest` Docker image or `vllm==0.5.0.post1`.
```bash
#!/bin/bash

# Use the FlashInfer attention backend inside the container.
VLLM_ATTENTION_BACKEND=FLASHINFER
MODEL_NAME="rtzr/ko-gemma-2-9b-it"
MODEL_PATH="YOUR_PATH/${MODEL_NAME}"

docker run --rm --gpus all \
    -p 8000:8000 \
    --shm-size=12gb --ulimit memlock=-1 --ulimit stack=67108864 \
    -e VLLM_ATTENTION_BACKEND=${VLLM_ATTENTION_BACKEND} \
    -v $MODEL_PATH:/vllm-workspace/${MODEL_NAME} \
    vllm/vllm-openai:latest \
    --model ${MODEL_NAME} --dtype auto \
    --gpu-memory-utilization 0.8
```
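Once the container is up, it serves vLLM's OpenAI-compatible API on port 8000. A hedged example request (the prompt is the same sample question used above; generation parameters mirror the earlier examples):

```python
# Query the OpenAI-compatible chat endpoint served by vLLM.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "rtzr/ko-gemma-2-9b-it",
        "messages": [
            # "Can you create a famous sightseeing course for Seoul?"
            {"role": "user", "content": "서울의 유명한 관광 코스를 만들어줄래?"},
        ],
        "max_tokens": 1024,
        "temperature": 0.6,
        "top_p": 0.9,
    },
)
print(response.json()["choices"][0]["message"]["content"])
```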
## 📚 Documentation
### Model Details
#### Ko-Gemma-2-9B-IT
Ko-Gemma-2-9B-IT is a Korean-language conversational model in the Gemma family. It is a text-to-text, decoder-only large language model, available in Korean. We fine-tuned it on a carefully curated, high-quality dataset using Supervised Fine-Tuning (SFT), followed by Direct Preference Optimization (DPO) on human-feedback data. The datasets include:
Some of these datasets were partially used and translated for training. Because the translation process introduced a lot of repeated text, we applied N-gram-based preprocessing to filter it out.
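The card does not publish the exact preprocessing code, but the idea is straightforward: drop or trim samples in which the same n-gram recurs too often. A minimal sketch of such a filter (function name, n, and threshold are our own, purely illustrative):

```python
# Illustrative n-gram repetition filter; the actual preprocessing used
# for Ko-Gemma-2-9B-IT is not published, so the names and thresholds
# here are assumptions.
from collections import Counter

def has_ngram_repetition(text: str, n: int = 4, max_repeats: int = 3) -> bool:
    """Return True if any token n-gram appears more than max_repeats times."""
    tokens = text.split()
    ngrams = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return any(count > max_repeats for count in ngrams.values())

# Keep only samples without heavy n-gram repetition:
# clean = [s for s in samples if not has_ngram_repetition(s)]
```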
#### Inputs and outputs
- Input: A text string, such as a question, a prompt, or a document to be summarized.
- Output: Generated Korean-language text in response to the input, such as an answer to a question or a summary of a document.
#### Google Gemma 2
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained and instruction-tuned variants. Gemma models are well suited to a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources, such as a laptop, desktop, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.
### Benchmark Scores
We evaluated the model internally using the LogicKor code. While the public LogicKor leaderboard uses GPT-4 as the judge, our internal evaluation used GPT-4o; public scores will be added as they are released. The scores below are 0-shot evaluations only, and each category cell shows single-turn / multi-turn scores.
| Model | Math | Reasoning | Writing | Coding | Understanding | Grammar | Single ALL | Multi ALL | Overall |
|---|---|---|---|---|---|---|---|---|---|
| rtzr/ko-gemma-2-9b-it | 8.71 / 8.00 | 9.14 / 8.00 | 9.43 / 9.29 | 9.00 / 9.43 | 9.57 / 9.86 | 7.14 / 5.00 | 8.83 | 8.26 | 8.55 |
| google/gemma-2-9b-it | 8.57 / 7.71 | 8.86 / 7.00 | 9.29 / 9.29 | 9.29 / 9.57 | 8.57 / 8.29 | 6.86 / 3.86 | 8.57 | 7.62 | 8.10 |
| MLP-KTLim/llama-3-Korean-Bllossom-8B | 6.43 / 5.71 | 6.86 / 5.14 | 9.14 / 8.57 | 8.29 / 8.14 | 8.43 / 9.29 | 5.71 / 5.29 | 7.48 | 7.02 | 7.25 |
| yanolja/EEVE-Korean-Instruct-10.8B-v1.0 | 5.57 / 4.29 | 8.14 / 5.14 | 8.29 / 6.29 | 6.43 / 7.86 | 9.29 / 8.57 | 6.57 / 3.71 | 7.38 | 5.98 | 6.68 |
| allganize/Llama-3-Alpha-Ko-8B-Instruct | 4.57 / 3.00 | 6.86 / 6.43 | 7.43 / 6.71 | 8.43 / 8.43 | 7.71 / 8.71 | 6.71 / 4.43 | 6.95 | 6.29 | 6.62 |
## 📄 License
Gemma 2 License: https://ai.google.dev/gemma/terms
## 📝 Citation
```bibtex
@article{RTZR,
  title={ko-gemma-2-9b-it},
  author={Return Zero Team},
  year={2024},
  url={https://huggingface.co/rtzr/ko-gemma-2-9b-it}
}

@article{gemma_2024,
  title={Gemma},
  url={https://www.kaggle.com/m/3301},
  DOI={10.34740/KAGGLE/M/3301},
  publisher={Kaggle},
  author={Gemma Team},
  year={2024}
}
```

