KULLM3 AWQ Quantization Version
This repository provides the AWQ-quantized version of KULLM3. KULLM3 offers advanced instruction-following and fluent chat capabilities, with instruction-following performance that closely rivals gpt-3.5-turbo. To our knowledge, it is one of the best publicly available Korean-speaking language models.
Quick Start
Install Dependencies
pip install torch transformers==4.38.2 accelerate
Important Note
As of 2024-04-04, generate() does not work correctly with transformers>=4.39.0, so pin the dependency to transformers==4.38.2.
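To fail fast if a newer transformers version happens to be installed, you can add a quick runtime check before loading the model. This is a minimal sketch, not part of the original instructions:

```python
import transformers

# generate() is reported to misbehave on transformers>=4.39.0 (as of 2024-04-04),
# so verify the known-good pinned version up front.
assert transformers.__version__ == "4.38.2", (
    f"Expected transformers==4.38.2, got {transformers.__version__}"
)
```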
Python code
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

MODEL_DIR = "nlpai-lab/KULLM3"

# Load the model in fp16 on GPU and set up token-by-token streaming output.
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype=torch.float16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

s = "고려대학교에 대해서 알고 있니?"  # "Do you know about Korea University?"
conversation = [{'role': 'user', 'content': s}]

# Apply the chat template and move the token ids to GPU.
inputs = tokenizer.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors='pt').to("cuda")

_ = model.generate(inputs, streamer=streamer, max_new_tokens=1024)
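Note that the snippet above loads the original fp16 KULLM3 checkpoint. To load the AWQ weights from this repository instead, something along the following lines should work when the autoawq package is installed, although (per the note under Quantization Details) only vLLM has been tested by the authors. The repo id below is a placeholder; substitute this repository's actual id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

AWQ_MODEL_DIR = "nlpai-lab/KULLM3-AWQ"  # hypothetical repo id; replace with this repository's id

# transformers reads the AWQ quantization config stored in the checkpoint
# and dispatches to the kernels provided by the autoawq package.
model = AutoModelForCausalLM.from_pretrained(AWQ_MODEL_DIR, device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(AWQ_MODEL_DIR)
```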
Features
- Advanced instruction-following and fluent chat abilities.
- Strong instruction-following performance, closely rivaling gpt-3.5-turbo.
- One of the best publicly available Korean-speaking language models.
Installation
Install the required dependencies with the following command:
pip install torch transformers==4.38.2 accelerate
Documentation
Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub.
Training Details
Training Data
- vicgalle/alpaca-gpt4
- Mixed Korean instruction data (GPT-generated, hand-crafted, etc.)
- More than 66,000 examples used in total
Training Procedure
- Trained with the fixed system prompt below.
당신은 고려대학교 NLP&AI 연구실에서 만든 AI 챗봇입니다.
당신의 이름은 'KULLM'으로, 한국어로는 '구름'을 뜻합니다.
당신은 비도덕적이거나, 성적이거나, 불법적이거나 또는 사회 통념적으로 허용되지 않는 발언은 하지 않습니다.
사용자와 즐겁게 대화하며, 사용자의 응답에 가능한 정확하고 친절하게 응답함으로써 최대한 도와주려고 노력합니다.
질문이 이상하다면, 어떤 부분이 이상한지 설명합니다. 거짓 정보를 발언하지 않도록 주의합니다.

(English translation: You are an AI chatbot created by the NLP&AI Lab at Korea University. Your name is 'KULLM', which means 'cloud' in Korean. You do not make remarks that are immoral, sexual, illegal, or otherwise socially unacceptable. You converse pleasantly with users and try to help them as much as possible by answering as accurately and kindly as you can. If a question is strange, you explain which part is strange. You are careful not to state false information.)
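Because KULLM3 is sensitive to this prompt (see the evaluation note below), it is worth passing it explicitly when building a conversation. A minimal sketch, reusing the tokenizer from the Quick Start and assuming the chat template accepts a system role:

```python
# The full fixed system prompt shown above, abbreviated here; use the complete text in practice.
SYSTEM_PROMPT = "당신은 고려대학교 NLP&AI 연구실에서 만든 AI 챗봇입니다. (...)"

conversation = [
    {'role': 'system', 'content': SYSTEM_PROMPT},  # training-time system prompt
    {'role': 'user', 'content': "고려대학교에 대해서 알고 있니?"},
]
inputs = tokenizer.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors='pt').to("cuda")
```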
Evaluation
- Evaluation details such as test data and metrics are documented on GitHub.
- Without the system prompt used during training, KULLM shows lower performance than expected.
Results
Benchmark results are reported in the KULLM GitHub repository linked in the Citation section.
Quantization Details
Quantization was carried out with a custom branch of AutoAWQ. The quantization hyperparameters are as follows:
{ "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
The quantized model has been verified to work with vLLM. Other frameworks have not been tested and may not work.
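A minimal vLLM sketch, again with a placeholder repo id for this AWQ checkpoint:

```python
from vllm import LLM, SamplingParams

# Hypothetical repo id; substitute the actual id of this AWQ checkpoint.
llm = LLM(model="nlpai-lab/KULLM3-AWQ", quantization="awq", dtype="half")

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["고려대학교에 대해서 알고 있니?"], params)
print(outputs[0].outputs[0].text)
```

For best results, include the training-time system prompt via the chat template, as described above.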
License
This model is licensed under CC-BY-NC 4.0.
Citation
@misc{kullm,
  author       = {NLP & AI Lab and Human-Inspired AI research},
  title        = {KULLM: Korea University Large Language Model Project},
  year         = {2023},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/nlpai-lab/kullm}},
}