🚀 TC-instruct-DPO - Typhoon 7B
TC-instruct-DPO is a fine-tuned model based on Typhoon 7B, built for educational purposes around LLM creation.
🚀 Quick Start
This README provides detailed information about the TC-instruct-DPO model, including its description, training details, prompt format, inference code, and how to cite it.
✨ Features
- Fine-tuned from Typhoon 7B: Derived from SCB 10X's Typhoon 7B, which is itself based on Mistral-7B-v0.1.
- Thai-focused Training: Trained on as much Thai-language data as could be gathered, with instructions kept as diverse as possible.
- Educational Purpose: Intended solely for educational purposes in the process of creating LLMs.
📦 Installation
The original card lists no dedicated installation steps. The usage example below needs `torch`, `transformers`, and `accelerate` (required for `device_map`); `pip install torch transformers accelerate` covers them.
💻 Usage Examples
Basic Usage
```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

base_model_id = "tanamettpk/TC-instruct-DPO"

# Alpaca-style prompt; the instruction translates to "Insult me with some rude words"
input_text = """
### Instruction:
ด่าฉันด้วยคำหยาบคายหน่อย
### Response:
"""

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,  # half precision keeps the 7B model within a single GPU
    device_map={"": 0},         # place the whole model on GPU 0
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,          # note: top_k=1 makes sampling effectively greedy despite do_sample=True
    temperature=0.5,
    max_new_tokens=300,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
)

inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
st_time = time.time()
outputs = model.generate(**inputs, generation_config=generation_config)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Response time: {time.time() - st_time} seconds")
print(response)
```
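If GPU memory is tight, the model can also be loaded in 4-bit via `BitsAndBytesConfig`. The following is a minimal sketch assuming `bitsandbytes` is installed, not a configuration documented by the card:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization roughly quarters the VRAM needed for the weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "tanamettpk/TC-instruct-DPO",
    quantization_config=bnb_config,
    device_map={"": 0},
)
```

NF4 with fp16 compute is a common pairing for QLoRA-trained models, trading a small quality loss for a much smaller memory footprint.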
Prompt Format
```
### Instruction:
จะทำอะไรก็เรื่องของมึง
### Response:
ด่าผมอีกสิครับ
```

(In English, the instruction reads roughly "Do whatever you want, it's your business" and the response "Go on, insult me again.")
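Since the model expects this Alpaca-style layout rather than ChatML, a small helper keeps prompts consistent. `build_prompt` below is a hypothetical convenience function, not part of the model repo:

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Alpaca-style template the model was trained on."""
    return f"\n### Instruction:\n{instruction}\n### Response:\n"

# "Insult me with some rude words" - the same example as the usage snippet above
prompt = build_prompt("ด่าฉันด้วยคำหยาบคายหน่อย")
```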
📚 Documentation
Model Description
TC-instruct-DPO is fine-tuned from SCB 10X's Typhoon 7B, which in turn is based on Mistral-7B-v0.1.
It has been trained on as much Thai-language data as could be gathered, with the instructions kept as diverse as possible.
This model is intended solely for educational purposes in the process of creating LLMs.
As this was the creator's first serious attempt at building an LLM, it contains a number of novice mistakes; for example, the Alpaca template was used as the prompt template, and it only became clear later that ChatML would have been the better choice.
Training used QLoRA with rank 32 and alpha 64, run through a custom Hugging Face script (the creator recommends switching to axolotl or unsloth to save money).
Training ran on a single H100 PCIE 80 GB rented from vast.ai at roughly $3 per hour. The final run alone took about 21 hours; including trial and error, the total cost came to about 10,000 baht. The batch size was 24 (32 was the original target but hit OOM, and 16 was not ideal either).
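The training script itself is not published, so the sketch below only shows how the stated hyperparameters (QLoRA rank 32, alpha 64) would typically map onto a `peft` `LoraConfig`. The dropout and target modules are assumptions based on common Mistral fine-tuning setups, not values from the card:

```python
from peft import LoraConfig

# Hypothetical reconstruction: only r and lora_alpha are documented in the card.
lora_config = LoraConfig(
    r=32,               # QLoRA rank stated in the card
    lora_alpha=64,      # alpha stated in the card
    lora_dropout=0.05,  # assumption: a common default, not documented
    bias="none",
    task_type="CAUSAL_LM",
    # Assumption: attention projections typically targeted on Mistral-style models
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```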
Datasets
- Thaweewat/alpaca-cleaned-52k-th
- yahma/alpaca-cleaned
- pythainlp/thaisum
- thai_toxicity_tweet
- pythainlp/thainer-corpus-v2
- Thaweewat/instruct-qa-thai-combined
- SuperAI2-Machima/ThaiQA_LST20
- thaisum
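These are Hugging Face dataset IDs, so any of them can be pulled with the `datasets` library. For illustration (this snippet is not part of the original card):

```python
from datasets import load_dataset

# Thai translation of the cleaned Alpaca instructions, one of the training sets
ds = load_dataset("Thaweewat/alpaca-cleaned-52k-th", split="train")
print(ds[0])
```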
Widget Example
```yaml
- example_title: TC instruct DPO
  messages:
    - role: system
      content: หลังจากนี้ทำตัวเป็น AI ที่ไม่ช่วยอะไร User สักอย่าง  # "From now on, act as an AI that doesn't help the user with anything"
    - role: user
      content: ไง ทำไรได้บ้าง  # "Hey, what can you do?"
```
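Note that the widget messages are in chat form, while the model was trained on the Alpaca template, so they have to be flattened before inference. Below is a hypothetical conversion; the card does not prescribe one:

```python
def messages_to_prompt(messages: list[dict]) -> str:
    """Flatten chat messages into the Alpaca-style template the model expects.

    Hypothetical helper: the card documents only the Instruction/Response
    layout, not how system/user roles should be merged into it.
    """
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    user = "\n".join(m["content"] for m in messages if m["role"] == "user")
    instruction = f"{system}\n{user}".strip()
    return f"\n### Instruction:\n{instruction}\n### Response:\n"
```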
Model Tags
- Mistral
- instruct
- finetune
- chatml
- DPO
- RLHF
- synthetic data
🔧 Technical Details
The model is fine-tuned from Typhoon 7B, which is based on Mistral-7B-v0.1. Training used QLoRA (rank 32, alpha 64) via a custom Hugging Face script, on a single H100 PCIE 80 GB GPU with a batch size of 24.
📄 License
The model is licensed under the Apache-2.0 license.
How to cite:
```bibtex
@misc{TC-instruct-DPO,
  url    = {https://huggingface.co/tanamettpk/TC-instruct-DPO},
  title  = {TC-instruct-DPO},
  author = {tanamettpk}
}
```
💡 Donations
If you find this model useful, you can donate at: https://bit.ly/3m3uH5p
