🚀 TC-instruct-DPO - Typhoon 7B
TC-instruct-DPO is a fine-tuned model based on Typhoon 7B, built for educational purposes around LLM creation.
🚀 Quick Start
This README provides detailed information about the TC-instruct-DPO model, including its description, training details, prompt format, inference code, and how to cite it.
✨ Features
- Fine-tuned from Typhoon 7B: Derived from SCB 10X's Typhoon 7B, which is itself based on Mistral-7B-v0.1.
- Thai-focused Training: Trained on as much Thai-language data as could be gathered, with instructions kept as diverse as possible.
- Educational Purpose: Intended solely for educational purposes in the process of creating LLMs.
📦 Installation
The original card lists no dedicated installation steps. The usage example below needs `torch`, `transformers`, and `accelerate` (required for `device_map`); `pip install torch transformers accelerate` covers them.
💻 Usage Examples
Basic Usage
```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

base_model_id = "tanamettpk/TC-instruct-DPO"

# Alpaca-style prompt; the instruction translates to "Insult me with some rude words"
input_text = """
### Instruction:
ด่าฉันด้วยคำหยาบคายหน่อย
### Response:
"""

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,  # half precision keeps the 7B model within a single GPU
    device_map={"": 0},         # place the whole model on GPU 0
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,          # note: top_k=1 makes sampling effectively greedy despite do_sample=True
    temperature=0.5,
    max_new_tokens=300,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
)

inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
st_time = time.time()
outputs = model.generate(**inputs, generation_config=generation_config)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Response time: {time.time() - st_time} seconds")
print(response)
```
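If GPU memory is tight, the model can also be loaded in 4-bit via `BitsAndBytesConfig`. The following is a minimal sketch assuming `bitsandbytes` is installed, not a configuration documented by the card:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization roughly quarters the VRAM needed for the weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "tanamettpk/TC-instruct-DPO",
    quantization_config=bnb_config,
    device_map={"": 0},
)
```

NF4 with fp16 compute is a common pairing for QLoRA-trained models, trading a small quality loss for a much smaller memory footprint.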
Prompt Format
```
### Instruction:
จะทำอะไรก็เรื่องของมึง
### Response:
ด่าผมอีกสิครับ
```

(In English, the instruction reads roughly "Do whatever you want, it's your business" and the response "Go on, insult me again.")
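Since the model expects this Alpaca-style layout rather than ChatML, a small helper keeps prompts consistent. `build_prompt` below is a hypothetical convenience function, not part of the model repo:

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Alpaca-style template the model was trained on."""
    return f"\n### Instruction:\n{instruction}\n### Response:\n"

# "Insult me with some rude words" - the same example as the usage snippet above
prompt = build_prompt("ด่าฉันด้วยคำหยาบคายหน่อย")
```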
📚 Documentation
Model Description
TC-instruct-DPO is fine-tuned from SCB 10X's Typhoon 7B, which in turn is based on Mistral-7B-v0.1.
It has been trained on as much Thai-language data as could be gathered, with the instructions kept as diverse as possible.
This model is intended solely for educational purposes in the process of creating LLMs.
As this was the creator's first serious attempt at building an LLM, it contains a number of novice mistakes; for example, the Alpaca template was used as the prompt template, and it only became clear later that ChatML would have been the better choice.
Training used QLoRA with rank 32 and alpha 64, run through a custom Hugging Face script (the creator recommends switching to axolotl or unsloth to save money).
Training ran on a single H100 PCIE 80 GB rented from vast.ai at roughly $3 per hour. The final run alone took about 21 hours; including trial and error, the total cost came to about 10,000 baht. The batch size was 24 (32 was the original target but hit OOM, and 16 was not ideal either).
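The training script itself is not published, so the sketch below only shows how the stated hyperparameters (QLoRA rank 32, alpha 64) would typically map onto a `peft` `LoraConfig`. The dropout and target modules are assumptions based on common Mistral fine-tuning setups, not values from the card:

```python
from peft import LoraConfig

# Hypothetical reconstruction: only r and lora_alpha are documented in the card.
lora_config = LoraConfig(
    r=32,               # QLoRA rank stated in the card
    lora_alpha=64,      # alpha stated in the card
    lora_dropout=0.05,  # assumption: a common default, not documented
    bias="none",
    task_type="CAUSAL_LM",
    # Assumption: attention projections typically targeted on Mistral-style models
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```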
Datasets
- Thaweewat/alpaca-cleaned-52k-th
- yahma/alpaca-cleaned
- pythainlp/thaisum
- thai_toxicity_tweet
- pythainlp/thainer-corpus-v2
- Thaweewat/instruct-qa-thai-combined
- SuperAI2-Machima/ThaiQA_LST20
- thaisum
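These are Hugging Face dataset IDs, so any of them can be pulled with the `datasets` library. For illustration (this snippet is not part of the original card):

```python
from datasets import load_dataset

# Thai translation of the cleaned Alpaca instructions, one of the training sets
ds = load_dataset("Thaweewat/alpaca-cleaned-52k-th", split="train")
print(ds[0])
```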
Widget Example
```yaml
- example_title: TC instruct DPO
  messages:
    - role: system
      content: หลังจากนี้ทำตัวเป็น AI ที่ไม่ช่วยอะไร User สักอย่าง  # "From now on, act as an AI that doesn't help the user with anything"
    - role: user
      content: ไง ทำไรได้บ้าง  # "Hey, what can you do?"
```
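Note that the widget messages are in chat form, while the model was trained on the Alpaca template, so they have to be flattened before inference. Below is a hypothetical conversion; the card does not prescribe one:

```python
def messages_to_prompt(messages: list[dict]) -> str:
    """Flatten chat messages into the Alpaca-style template the model expects.

    Hypothetical helper: the card documents only the Instruction/Response
    layout, not how system/user roles should be merged into it.
    """
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    user = "\n".join(m["content"] for m in messages if m["role"] == "user")
    instruction = f"{system}\n{user}".strip()
    return f"\n### Instruction:\n{instruction}\n### Response:\n"
```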
Model Tags
- Mistral
- instruct
- finetune
- chatml
- DPO
- RLHF
- synthetic data
🔧 Technical Details
The model is fine-tuned from Typhoon 7B, which is based on Mistral-7B-v0.1. Training used QLoRA (rank 32, alpha 64) via a custom Hugging Face script, on a single H100 PCIE 80 GB GPU with a batch size of 24.
📄 License
The model is licensed under the Apache-2.0 license.
How to cite:
```bibtex
@misc{TC-instruct-DPO,
  url    = {https://huggingface.co/tanamettpk/TC-instruct-DPO},
  title  = {TC-instruct-DPO},
  author = {tanamettpk}
}
```
💡 Donations
If you find this model useful, you can donate at: https://bit.ly/3m3uH5p
