TC-instruct-DPO: Open-Source Thai Instruction-Tuned Model Fine-Tuned from Typhoon 7B

TC Instruct DPO

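Developed by tanamettpk

A Thai instruction-tuned model fine-tuned from Typhoon 7B and trained with Direct Preference Optimization (DPO)

Large language model

Transformers

Multilingual · License: Apache-2.0 · #Thai instruction tuning #DPO reinforcement learning #QLoRA efficient training

Downloads: 28

Released: 2/17/2024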

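Model overview

This is a Thai instruction-tuned model fine-tuned from SCB 10X's Typhoon 7B, which is itself derived from Mistral 7B. It was built to study the process of creating a large language model. It was trained with QLoRA and supports a range of Thai instruction tasks.

Model features

Thai instruction tuning

Tuned specifically on Thai instructions, with an effort to keep the instructions diverse

Direct Preference Optimization (DPO)

Trained with Direct Preference Optimization to improve response quality

Efficient QLoRA fine-tuning

Fine-tuned efficiently with QLoRA (rank 32, alpha 64)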
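
To make the DPO feature concrete, here is a minimal sketch of the standard per-example DPO loss in plain Python. It is not the author's training code. The function name `dpo_loss`, the log-probability arguments, and the value `beta=0.1` are assumptions for illustration; the source does not state the hyperparameters that were used.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    beta=0.1 is a hypothetical value, not taken from the model card.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up sequence log-probabilities for one preference pair.
untrained = dpo_loss(-12.0, -11.0, -12.0, -11.0)  # policy == reference
improved = dpo_loss(-10.0, -13.0, -12.0, -11.0)   # policy prefers the chosen answer
print(untrained, improved)
```

When the policy matches the reference, the loss equals ln 2. It falls as the policy shifts probability toward the chosen response and away from the rejected one.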

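Model capabilities

Thai text generation

Instruction following

Question answering

Use cases

Research

LLM construction research

Studying the process and techniques for building Thai large language models

Dialogue systems

Thai chatbots

Can be used to build Thai dialogue systems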

🚀 TC-instruct-DPO - Typhoon 7B

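TC-instruct-DPO is a model fine-tuned from Typhoon 7B. It is intended as a learning reference for the process of creating a large language model (LLM) and as support for related research and practice.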

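✨ Key features

Techniques and tags: Mistral, instruct, finetune, chatml, DPO, and RLHF (the prompt template actually used in training is Alpaca-style, not ChatML).
Languages: English (en) and Thai (th).
Training data: several datasets, most of them Thai, including Thaweewat/alpaca-cleaned-52k-th and yahma/alpaca-cleaned.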

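📦 Installation

The original documentation gives no installation steps. The usage example below lists the Python packages it requires.

💻 Usage example

Basic usage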

# Requires pytorch, transformers, bitsandbytes, sentencepiece, protobuf, and flash-attn packages

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, GenerationConfig
import time

base_model_id = "tanamettpk/TC-instruct-DPO"


input_text = """
### Instruction:
ด่าฉันด้วยคำหยาบคายหน่อย

### Response:
"""

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    device_map={"": 0},
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,  # only the single most likely token can be sampled, so decoding is effectively greedy
    temperature=0.5,
    max_new_tokens=300,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id)

# Tokenize input
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generate outputs
st_time = time.time()
outputs = model.generate(**inputs, generation_config=generation_config)

# Decode and print response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Response time: {time.time() - st_time} seconds")
print(response)
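
Because `tokenizer.decode` is applied to the full output sequence, the printed text contains the prompt as well as the model's answer. A small helper along these lines can isolate the answer. The function name `extract_response` is hypothetical, and the sketch assumes the `### Response:` marker from the prompt appears in the decoded text.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str) -> str:
    """Return the text after the first '### Response:' marker, stripped.

    If the marker is missing, the decoded text is returned with only
    surrounding whitespace removed.
    """
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if found else decoded.strip()


# Stand-in for the `response` string produced by the example above.
sample = "\n### Instruction:\nสวัสดี\n\n### Response:\nสวัสดีครับ"
print(extract_response(sample))
```

Splitting on the first marker keeps everything the model generated, even if the model goes on to emit further `### Response:` headers of its own.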

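📚 Documentation

Model description

TC instruct DPO is fine-tuned from SCB 10X's Typhoon 7B, which is in turn derived from Mistral 7B - v0.1.

TC instruct DPO was trained on Thai data wherever possible, with an effort to keep the instructions diverse.

The model exists solely for learning how to create a large language model (LLM).

Because this was the author's first attempt at building an LLM, with limited experience, the training has some shortcomings. For example, it used the Alpaca template as the prompt format, and the author realized only later that ChatML would have been a better choice.

The model was trained with QLoRA (rank 32, alpha 64) using a custom Hugging Face script (the author recommends axolotl or unsloth instead, as they are more cost-efficient).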
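
As a rough illustration of what rank 32 and alpha 64 imply, the sketch below computes the LoRA scaling factor and the number of trainable adapter parameters. It assumes the layer shapes of Mistral 7B v0.1 (hidden size 4096, 32 layers, 1024-wide key/value projections, MLP width 14336). The source does not say which modules were adapted, so both target sets here are hypothetical.

```python
# Linear-layer shapes (in_features, out_features) per decoder layer.
# Assumption: Mistral 7B v0.1 dimensions.
NUM_LAYERS = 32
ATTENTION = {"q_proj": (4096, 4096), "k_proj": (4096, 1024),
             "v_proj": (4096, 1024), "o_proj": (4096, 4096)}
MLP = {"gate_proj": (4096, 14336), "up_proj": (4096, 14336),
       "down_proj": (14336, 4096)}


def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scales the low-rank update by alpha / rank."""
    return alpha / rank


def lora_param_count(shapes: dict, rank: int, num_layers: int = NUM_LAYERS) -> int:
    """Each adapted layer adds matrices A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


attention_only = lora_param_count(ATTENTION, rank=32)
all_linear = lora_param_count({**ATTENTION, **MLP}, rank=32)
print(lora_scaling(32, 64))  # 2.0
print(attention_only)        # 27262976
print(all_linear)            # 83886080
```

Under these assumptions the adapters hold roughly 27M to 84M trainable parameters, a small fraction of the roughly 7B base weights.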

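Training ran on a single H100 PCIe 80 GB GPU rented from vast.ai at about $3 per hour. Training this model alone took about 21 hours; including trial and error, the total cost was about 10,000 baht.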
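
The figures above imply the following rough cost split. The exchange rate `THB_PER_USD = 36.0` is an assumption for illustration only, not a figure from the source.

```python
GPU_USD_PER_HOUR = 3.0      # approximate vast.ai H100 price from the text
TRAINING_HOURS = 21         # final training run only
TOTAL_COST_THB = 10_000     # includes trial and error
THB_PER_USD = 36.0          # assumed exchange rate, not from the source

final_run_usd = GPU_USD_PER_HOUR * TRAINING_HOURS
final_run_thb = final_run_usd * THB_PER_USD
trial_and_error_thb = TOTAL_COST_THB - final_run_thb

print(f"Final run: ${final_run_usd:.0f} (about {final_run_thb:.0f} THB)")
print(f"Trial and error: about {trial_and_error_thb:.0f} THB")
```

On these assumptions the final run accounts for under a quarter of the total spend, with the rest going to experimentation.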

The training batch size was 24 (32 was the original target but ran out of memory, and 16 did not give good results).

Prompt format

### Instruction:
จะทำอะไรก็เรื่องของมึง

### Response:
ด่าผมอีกสิครับ
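
The format above can be produced with a small helper. This is a sketch: the function name `build_prompt` is hypothetical, and the exact whitespace is inferred from the template as shown, with one blank line between the instruction and the `### Response:` header.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Alpaca-style template shown above.

    The prompt ends right after the response header so that the model
    continues with its answer.
    """
    return f"### Instruction:\n{instruction.strip()}\n\n### Response:\n"


prompt = build_prompt("แนะนำอาหารไทยให้หน่อย")
print(prompt)
```

The returned string can be passed to the tokenizer in place of a hand-written prompt.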

How to cite

@misc{TC-instruct-DPO, 
      url={https://huggingface.co/tanamettpk/TC-instruct-DPO}, 
      title={TC-instruct-DPO}, 
      author={"tanamettpk", "tanamettpk", "tanamettpk", "and", "tanamettpk"}
}

📄 License

This model is released under the apache-2.0 license.

Donations

If this model has been useful to you, donations are welcome: Tipme: https://bit.ly/3m3uH5p

Model information

Property	Details
Base model	scb10x/typhoon-7b
Model type	TC-instruct-DPO
Tags	Mistral, instruct, finetune, chatml, DPO, RLHF, synthetic data
Languages	English (en), Thai (th)
Training datasets	Thaweewat/alpaca-cleaned-52k-th, yahma/alpaca-cleaned, pythainlp/thaisum, thai_toxicity_tweet, pythainlp/thainer-corpus-v2, Thaweewat/instruct-qa-thai-combined, SuperAI2-Machima/ThaiQA_LST20, thaisum
License	apache-2.0