Tanuki-8x8B-dpo-v1.0 Open-Source Language Model - Free Deployment for High-Quality Conversational Communication

Developed by weblab-GENIAC

Tanuki-8x8B is a large-scale language model pretrained from scratch, optimized for dialogue tasks through SFT and DPO

Supports multiple languages

Open-source license: Apache-2.0

Tags: #Japanese-optimized dialogue #Mixture of Experts architecture #1.7T-token pretraining

Downloads 217

Release date: 8/12/2024

Model Overview

Tanuki-8x8B-dpo-v1.0 is a large-scale language model with 8x8B parameters (total ~47B parameters, active ~13B parameters), pretrained on approximately 1.7T tokens, specifically optimized for Japanese and English dialogue tasks.

Model Features

Mixture of Experts architecture

Adopts an 8x8B Mixture of Experts architecture with ~47B total parameters but only ~13B active parameters, balancing performance and efficiency

Japanese optimization

Specifically optimized for Japanese dialogue tasks, supporting high-quality Japanese text generation

DPO optimization

Improved dialogue quality through Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO)

Multi-quantization support

Provides various quantization versions including AWQ and GPTQ for easy deployment in different hardware environments

Model Capabilities

Japanese text generation

English text generation

Multi-turn dialogue

Task-oriented dialogue

Use Cases

Intelligent assistant

Japanese Q&A system

Building intelligent Q&A assistants for Japanese users

The model performed well in the blind human evaluation described under Benchmarks

Education

Japanese learning assistance

Helping Japanese learners with language practice

🚀 Tanuki-8x8B-dpo-v1.0

Tanuki-8x8B-dpo-v1.0 is a large-scale language model fine-tuned for dialogue. It has been evaluated with a blind human evaluation and with Japanese MT-Bench.

🚀 Quick Start

Prerequisites

Inference with this model requires FlashAttention. Install it as follows:

pip install --no-build-isolation flash_attn

Inference with HuggingFace Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# trust_remote_code=True lets Transformers load the model's custom modeling code from the Hub.
model = AutoModelForCausalLM.from_pretrained("weblab-GENIAC/Tanuki-8x8B-dpo-v1.0", device_map="auto", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("weblab-GENIAC/Tanuki-8x8B-dpo-v1.0")
# Stream generated text to stdout, hiding the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [
    {"role": "system", "content": "The following is an instruction that describes a task. Write a response that appropriately meets the requirements."},
    {"role": "user", "content": "Can a raccoon dog understand the Critique of Pure Reason?"}
]

# Apply the model's chat template and move the token IDs to the model's device.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled, so do_sample=True is set explicitly.
output_ids = model.generate(input_ids,
                            max_new_tokens=1024,
                            do_sample=True,
                            temperature=0.5,
                            streamer=streamer)

Inference with vLLM

Inference with vLLM requires a build adapted to the model's custom architecture. Build the modified vLLM from the following repository:

git clone https://github.com/team-hatakeyama-phase2/vllm.git
cd vllm
LD_LIBRARY_PATH="" MAX_JOBS=16 pip install -e .

from time import time
from vllm import LLM, SamplingParams

# Pick the full-precision model or one of the quantized variants.
model_name = "weblab-GENIAC/Tanuki-8x8B-dpo-v1.0"
# model_name = "team-hatakeyama-phase2/Tanuki-8x8B-dpo-v1.0-AWQ"
# model_name = "team-hatakeyama-phase2/Tanuki-8x8B-dpo-v1.0-GPTQ-4bit"
# model_name = "team-hatakeyama-phase2/Tanuki-8x8B-dpo-v1.0-GPTQ-8bit"

# tensor_parallel_size should match the number of GPUs used.
# llm = LLM(model_name, trust_remote_code=True, tensor_parallel_size=1)  # For 1 GPU
llm = LLM(model_name, trust_remote_code=True, tensor_parallel_size=2)  # For 2 GPUs
tokenizer = llm.get_tokenizer()

messages = [
    {"role": "system", "content": "The following is an instruction that describes a task. Write a response that appropriately meets the requirements."},
    {"role": "user", "content": "Can a raccoon dog understand the Critique of Pure Reason?"}
]

# Render the chat template to plain text; vLLM tokenizes the prompt itself.
inputs_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(f"inputs_text: {inputs_text}")

# temperature=0.0 selects greedy decoding.
sampling_params = SamplingParams(temperature=0.0, max_tokens=1024, seed=1, repetition_penalty=1.1)
start = time()
outputs = llm.generate(inputs_text, sampling_params=sampling_params, use_tqdm=False)
end = time()
outputs_text = outputs[0].outputs[0].text
print(f"outputs_text: {outputs_text}")
print(f"Elapsed time: {(end - start):.4f} sec.")

✨ Features

Tanuki-8x8B is a large-scale language model with 8x8B parameters (approximately 47B total parameters and about 13B active parameters), pretrained from scratch on approximately 1.7T tokens. Tanuki-8x8B-dpo-v1.0 has been fine-tuned for dialogue through SFT and DPO.
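
The gap between total and active parameters comes from expert routing: each token is processed by only a few of the experts. The following is a minimal sketch of top-k routing as commonly used in Mixture of Experts models. It is not Tanuki's actual routing code; the choice of 2 out of 8 experts, the function name top_k_route and the example logits are assumptions made for illustration only.

```python
import math

TOTAL_PARAMS_B = 47   # approximate total parameters (billions), from the text above
ACTIVE_PARAMS_B = 13  # approximate active parameters (billions), from the text above

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Hypothetical helper; k=2 is an assumption, not a documented Tanuki setting.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over the selected logits only (max subtracted for stability).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

# Illustrative router scores for 8 experts (made-up values).
example_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2]
routes = top_k_route(example_logits, k=2)
print("selected experts and weights:", routes)

# Share of the parameters used per token, from the figures in the text.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"active fraction: {active_fraction:.0%}")
```

With the figures quoted above, roughly 28% of the parameters take part in processing any single token.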

📦 Quantized Models

AWQ 4-bit Quantization
GPTQ 4-bit Quantization
GPTQ 8-bit Quantization
GGUF Quantization
The GGUF version may have performance degradation and is not recommended.
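
To help choose between these variants, the sketch below estimates how much memory the weights alone would need at different bit widths. It is a rough back-of-the-envelope calculation, not a measurement: it assumes the unquantized checkpoint is stored in 16-bit precision, uses the approximate 47B total parameter count, and ignores quantization metadata (scales, zero points), activations and the KV cache. The helper name weight_memory_gb is made up for this example.

```python
TOTAL_PARAMS = 47e9  # approximate total parameter count from the model description

# Assumed bits per weight for each variant (assumption: the base model is 16-bit).
BITS_PER_WEIGHT = {
    "16-bit (unquantized, assumed)": 16,
    "GPTQ 8-bit": 8,
    "GPTQ 4-bit": 4,
    "AWQ 4-bit": 4,
}

def weight_memory_gb(total_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9

for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
```

In a Mixture of Experts model all experts generally have to be resident in memory, so the estimate uses the total parameter count rather than the active one.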

📚 Documentation

Prompt Format

Tanuki-8x8B-dpo-v1.0 uses the Japanese Alpaca prompt format.

Single-turn

<s>The following is an instruction that describes a task. Write a response that appropriately meets the requirements.

### Instruction:
Can a raccoon dog understand the Critique of Pure Reason?

### Response:

Multi-turn

<s>The following is an instruction that describes a task. Write a response that appropriately meets the requirements.

### Instruction:
{Input for the first turn}

### Response:
{Response for the first turn}</s>

### Instruction:
{Input for the second turn}

### Response:

It is recommended to use the default system prompt "The following is an instruction that describes a task. Write a response that appropriately meets the requirements." because the model has not been trained on other system prompts. Please describe the details of the task in the user prompt.
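
For cases where the prompt has to be assembled by hand rather than through tokenizer.apply_chat_template, the layout above can be sketched as a small helper. This is an illustration only: build_prompt is a hypothetical function, the header strings and system prompt are copied as they appear in this document, and the exact whitespace (for example the newline after the final "### Response:") is an assumption. The tokenizer's own chat template remains the authoritative source.

```python
DEFAULT_SYSTEM_PROMPT = (
    "The following is an instruction that describes a task. "
    "Write a response that appropriately meets the requirements."
)

def build_prompt(turns, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Assemble an Alpaca-style prompt string.

    turns: list of (user_text, response_text) pairs. The response of the
    final pair may be None, which leaves the prompt open for generation.
    """
    parts = [f"<s>{system_prompt}"]
    last = len(turns) - 1
    for index, (user_text, response_text) in enumerate(turns):
        parts.append(f"\n\n### Instruction:\n{user_text}\n\n### Response:\n")
        if response_text is not None:
            # Completed turns are closed with the end-of-sequence token.
            parts.append(f"{response_text}</s>")
        elif index != last:
            raise ValueError("only the final turn may omit its response")
    return "".join(parts)

single_turn = build_prompt(
    [("Can a raccoon dog understand the Critique of Pure Reason?", None)]
)
multi_turn = build_prompt(
    [("First question", "First answer"), ("Second question", None)]
)
print(multi_turn)
```

Keeping the system prompt as a default argument makes it easy to follow the recommendation above and put all task details into the user turn.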

Benchmarks

Human Evaluation

A system simulating Chatbot Arena was built and used to run a blind human evaluation.
All evaluation data (approximately 2,000 cases) is publicly available.

Japanese MT-Bench

Evaluation by GPT-4 (gpt-4-0613; scores of -1 are excluded when calculating the average score)

Category	Tanuki-8B-dpo-v1.0	Tanuki-8x8B-dpo-v1.0
Average Score	7.24	7.96
Coding	5.4	6.75
Extraction	6.65	6.90
Humanities	9.1	9.3
Math	3.9	5.75
Reasoning	5.75	7.35
Role-play	8.75	8.95
STEM	9.35	9.40
Writing	9.05	8.85
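
The exclusion rule mentioned above can be illustrated with a short sketch. The per-judgment scores below are hypothetical, since the individual judgments are not listed here, and mean_excluding_failures is a made-up helper name. The category scores are copied from the table.

```python
from statistics import mean

def mean_excluding_failures(scores, failure_score=-1):
    """Average the scores after dropping entries equal to failure_score."""
    valid = [s for s in scores if s != failure_score]
    if not valid:
        raise ValueError("no valid scores to average")
    return mean(valid)

# Hypothetical judge scores; -1 marks a judgment to be excluded.
hypothetical_judgments = [8, -1, 6, 7]
print(mean_excluding_failures(hypothetical_judgments))

# Unweighted mean of the eight category scores from the table.
CATEGORY_SCORES_8B = [5.4, 6.65, 9.1, 3.9, 5.75, 8.75, 9.35, 9.05]
CATEGORY_SCORES_8X8B = [6.75, 6.90, 9.3, 5.75, 7.35, 8.95, 9.40, 8.85]
category_mean_8b = round(mean(CATEGORY_SCORES_8B), 2)
category_mean_8x8b = round(mean(CATEGORY_SCORES_8X8B), 2)
print(category_mean_8b, category_mean_8x8b)
```

The unweighted category mean reproduces the reported 7.24 for Tanuki-8B-dpo-v1.0 but gives about 7.91 for Tanuki-8x8B-dpo-v1.0 instead of 7.96. One possible explanation, which the source does not confirm, is that the reported average is taken over individual judgments after excluding -1 scores, not over the category means.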

👥 Development Team

Kanehisa Hatakeyama [Leader], asaoka_tadashi, Atsushi Saito, Chattso-GPT, Chihiro Arata, Chihiro HIGUCHI, Daichi Kohmoto, Esty, Hideaki Hayashi, hiroaki shioya, Issei Fujimoto, Jie Zeng, Jinsei Shiraishi, K. Nishizawa, Kazutaka Nishimae, Kunihiro Watanabe, masaki okamura, Minami Someya, Mr. M, Nishi, Nishijima, p1atdev, Rumi Nakagawa, Ryota Mitsuhashi, Susumu Ota, takagi, Toshio Nishida, y_morinaga, Yuki Namiuchi, Yukie Kawano, Tsuneji Nagahara, Jun Kato, Atsushi Kawagoe, Kenta Iwata, Mitsuho Kikuchi, Masato Kumada, Shota Eguni, Toshiyuki Sano, Hiroki Yamaguchi, Yasutaka Nishiie, Masaharu Kawamura, Shun Katakami, Shiso Horie, Kanta Hayashi

📄 License

This project is licensed under the Apache-2.0 license.
