# lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese
DeepSeek's R1 models are excellent at reasoning but can output an inconsistent mix of languages. This Japanese version reliably responds to prompts in Japanese.
## Quick Start
When using this model, we recommend a sampling temperature between 0.5 and 0.7, as recommended for the original distilled R1 models.

We have also observed that this model sometimes repeats itself more than the original R1 model, so we recommend setting `repetition_penalty` to 1.1, or higher if it still repeats itself on your prompts.
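The same settings can also be applied outside vLLM. Below is a minimal sketch using the Hugging Face transformers API; this example is ours, not from the original card, and the prompt is illustrative:

```python
# Minimal transformers sketch applying the recommended sampling settings.
# Assumes a GPU with enough memory for the 7B model in bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "Find the sum of the integers from 1 to 100."
messages = [{"role": "user", "content": "1から100までの整数の和を求めてください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,        # within the recommended 0.5-0.7 range
    repetition_penalty=1.1, # as recommended above
)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```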
## Installation

To use this model with vLLM, first install vLLM:

```bash
pip install vllm
```
## Usage Examples

### Basic Usage
```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese",
    max_model_len=8_000
)

sampling_params = SamplingParams(
    temperature=0.5,
    max_tokens=8_000,
    repetition_penalty=1.1
)

# "Each class at a school has 20 students, and there are 3 classes in total.
#  Across the whole school, 50% of students are boys and 50% are girls.
#  The first class has 15 girls and the second class has 12 girls.
#  How many boys are in the third class?"
prompts = [
    """学校には1クラスにつき20人の生徒がいて、クラスは合計3つあります。
学校全体では男子と女子がそれぞれ50%ずついます。
1つ目のクラスには女子が15人、2つ目のクラスには女子が12人います。
3つ目のクラスには何人の男子がいますか？"""
]

conversations = [
    [{"role": "user", "content": x}] for x in prompts
]

outputs = llm.chat(conversations, sampling_params=sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```
Example output (truncated):

```
<think>
...
...
```
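Responses place the model's reasoning inside a `<think>...</think>` block before the final answer. A small helper like the following (our sketch, not part of the original card) can separate the two:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer), splitting on the model's </think> tag."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text  # no complete reasoning block found
    return match.group(1).strip(), text[match.end():].strip()

# e.g. with the vLLM output above:
# reasoning, answer = split_reasoning(output.outputs[0].text)
```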
## Documentation

### Evaluation

We evaluated this model for answer accuracy and the percentage of valid Japanese `<think>` sections using the first 50 rows of the SakanaAI/gsm8k-ja-test_250-1319 dataset. We compare against the original R1 distill model, testing with repetition penalties of both 1.0 and 1.1:
| Model | Repetition Penalty | Answer accuracy (%) | Valid Japanese `<think>` (%) |
|:--|:-:|:-:|:-:|
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 1.0 | 60 | 94 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 1.1 | 62 | 96 |
| lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0 | 66 | 92 |
| lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1 | 70 | 98 |
Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found here.
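The card does not show the metric implementation, but a plausible sketch of the "valid Japanese `<think>`" check, assuming "valid" means the block closes properly and is written mostly in Japanese, might look like this (the 0.5 threshold is our assumption):

```python
import re

# Hiragana, katakana, and CJK ideograph ranges.
JAPANESE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")

def is_valid_japanese_think(text: str, min_ratio: float = 0.5) -> bool:
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return False  # missing or unterminated <think> section
    chars = [c for c in match.group(1) if not c.isspace() and not c.isdigit()]
    if not chars:
        return False
    return sum(bool(JAPANESE.match(c)) for c in chars) / len(chars) >= min_ratio
```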
We further use the first 50 prompts from DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja to evaluate the percentage of valid Japanese `<think>` sections in model responses. This benchmark contains more varied and complex prompts, making it a more realistic test of how reliably the model outputs Japanese.
| Model | Repetition Penalty | Valid Japanese `<think>` (%) |
|:--|:-:|:-:|
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 1.0 | 48 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 1.1 | 48 |
| lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0 | 84 |
| lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1 | 94 |
Code for the DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja evaluation can be found here.
### How this model was made
We made the data for this model using the following steps:
- Sample English reasoning-style prompts from argilla/distilabel-reasoning-prompts.
- Remove similar prompts using text similarity based on BAAI/bge-m3 embeddings (a sketch of this step follows this list).
- Translate the English prompts to Japanese using gpt-4o-mini-2024-07-18.
- Generate answers to the prompts using deepseek-ai/DeepSeek-R1-Distill-Llama-70B.
- Filter out responses that did not:
  - finish within 2,048 tokens;
  - contain a valid `<think>` section; and
  - have the `<think>` section written in Japanese.
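The card does not include the deduplication code for the second step; a hedged sketch using BAAI/bge-m3 via sentence-transformers (the 0.9 similarity threshold is our assumption) could look like:

```python
# Hypothetical near-duplicate filter over a list of prompt strings.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("BAAI/bge-m3")
prompts = ["...", "..."]  # the sampled English reasoning prompts
embeddings = encoder.encode(prompts, normalize_embeddings=True)

kept: list[int] = []
for i, emb in enumerate(embeddings):
    # On unit-norm vectors, cosine similarity is a plain dot product.
    if all(float(np.dot(emb, embeddings[j])) < 0.9 for j in kept):
        kept.append(i)

deduplicated = [prompts[i] for i in kept]
```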
### Training details

Full training config (LLaMA-Factory yaml):

```yaml
model_name_or_path: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
stage: sft
do_train: true
finetuning_type: full
deepspeed: /root/LLaMA-Factory/examples/deepspeed/ds_z2_config.json
dataset: distilabel-reasoning-R1-Llama-70B-ja-train
template: qwen
cutoff_len: 4500
overwrite_cache: true
preprocessing_num_workers: 16
packing: true
output_dir: /root/train_outputs/DeepSeek-R1-Distill-Qwen-7B/distilabel-reasoning-R1-Llama-70B-ja-train
logging_steps: 1
save_steps: 0.99999
plot_loss: true
overwrite_output_dir: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
ddp_timeout: 180000000
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 0.1
```
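Note that `packing: true` concatenates multiple training samples into sequences of up to `cutoff_len` (4,500) tokens, so each batch is densely filled even though the per-device batch size is 1.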
Training run script:

```bash
echo '{
"distilabel-reasoning-R1-Llama-70B-ja-train": {
"hf_hub_url": "lightblue/distilabel-reasoning-R1-Llama-70B-ja-train",
"formatting": "sharegpt"
}
}' > /root/LLaMA-Factory/data/dataset_info.json
cd /root/LLaMA-Factory && llamafactory-cli train /root/reasoning_train.yaml
rm -r /root/train_outputs/DeepSeek-R1-Distill-Qwen-7B/distilabel-reasoning-R1-Llama-70B-ja-train/checkpoint*
huggingface-cli upload lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese /root/train_outputs/DeepSeek-R1-Distill-Qwen-7B/distilabel-reasoning-R1-Llama-70B-ja-train
```
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 8
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 1.0
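The total batch size of 8 follows directly from the values above: `per_device_train_batch_size` (1) × `gradient_accumulation_steps` (1) × `num_devices` (8).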
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-:|:-:|:-:|:-:|
| 0.766 | 0.1087 | 5 | 0.5912 |
| 0.5873 | 0.2174 | 10 | 0.5282 |
| 0.3868 | 0.3261 | 15 | 0.4958 |
| 0.5101 | 0.4348 | 20 | 0.4761 |
| 0.4085 | 0.5435 | 25 | 0.4644 |
| 0.5561 | 0.6522 | 30 | 0.4578 |
| 0.4683 | 0.7609 | 35 | 0.4542 |
| 0.5055 | 0.8696 | 40 | 0.4526 |
| 0.5359 | 0.9783 | 45 | 0.4519 |
### Framework versions
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
## License
We share this model under an Apache 2.0 license.
### Developed by

This model was trained by Peter Devine (ptrdvn) for Lightblue.