🚀 Zhi-writing-dsr1-14b
Zhi-writing-dsr1-14b is a fine-tuned model based on DeepSeek-R1-Distill-Qwen-14B. It is specifically optimized to enhance creative writing capabilities and has shown improved performance in several benchmark evaluations.
📄 License
The project is licensed under the Apache-2.0 license.
📊 Datasets
The model is trained on the following datasets:
- Congliu/Chinese-DeepSeek-R1-Distill-data-110k
- cognitivecomputations/dolphin-r1
- open-thoughts/OpenThoughts-114k
- qihoo360/Light-R1-SFTData
- qihoo360/Light-R1-DPOData
🌐 Languages
The model supports the following languages:
- Chinese (zh)
- English (en)
🧠 Base Model
The base model is deepseek-ai/DeepSeek-R1-Distill-Qwen-14B.
🏷️ Tags
The model is tagged with "qwen2".
📚 Library Name
The library used is "transformers".
🚀 Quick Start
Prerequisites
Zhi-writing-dsr1-14b can be deployed on various hardware configurations, including GPUs with 80 GB of memory, a single H20/A800/H800, or dual RTX 4090s. Additionally, the INT4 quantized version, Zhi-writing-dsr1-14b-gptq-int4, can be deployed on a single RTX 4090.
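As a hedged sketch, the INT4 checkpoint loads the same way as the full-precision model. The repo id below is assumed from the naming above, and GPTQ loading in transformers additionally requires the optimum package with a GPTQ kernel backend such as auto-gptq or gptqmodel:

```python
# Hypothetical sketch: load the INT4 GPTQ checkpoint on a single RTX 4090.
from transformers import AutoModelForCausalLM, AutoTokenizer

INT4_MODEL = "Zhihu-ai/Zhi-writing-dsr1-14b-gptq-int4"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(INT4_MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    INT4_MODEL,
    device_map="auto",        # place the quantized weights on the GPU
    trust_remote_code=True,
).eval()
```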
Deployment
You can deploy the model using different frameworks as shown in the "💻 Usage Examples" section.
✨ Features
- Enhanced Creative Writing: Zhi-writing-dsr1-14b has significantly improved creative writing capabilities compared to its base model. In the LLM Creative Story-Writing Benchmark, it achieved a score of 8.33, up from the base model's 7.87. In the [WritingBench](https://github.com/X-PLUG/WritingBench) evaluation framework, it scored 8.46, an improvement over DeepSeek-R1-Distill-Qwen-14B's 7.93.
- General Capability Improvement: Evaluations show modest improvements of 2%–5% on knowledge and reasoning tasks (CMMLU, MMLU-Pro), and encouraging progress in mathematical reasoning as measured by benchmarks such as AIME-2024, AIME-2025, and GSM8K.
📦 Installation
The installation mainly involves setting up the Python environment and downloading the model weights; the model can then be run with any of the frameworks shown in the "💻 Usage Examples" section below.
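For the transformers route, a recent release is needed; as a hedged sketch of the environment setup (the Qwen2 architecture this model is tagged with landed in transformers 4.37.0, and accelerate is assumed for `device_map="auto"`):

```bash
pip install "transformers>=4.37.0" accelerate
```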
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

MODEL_NAME = "Zhihu-ai/Zhi-writing-dsr1-14b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

# To pin a precision explicitly, pass torch_dtype:
# import torch
# use bf16
# model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto", trust_remote_code=True, torch_dtype=torch.bfloat16).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cpu", trust_remote_code=True).eval()

# Auto mode: precision is selected automatically based on the device.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True
).eval()

# With transformers >= 4.32.0 the generation config is loaded automatically;
# on older versions, load it explicitly:
# model.generation_config = GenerationConfig.from_pretrained(MODEL_NAME, trust_remote_code=True)

generate_configs = {
    "temperature": 0.6,
    "do_sample": True,
    "top_p": 0.95,
    "max_new_tokens": 4096
}

# "Please write an article introducing West Lake vinegar fish in the style of Lu Xun."
prompt = "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    **generate_configs
)
# Drop the prompt tokens so that only newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
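If you prefer tokens to print as they are produced rather than all at once, a minimal sketch using transformers' built-in TextStreamer (reusing `model_inputs` and `generate_configs` from above) looks like this:

```python
from transformers import TextStreamer

# Streams decoded text to stdout as tokens are generated;
# skip_prompt=True suppresses echoing the input prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**model_inputs, **generate_configs, streamer=streamer)
```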
Advanced Usage
You can also run the model with other serving frameworks such as ZhiLight, vLLM, SGLang, and Ollama.
Using ZhiLight
```bash
docker run -it --net=host --gpus='"device=0"' \
    -v /path/to/model:/mnt/models \
    --entrypoint="" \
    ghcr.io/zhihu/zhilight/zhilight:0.4.17-cu124 \
    python -m zhilight.server.openai.entrypoints.api_server \
    --model-path /mnt/models \
    --port 8000 \
    --enable-reasoning --reasoning-parser deepseek-r1 \
    --served-model-name Zhi-writing-dsr1-14b

# send request
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-writing-dsr1-14b",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```
Using vLLM
```bash
# install vllm
pip install "vllm>=0.6.4.post1"

# huggingface model id
vllm serve Zhihu-ai/Zhi-writing-dsr1-14b --served-model-name Zhi-writing-dsr1-14b --port 8000

# local path
vllm serve /path/to/model --served-model-name Zhi-writing-dsr1-14b --port 8000

# send request
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-writing-dsr1-14b",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```
Using SGLang
```bash
# install SGLang
pip install "sglang[all]>=0.4.5" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python

# huggingface model id
python -m sglang.launch_server --model-path Zhihu-ai/Zhi-writing-dsr1-14b --served-model-name Zhi-writing-dsr1-14b --port 8000

# local path
python -m sglang.launch_server --model-path /path/to/model --served-model-name Zhi-writing-dsr1-14b --port 8000

# send request
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-writing-dsr1-14b",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```
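Since ZhiLight, vLLM, and SGLang all expose an OpenAI-compatible endpoint, you can also query the server from Python. A minimal sketch using the openai client, assuming `pip install openai` and a server on port 8000 as started above:

```python
# Minimal sketch: query the local OpenAI-compatible completions endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="Zhi-writing-dsr1-14b",
    prompt="请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
    max_tokens=4096,
    temperature=0.6,
    top_p=0.95,
)
print(resp.choices[0].text)
```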
Using Ollama
You can download Ollama from https://ollama.com/download and then run the model:

```bash
# quantization: Q4_K_M
ollama run zhihu/zhi-writing-dsr1-14b

# bf16
ollama run zhihu/zhi-writing-dsr1-14b:bf16
```
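Once the model is running, you can also query Ollama's local REST API, which listens on port 11434 by default; a minimal sketch:

```bash
curl http://localhost:11434/api/generate -d '{
    "model": "zhihu/zhi-writing-dsr1-14b",
    "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
    "stream": false
}'
```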
📚 Documentation
Training Process
Data
The model's training corpus consists of three main data sources: rigorously filtered open-source datasets, chain-of-thought reasoning corpora, and curated question-answer pairs from Zhihu. To ensure optimal domain coverage, the distribution of the various datasets was carefully balanced, including [Dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1), [Congliu/Chinese-DeepSeek-R1-Distill-data-110k](https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k), [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k), [Light-R1-SFTData](https://huggingface.co/datasets/qihoo360/Light-R1-SFTData), and [Light-R1-DPOData](https://huggingface.co/datasets/qihoo360/Light-R1-DPOData), along with high-quality content from Zhihu. All datasets went through a comprehensive quality assurance process using the Reward Model (RM) filtering pipeline.
Training
- Supervised Fine-Tuning (SFT): A curriculum learning strategy was employed for supervised fine-tuning. This approach systematically enhances creative writing capabilities while incorporating diverse domain data to maintain core competencies and mitigate catastrophic forgetting.
- Direct Preference Optimization (DPO): For scenarios with minimal edit distances, Step-DPO (arXiv:2406.18629) was used to selectively penalize incorrect tokens, and positive constraints were incorporated into the loss function as proposed in DPOP (arXiv:2402.13228); a sketch of that objective is given below.
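For background, the DPOP objective (arXiv:2402.13228) augments the standard DPO loss with a hinge penalty that keeps the policy's likelihood of the preferred completion from falling below the reference model's. In the paper's notation (this is a sketch of the published objective, not a statement of this model's exact training configuration):

$$
\mathcal{L}_{\mathrm{DPOP}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\Big(\beta\Big(\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)} - \lambda\max\Big(0,\ \log\frac{\pi_{\mathrm{ref}}(y_w\mid x)}{\pi_\theta(y_w\mid x)}\Big)\Big)\Big)\right]
$$

The $\max(0,\cdot)$ term vanishes whenever the policy already matches or exceeds the reference probability of the preferred response $y_w$, so it activates only to prevent that likelihood from degrading during preference training.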
Evaluation Results
The evaluation results show promising improvements in the model's creative writing capabilities. In the LLM Creative Story-Writing Benchmark, it achieved a score of 8.33, an improvement over the base model's 7.87. On WritingBench, a comprehensive framework for evaluating large language model writing abilities, the model attained a score of 8.46, close to DeepSeek-R1's performance and better than DeepSeek-R1-Distill-Qwen-14B's score of 7.93.
In terms of general capabilities, evaluations indicate modest improvements of 2%–5% on knowledge and reasoning tasks (CMMLU, MMLU-Pro), and encouraging progress in mathematical reasoning as measured by benchmarks such as AIME-2024, AIME-2025, and GSM8K. The results suggest that the model maintains a balanced performance profile, with improvements in creative writing, knowledge/reasoning, and mathematical tasks relative to DeepSeek-R1-Distill-Qwen-14B, making it potentially suitable for a range of general-purpose applications.
Usage Recommendations
- Set the temperature within the range of 0.5–0.7 (0.6 is recommended) to prevent endless repetition or incoherent output.
- When evaluating model performance, it is recommended to conduct multiple tests and average the results (use `n = 16` and `max_tokens = 32768` for mathematical tasks, and `n = 2` for others).
- To ensure that the model engages in thorough reasoning like the DeepSeek-R1 series models, it is recommended to enforce that the model starts every response with "<think>\n"; one way to do this is sketched below.
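As a minimal sketch (reusing the tokenizer, messages, and generation settings from the Basic Usage example), the "<think>\n" prefix can be appended to the chat-templated prompt before generation:

```python
# Hedged sketch: make the model begin its answer with "<think>\n" by appending
# the tag to the templated prompt; the check avoids doubling it if the chat
# template already emits the tag.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
if not text.rstrip().endswith("<think>"):
    text += "<think>\n"
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, **generate_configs)
```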
🔧 Technical Details
The model uses a curriculum learning strategy for supervised fine-tuning, and Step-DPO and DPOP for direct preference optimization. These techniques help improve the model's creative writing capabilities and general performance.
📄 Citation
```bibtex
@misc{Zhi-writing-dsr1-14b,
    title={Zhi-writing-dsr1-14b: Curriculum Reinforcement and Direct Preference Optimization for Robust Creative Writing in LLMs},
    author={Jiewu Wang and Xu Chen and Wenyuan Su and Chao Huang and Hongkui Gao and Lin Feng and Shan Wang and Lu Xu and Penghe Liu and Zebin Ou},
    year={2025},
    eprint={},
    archivePrefix={},
    url={https://huggingface.co/Zhihu-ai/Zhi-writing-dsr1-14b},
}
```
📞 Contact
If you have any questions, please raise an issue or contact us at ai@zhihu.com.

