🚀 CodeV-R1-Distill-Qwen-7B
The paper is coming soon! This model adapts large language models to hardware description languages: it distills knowledge from DeepSeek-R1, gaining enhanced reasoning abilities and stronger performance on Verilog benchmarks.
🚀 Quick Start
CodeV-R1-Distill-Qwen-7B can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using vLLM:
```shell
vllm serve zhuyaoyu/CodeV-R1-Distill-Qwen-7B --tensor-parallel-size 2 --max-model-len 16384 --enforce-eager
```
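Once the server is up, it exposes vLLM's OpenAI-compatible chat-completions endpoint. The sketch below, using only the Python standard library, shows how a request might be assembled and sent; the address (`localhost:8000`), sampling parameters, and abbreviated system prompt are assumptions, not part of the official instructions:

```python
import json
import urllib.request

# Abbreviated placeholder; substitute the full system prompt from the
# "Usage Tip" section of this card.
SYSTEM_PROMPT = "You are a helpful assistant. ..."

def build_request(user_prompt: str,
                  model: str = "zhuyaoyu/CodeV-R1-Distill-Qwen-7B") -> dict:
    """Assemble a chat-completions payload with the recommended system prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        # Assumed sampling settings; tune for your own workload.
        "max_tokens": 8192,
        "temperature": 0.6,
    }

def query(payload: dict,
          url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the payload to a running vLLM server and return the parsed reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_request("Write a Verilog module implementing a 2-to-1 mux.")
# result = query(payload)  # requires the vllm serve process above to be running
```

`query` is only called once the server is running; `build_request` can be reused with any client library that speaks the OpenAI API.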
✨ Features
Introduction
The post-training phase of large language models (LLMs) has advanced rapidly, with models like OpenAI's o1, DeepSeek-R1, and Kimi-1.5 showcasing remarkable reasoning capabilities. While these advancements have primarily targeted software programming languages, there is growing interest in adapting LLMs to hardware description languages (HDLs). We leverage knowledge distillation to equip smaller, efficient models with DeepSeek-R1-style reasoning abilities. Our CodeV-R1-Distill-Qwen-7B outperforms prior non-reasoning LLMs across major Verilog benchmarks, demonstrating superior code generation and problem-solving capabilities.
Model Summary
- Data Preparation: We re-summarize and reformulate questions from the original CodeV dataset using DeepSeek-V3. After filtering out trivially easy problems and those whose code fails to synthesize, approximately 87,000 (problem, code) pairs remain.
- Training: We employ LLaMAFactory to apply supervised fine-tuning (SFT) to Qwen2.5-Coder-7B-Instruct using the refined dataset. Training is conducted over six epochs with a learning rate of 1e-5 and a batch size of 64.
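As an illustration only (not the authors' released configuration), an SFT run with the hyperparameters above could be expressed as a LLaMA-Factory YAML config roughly like the following; the dataset name and output path are placeholders:

```yaml
### model
model_name_or_path: Qwen/Qwen2.5-Coder-7B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full

### dataset
dataset: codev_r1_distill   # placeholder name for the ~87k distilled pairs
template: qwen
cutoff_len: 16384

### train
learning_rate: 1.0e-5
num_train_epochs: 6.0
per_device_train_batch_size: 1
gradient_accumulation_steps: 64   # effective batch size 64 on one device
bf16: true
output_dir: saves/codev-r1-distill-qwen-7b   # placeholder
```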
Evaluation Results
VerilogEval (v2)
| Model | Model size | Type | Spec-to-RTL | Completion |
|---|---|---|---|---|
| GPT-4o | Undisclosed | General | 62.5% | 59.0% |
| GPT-4 Turbo | Undisclosed | General | 61.1% | 53.9% |
| GPT-4 | Undisclosed | General | 32.0% | 42.3% |
| Mistral Large | Undisclosed | General | 37.5% | 34.0% |
| Llama3.1 | 405B | General | 57.2% | 56.4% |
| Llama3.1 | 70B | General | 42.8% | 35.3% |
| Llama3 | 70B | General | 43.9% | 37.8% |
| Llama2 | 70B | General | 5.3% | 1.3% |
| Llama3.1 | 8B | General | 19.1% | 2.6% |
| CodeLlama | 70B | Coding | 34.9% | 37.2% |
| DeepSeek Coder | 33B | Coding | 21.7% | 25.0% |
| CodeGemma | 7B | Coding | 9.5% | 8.3% |
| DeepSeek Coder | 6.7B | Coding | 29.6% | 24.4% |
| RTL-Coder | 6.7B | Verilog RTL | 36.8% | 35.9% |
| CodeV-R1-distill (ours) | 7B | Verilog RTL | 65.4% | 65.1% |
RTLLM (v1.1)
| Model | Model size | Type | Pass@1 |
|---|---|---|---|
| GPT-4o | Undisclosed | General | 33.8% |
| GPT-3.5 Turbo | Undisclosed | General | 28.3% |
| Llama3.1 | 405B | General | 38.9% |
| Nemotron-4 | 340B | General | 18.9% |
| Llama3.1 | 8B | General | 19.1% |
| CodeLlama | 7B | Coding | 17.9% |
| CodeQwen | 7B | Coding | 24.1% |
| Starcoder2 | 15B | Coding | 15.5% |
| DeepSeek Coder | 6.7B | Coding | 23.1% |
| DeepSeek-Coder-V2 | 16B | Coding | 33.1% |
| DeepSeek-Coder-V2 | 236B | Coding | 34.5% |
| RTL-Coder | 6.7B | Verilog RTL | 36.8% |
| CraftRTL | 6.7B | Verilog RTL | 53.1% |
| CodeV-R1-distill (ours) | 7B | Verilog RTL | 56.2% |
Math
| Model | AIME | MATH | AMC | Minerva | Olympiad Bench | Average |
|---|---|---|---|---|---|---|
| Qwen2.5-7b-instruct-1M | 11.25% | 72.61% | 41.11% | 25.92% | 34.66% | 37.11% |
| Qwen2.5-math-7b-instruct | 12.08% | 82.25% | 49.4% | 27.64% | 37.31% | 41.74% |
| Qwen2.5-coder-7b-instruct (baseline) | 5.63% | 63.5% | 35.62% | 21.02% | 28.64% | 30.88% |
| CodeV-R1-distill (ours) | 11.04% | 74.35% | 45.86% | 25.79% | 38.7% | 39.15% |
💡 Usage Tip
During training and evaluation, we use a system prompt:
````
You are a helpful assistant. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>. Now the user asks you to write verilog code. After thinking, when you finally reach a conclusion, enclose the final verilog code in ```verilog ``` within <answer> </answer> tags. i.e., <answer> ```verilog\n module top_module(in, out, ...) ... ``` </answer>.\n
````
We recommend using this system prompt at inference time as well.
📄 License
CodeV-R1-Distill-Qwen-7B is derived from the Qwen2.5 series, which is originally licensed under the Apache 2.0 License, and is fine-tuned on 87k samples curated with DeepSeek-R1.
📚 Citation
```bibtex
@misc{CodeV-R1-Distill-Qwen-7B,
  author = {IPRC-DIP},
  title  = {CodeV Model Distilled from DeepSeek-R1},
  url    = {https://huggingface.co/zhuyaoyu/CodeV-R1-Distill-Qwen-7B},
  year   = {2025}
}
```