🚀 VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb
This project, VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb, introduces VeriReason, a new approach that uses reinforcement learning with testbench feedback to improve a pretrained model's Verilog RTL code generation, setting a new benchmark for automated RTL synthesis.
🚀 Quick Start
For implementation details, visit our GitHub repository: VeriReason, and our project page.
Check out our paper: VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
✨ Key Features
- VeriReason combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, optimized specifically for RTL code generation.
- Using carefully curated, high-quality training examples and a feedback-based reward model, VeriReason achieves 83.1% functional correctness on the VerilogEval Machine benchmark, significantly outperforming comparably sized models and larger commercial systems such as GPT-4 Turbo.
- The model is the first to combine explicit reasoning capabilities with reinforcement learning for Verilog generation. Built on Code Llama's 7B-parameter model, it improves first-attempt functional correctness by up to 2.8x over baseline methods and generalizes strongly to unseen designs. A simplified sketch of the testbench-feedback reward idea appears after this list.
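The following is a minimal, hypothetical sketch of the testbench-feedback reward idea: compile the generated RTL against a golden testbench and reward runs that simulate cleanly. The tool choice (Icarus Verilog), the "FAIL" output convention, and the score values are illustrative assumptions, not VeriReason's exact reward code; see verilog_rewards_tb.py in the repository for the real implementation.

```python
import os
import subprocess
import tempfile

def testbench_reward(rtl_code: str, testbench_path: str) -> float:
    """Score generated RTL by compiling and simulating it against a testbench.

    Illustrative sketch only: assumes Icarus Verilog (iverilog/vvp) is
    installed and that the testbench prints "FAIL" on mismatches.
    """
    with tempfile.TemporaryDirectory() as tmp:
        rtl_file = os.path.join(tmp, "dut.v")
        sim_bin = os.path.join(tmp, "sim.out")
        with open(rtl_file, "w") as f:
            f.write(rtl_code)
        # A design that fails to compile earns no reward.
        compiled = subprocess.run(
            ["iverilog", "-o", sim_bin, rtl_file, testbench_path],
            capture_output=True, text=True)
        if compiled.returncode != 0:
            return 0.0
        # Partial credit for compiling; full credit for a clean simulation.
        sim = subprocess.run(["vvp", sim_bin], capture_output=True, text=True)
        return 1.0 if "FAIL" not in sim.stdout else 0.2
```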
📦 Installation
To install this project, follow these steps:
- Clone the repository:
```bash
git clone https://github.com/NellyW8/VeriReason.git
```
- Change into the project directory:
```bash
cd VeriReason
```
- Install the dependencies following the instructions in the repository; an example is shown below.
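For example, assuming the repository ships a standard `requirements.txt` (check the repository's own instructions for the authoritative steps):

```bash
pip install -r requirements.txt
```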
💻 Usage Examples
Basic Usage
You can use the model with the `transformers` library:
````python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model in half precision.
model_name = "Nellyw888/VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

prompt = """
Please act as a professional verilog designer. Develop a module that implements an 8-bit comparator. The module should have two 8-bit inputs and one output. If the first input is greater than the second input, the output should be high. Otherwise, the output should be low. First, think through the design approach, considering the functionality, inputs, outputs, and implementation details. Then provide the complete Verilog code implementation. Respond in the following format: <think>
...
</think>
<answer>
```verilog
...```
</answer>
"""

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
# do_sample=True is needed for temperature/top_p to actually take effect.
outputs = model.generate(input_ids, max_length=1024, do_sample=True,
                         temperature=0.2, top_p=0.95)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
````
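Because the model is trained to put its reasoning inside `<think>` tags and the final RTL inside an `<answer>` block, you can extract just the Verilog with a small regex. This is a convenience sketch based on the prompt format above, not an official API:

```python
import re

# Pull the Verilog source out of <answer>```verilog ... ```</answer>.
match = re.search(r"<answer>\s*```verilog\s*(.*?)```\s*</answer>", result, re.DOTALL)
verilog_code = match.group(1).strip() if match else result
print(verilog_code)
```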
🔧 Technical Details
Training Details
GRPO (Guided Reward Proximal Optimization) training is based on the OpenR1 framework. To run GRPO training:
- Move the required files into the OpenR1 directory:
```bash
mv verilog_rewards_tb.py verilog_train_tb.py src/open_r1/
```
- Create a directory for the Verilog recipe:
```bash
mkdir verilog_recipe
mv verilog_grpo_tb.yaml verilog_recipe/
```
- Run the training (a simplified sketch of how a reward function plugs into GRPO training follows these steps):
```bash
NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL CUDA_VISIBLE_DEVICES=0,1,2 ACCELERATE_USE_NCCL=1 accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=3 src/open_r1/verilog_train_rtlcoder.py --config verilog_recipe/verilog_grpo_tb.yaml --use_vllm=false
```
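For orientation, here is a minimal, simplified sketch of how a custom reward function plugs into TRL's `GRPOTrainer`, which OpenR1 builds on. The toy `format_reward` below only checks for a well-formed fenced answer and is a stand-in for the full testbench-feedback reward implemented in verilog_rewards_tb.py; the tiny dataset and config values are illustrative:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward: favor completions that contain a fenced Verilog module.
# VeriReason's real reward scores testbench simulation results instead.
def format_reward(completions, **kwargs):
    return [1.0 if "```verilog" in c and "endmodule" in c else 0.0
            for c in completions]

# A one-example prompt dataset, just to make the sketch runnable.
dataset = Dataset.from_dict({"prompt": [
    "Please act as a professional verilog designer. Implement a 2-to-1 mux."
]})

trainer = GRPOTrainer(
    model="Nellyw888/VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb",
    reward_funcs=format_reward,
    args=GRPOConfig(output_dir="grpo-verilog-demo",
                    per_device_train_batch_size=4, num_generations=4),
    train_dataset=dataset,
)
trainer.train()
```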
📄 License
No license information is provided in the documentation.
📚 Documentation
Changelog
2025.05.17: Initial release of VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb.
Citation
If you use our model or dataset, please cite our paper:
```bibtex
@misc{wang2025verireason,
  title={VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation},
  author={Yiting Wang and Guoheng Sun and Wanghao Ye and Gang Qu and Ang Li},
  year={2025},
  eprint={2505.11849},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2505.11849},
}
```
Acknowledgements
This repository benefits from OpenR1 and LLamaFactory.