🚀 VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb
This project, VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb, introduces VeriReason, a new approach that uses reinforcement learning with testbench feedback to improve a pretrained model's Verilog RTL code generation, setting a new benchmark for automated RTL synthesis.
🚀 Quick Start
For implementation details, visit our GitHub repository: VeriReason, and our project page.
Check out our paper: VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
✨ Key Features
- VeriReason combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, optimized specifically for RTL code generation.
- Using carefully curated, high-quality training examples and a feedback-based reward model, VeriReason achieves 83.1% functional correctness on the VerilogEval Machine benchmark, significantly outperforming comparably sized models and larger commercial systems such as GPT-4 Turbo.
- The model is the first to combine explicit reasoning capabilities with reinforcement learning for Verilog generation. Built on Code Llama's 7B-parameter model, it improves first-attempt functional correctness by up to 2.8x over baseline methods and generalizes strongly to unseen designs. A simplified sketch of the testbench-feedback reward idea appears after this list.
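The following is a minimal, hypothetical sketch of the testbench-feedback reward idea: compile the generated RTL against a golden testbench and reward runs that simulate cleanly. The tool choice (Icarus Verilog), the "FAIL" output convention, and the score values are illustrative assumptions, not VeriReason's exact reward code; see verilog_rewards_tb.py in the repository for the real implementation.

```python
import os
import subprocess
import tempfile

def testbench_reward(rtl_code: str, testbench_path: str) -> float:
    """Score generated RTL by compiling and simulating it against a testbench.

    Illustrative sketch only: assumes Icarus Verilog (iverilog/vvp) is
    installed and that the testbench prints "FAIL" on mismatches.
    """
    with tempfile.TemporaryDirectory() as tmp:
        rtl_file = os.path.join(tmp, "dut.v")
        sim_bin = os.path.join(tmp, "sim.out")
        with open(rtl_file, "w") as f:
            f.write(rtl_code)
        # A design that fails to compile earns no reward.
        compiled = subprocess.run(
            ["iverilog", "-o", sim_bin, rtl_file, testbench_path],
            capture_output=True, text=True)
        if compiled.returncode != 0:
            return 0.0
        # Partial credit for compiling; full credit for a clean simulation.
        sim = subprocess.run(["vvp", sim_bin], capture_output=True, text=True)
        return 1.0 if "FAIL" not in sim.stdout else 0.2
```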
📦 Installation
To install this project, follow these steps:
- Clone the repository:
```bash
git clone https://github.com/NellyW8/VeriReason.git
```
- Change into the project directory:
```bash
cd VeriReason
```
- Install the dependencies following the instructions in the repository; an example is shown below.
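For example, assuming the repository ships a standard `requirements.txt` (check the repository's own instructions for the authoritative steps):

```bash
pip install -r requirements.txt
```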
💻 Usage Examples
Basic Usage
You can use the model with the `transformers` library:
````python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model in half precision.
model_name = "Nellyw888/VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

prompt = """
Please act as a professional verilog designer. Develop a module that implements an 8-bit comparator. The module should have two 8-bit inputs and one output. If the first input is greater than the second input, the output should be high. Otherwise, the output should be low. First, think through the design approach, considering the functionality, inputs, outputs, and implementation details. Then provide the complete Verilog code implementation. Respond in the following format: <think>
...
</think>
<answer>
```verilog
...```
</answer>
"""

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
# do_sample=True is needed for temperature/top_p to actually take effect.
outputs = model.generate(input_ids, max_length=1024, do_sample=True,
                         temperature=0.2, top_p=0.95)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
````
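Because the model is trained to put its reasoning inside `<think>` tags and the final RTL inside an `<answer>` block, you can extract just the Verilog with a small regex. This is a convenience sketch based on the prompt format above, not an official API:

```python
import re

# Pull the Verilog source out of <answer>```verilog ... ```</answer>.
match = re.search(r"<answer>\s*```verilog\s*(.*?)```\s*</answer>", result, re.DOTALL)
verilog_code = match.group(1).strip() if match else result
print(verilog_code)
```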
🔧 Technical Details
Training Details
GRPO (Guided Reward Proximal Optimization) training is based on the OpenR1 framework. To run GRPO training:
- Move the required files into the OpenR1 directory:
```bash
mv verilog_rewards_tb.py verilog_train_tb.py src/open_r1/
```
- Create a directory for the Verilog recipe:
```bash
mkdir verilog_recipe
mv verilog_grpo_tb.yaml verilog_recipe/
```
- Run the training (a simplified sketch of how a reward function plugs into GRPO training follows these steps):
```bash
NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL CUDA_VISIBLE_DEVICES=0,1,2 ACCELERATE_USE_NCCL=1 accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=3 src/open_r1/verilog_train_rtlcoder.py --config verilog_recipe/verilog_grpo_tb.yaml --use_vllm=false
```
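For orientation, here is a minimal, simplified sketch of how a custom reward function plugs into TRL's `GRPOTrainer`, which OpenR1 builds on. The toy `format_reward` below only checks for a well-formed fenced answer and is a stand-in for the full testbench-feedback reward implemented in verilog_rewards_tb.py; the tiny dataset and config values are illustrative:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward: favor completions that contain a fenced Verilog module.
# VeriReason's real reward scores testbench simulation results instead.
def format_reward(completions, **kwargs):
    return [1.0 if "```verilog" in c and "endmodule" in c else 0.0
            for c in completions]

# A one-example prompt dataset, just to make the sketch runnable.
dataset = Dataset.from_dict({"prompt": [
    "Please act as a professional verilog designer. Implement a 2-to-1 mux."
]})

trainer = GRPOTrainer(
    model="Nellyw888/VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb",
    reward_funcs=format_reward,
    args=GRPOConfig(output_dir="grpo-verilog-demo",
                    per_device_train_batch_size=4, num_generations=4),
    train_dataset=dataset,
)
trainer.train()
```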
📄 License
No license information is provided in the documentation.
📚 Documentation
Changelog
2025.05.17: Initial release of VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb.
Citation
If you use our model or dataset, please cite our paper:
```bibtex
@misc{wang2025verireason,
  title={VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation},
  author={Yiting Wang and Guoheng Sun and Wanghao Ye and Gang Qu and Ang Li},
  year={2025},
  eprint={2505.11849},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2505.11849},
}
```
Acknowledgements
This repository benefits from OpenR1 and LLamaFactory.