🚀 VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb
This project, VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb, introduces VeriReason, a new approach that uses reinforcement learning with testbench feedback to improve the Verilog RTL code generation of pretrained models, setting a new bar for automated RTL synthesis.
🚀 Quick Start
For implementation details, visit our GitHub repository: VeriReason, and our project page.
Check out our paper: VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
✨ Key Features
- The VeriReason method introduced in this work combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, optimized specifically for RTL code generation.
- Using carefully curated, high-quality training examples and a feedback-based reward model, VeriReason achieves 83.1% functional correctness on the VerilogEval Machine benchmark, significantly outperforming comparably sized models and larger commercial systems such as GPT-4 Turbo (a sketch of this style of reward appears after this list).
- The model combines explicit reasoning with reinforcement learning for Verilog generation: built on Code Llama, the 7B-parameter model improves first-attempt functional correctness by up to 2.8x over baseline methods and generalizes strongly to unseen designs.
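To make the feedback-based reward concrete, here is a minimal sketch of a testbench-driven reward function. It is illustrative only: the choice of Icarus Verilog, the PASS/FAIL log convention, and the partial-credit scoring are our assumptions, not the released verilog_rewards_tb.py implementation.
```python
import re
import subprocess
import tempfile
from pathlib import Path

def testbench_reward(candidate_rtl: str, testbench: str) -> float:
    """Score generated Verilog by simulating it against a testbench.

    Returns 1.0 if every check passes, partial credit for partial passes,
    and 0.0 if the design fails to compile or simulate.
    """
    with tempfile.TemporaryDirectory() as workdir:
        top = Path(workdir) / "top.v"
        tb = Path(workdir) / "tb.v"
        sim = Path(workdir) / "sim"
        top.write_text(candidate_rtl)
        tb.write_text(testbench)
        try:
            # Compile and run with Icarus Verilog (assumed; any CLI simulator works).
            subprocess.run(["iverilog", "-o", str(sim), str(top), str(tb)],
                           check=True, capture_output=True, timeout=60)
            run = subprocess.run(["vvp", str(sim)], check=True,
                                 capture_output=True, timeout=60, text=True)
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError):
            return 0.0  # Compile or runtime failure earns no reward.
        # Assumes the testbench prints one PASS or FAIL line per check.
        passes = len(re.findall(r"\bPASS\b", run.stdout))
        fails = len(re.findall(r"\bFAIL\b", run.stdout))
        total = passes + fails
        return passes / total if total else 0.0
```
Grading compile failures as zero and partial passes proportionally gives the policy a smooth signal to improve against, which is the intuition behind testbench feedback in GRPO.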
📦 Installation
To install this project, follow these steps:
- Clone the repository:
git clone https://github.com/NellyW8/VeriReason.git
- Enter the project directory:
cd VeriReason
- Install the dependencies following the instructions in the repository
💻 Usage Example
Basic Usage
You can use the model with the transformers library:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the released checkpoint; half precision keeps memory use manageable.
model_name = "Nellyw888/VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()
prompt = """
Please act as a professional verilog designer. Develop a module that implements an 8-bit comparator. The module should have two 8-bit inputs and one output. If the first input is greater than the second input, the output should be high. Otherwise, the output should be low. First, think through the design approach, considering the functionality, inputs, outputs, and implementation details. Then provide the complete Verilog code implementation. Respond in the following format: <think>
...
</think>
<answer>
```verilog
...```
</answer>
"""
# Tokenize with an attention mask; pass do_sample=True, otherwise
# temperature and top_p are silently ignored during generation.
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=1024, do_sample=True, temperature=0.2, top_p=0.95)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
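Because the model answers in the <think>/<answer> format requested above, downstream tooling usually wants just the Verilog source. A small helper for that is sketched below; the function name and regex are ours, not part of the release:
```python
import re

def extract_verilog(response: str) -> str | None:
    """Pull the Verilog source out of the fenced block inside <answer>...</answer>."""
    match = re.search(r"<answer>\s*`{3}verilog\s*(.*?)`{3}\s*</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else None

rtl = extract_verilog(result)  # `result` comes from the snippet above
if rtl:
    print(rtl)
```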
🔧 Technical Details
Training Details
GRPO (Guided Reward Proximal Optimization) training is based on the OpenR1 framework. To run GRPO training, follow these steps (a hedged sketch of how a testbench reward plugs into such a trainer follows the steps):
- Move the required files into the OpenR1 directory:
mv verilog_rewards_tb.py verilog_train_tb.py src/open-r1/
- Create a directory for the Verilog recipe:
mkdir verilog_recipe
mv verilog_grpo_tb.yaml verilog_recipe/
- Run the training:
NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL CUDA_VISIBLE_DEVICES=0,1,2 ACCELERATE_USE_NCCL=1 accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=3 src/open_r1/verilog_train_rtlcoder.py --config verilog_recipe/verilog_grpo_tb.yaml --use_vllm=false
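For orientation, the sketch below shows how a testbench reward of the kind above could be wired into a GRPO trainer. OpenR1 builds on TRL's GRPOTrainer; everything here (the dataset columns, data file name, and hyperparameters) is an assumption for illustration, not the contents of verilog_train_tb.py or verilog_grpo_tb.yaml.
```python
# Illustrative wiring only; not the project's actual training script.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def verilog_reward(completions, **kwargs):
    # TRL passes extra dataset columns to reward functions as kwargs and
    # expects one float per completion. We assume a plain-text "prompt"
    # column plus a "testbench" column, and reuse the testbench_reward
    # sketch from the Key Features section above.
    return [testbench_reward(c, tb) for c, tb in zip(completions, kwargs["testbench"])]

training_args = GRPOConfig(output_dir="verilog-grpo", per_device_train_batch_size=1)
trainer = GRPOTrainer(
    model="Nellyw888/VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb",
    reward_funcs=verilog_reward,
    args=training_args,
    train_dataset=load_dataset("json", data_files="train.jsonl", split="train"),
)
trainer.train()
```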
📄 License
No license information is given in the documentation.
📚 Documentation
Changelog
2025.05.17: Initial release of VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb.
Citation
If you use our model or dataset, please cite our paper:
@misc{wang2025verireason,
  title={VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation},
  author={Yiting Wang and Guoheng Sun and Wanghao Ye and Gang Qu and Ang Li},
  year={2025},
  eprint={2505.11849},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2505.11849},
}
Acknowledgements
This repository benefits from OpenR1 and LLamaFactory.