🚀 VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb
This project, VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb, introduces VeriReason, a new approach that uses reinforcement learning with testbench feedback to improve the Verilog RTL code generation of pretrained models, setting a new bar for automated RTL synthesis.
🚀 Quick Start
For implementation details, visit our GitHub repository: VeriReason, and our project page.
Check out our paper: VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
✨ Key Features
- The VeriReason method introduced in this work combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, optimized specifically for RTL code generation.
- Using carefully curated, high-quality training examples and a feedback-based reward model, VeriReason achieves 83.1% functional correctness on the VerilogEval Machine benchmark, significantly outperforming comparably sized models and larger commercial systems such as GPT-4 Turbo (a sketch of this style of reward appears after this list).
- The model combines explicit reasoning with reinforcement learning for Verilog generation: built on Code Llama, the 7B-parameter model improves first-attempt functional correctness by up to 2.8x over baseline methods and generalizes strongly to unseen designs.
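To make the feedback-based reward concrete, here is a minimal sketch of a testbench-driven reward function. It is illustrative only: the choice of Icarus Verilog, the PASS/FAIL log convention, and the partial-credit scoring are our assumptions, not the released verilog_rewards_tb.py implementation.
```python
import re
import subprocess
import tempfile
from pathlib import Path

def testbench_reward(candidate_rtl: str, testbench: str) -> float:
    """Score generated Verilog by simulating it against a testbench.

    Returns 1.0 if every check passes, partial credit for partial passes,
    and 0.0 if the design fails to compile or simulate.
    """
    with tempfile.TemporaryDirectory() as workdir:
        top = Path(workdir) / "top.v"
        tb = Path(workdir) / "tb.v"
        sim = Path(workdir) / "sim"
        top.write_text(candidate_rtl)
        tb.write_text(testbench)
        try:
            # Compile and run with Icarus Verilog (assumed; any CLI simulator works).
            subprocess.run(["iverilog", "-o", str(sim), str(top), str(tb)],
                           check=True, capture_output=True, timeout=60)
            run = subprocess.run(["vvp", str(sim)], check=True,
                                 capture_output=True, timeout=60, text=True)
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError):
            return 0.0  # Compile or runtime failure earns no reward.
        # Assumes the testbench prints one PASS or FAIL line per check.
        passes = len(re.findall(r"\bPASS\b", run.stdout))
        fails = len(re.findall(r"\bFAIL\b", run.stdout))
        total = passes + fails
        return passes / total if total else 0.0
```
Grading compile failures as zero and partial passes proportionally gives the policy a smooth signal to improve against, which is the intuition behind testbench feedback in GRPO.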
📦 Installation
To install this project, follow these steps:
- Clone the repository:
git clone https://github.com/NellyW8/VeriReason.git
- Enter the project directory:
cd VeriReason
- Install the dependencies following the instructions in the repository
💻 Usage Example
Basic Usage
You can use the model with the transformers library:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the released checkpoint; half precision keeps memory use manageable.
model_name = "Nellyw888/VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()
prompt = """
Please act as a professional verilog designer. Develop a module that implements an 8-bit comparator. The module should have two 8-bit inputs and one output. If the first input is greater than the second input, the output should be high. Otherwise, the output should be low. First, think through the design approach, considering the functionality, inputs, outputs, and implementation details. Then provide the complete Verilog code implementation. Respond in the following format: <think>
...
</think>
<answer>
```verilog
...```
</answer>
"""
# Tokenize with an attention mask; pass do_sample=True, otherwise
# temperature and top_p are silently ignored during generation.
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=1024, do_sample=True, temperature=0.2, top_p=0.95)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
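Because the model answers in the <think>/<answer> format requested above, downstream tooling usually wants just the Verilog source. A small helper for that is sketched below; the function name and regex are ours, not part of the release:
```python
import re

def extract_verilog(response: str) -> str | None:
    """Pull the Verilog source out of the fenced block inside <answer>...</answer>."""
    match = re.search(r"<answer>\s*`{3}verilog\s*(.*?)`{3}\s*</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else None

rtl = extract_verilog(result)  # `result` comes from the snippet above
if rtl:
    print(rtl)
```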
🔧 Technical Details
Training Details
GRPO (Guided Reward Proximal Optimization) training is based on the OpenR1 framework. To run GRPO training, follow these steps (a hedged sketch of how a testbench reward plugs into such a trainer follows the steps):
- Move the required files into the OpenR1 directory:
mv verilog_rewards_tb.py verilog_train_tb.py src/open-r1/
- Create a directory for the Verilog recipe:
mkdir verilog_recipe
mv verilog_grpo_tb.yaml verilog_recipe/
- Run the training:
NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL CUDA_VISIBLE_DEVICES=0,1,2 ACCELERATE_USE_NCCL=1 accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=3 src/open_r1/verilog_train_rtlcoder.py --config verilog_recipe/verilog_grpo_tb.yaml --use_vllm=false
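For orientation, the sketch below shows how a testbench reward of the kind above could be wired into a GRPO trainer. OpenR1 builds on TRL's GRPOTrainer; everything here (the dataset columns, data file name, and hyperparameters) is an assumption for illustration, not the contents of verilog_train_tb.py or verilog_grpo_tb.yaml.
```python
# Illustrative wiring only; not the project's actual training script.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def verilog_reward(completions, **kwargs):
    # TRL passes extra dataset columns to reward functions as kwargs and
    # expects one float per completion. We assume a plain-text "prompt"
    # column plus a "testbench" column, and reuse the testbench_reward
    # sketch from the Key Features section above.
    return [testbench_reward(c, tb) for c, tb in zip(completions, kwargs["testbench"])]

training_args = GRPOConfig(output_dir="verilog-grpo", per_device_train_batch_size=1)
trainer = GRPOTrainer(
    model="Nellyw888/VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb",
    reward_funcs=verilog_reward,
    args=training_args,
    train_dataset=load_dataset("json", data_files="train.jsonl", split="train"),
)
trainer.train()
```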
📄 License
No license information is given in the documentation.
📚 Documentation
Changelog
2025.05.17: Initial release of VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb.
Citation
If you use our model or dataset, please cite our paper:
@misc{wang2025verireason,
  title={VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation},
  author={Yiting Wang and Guoheng Sun and Wanghao Ye and Gang Qu and Ang Li},
  year={2025},
  eprint={2505.11849},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2505.11849},
}
Acknowledgements
This repository benefits from OpenR1 and LLamaFactory.