VeriReason: An Open-Source Verilog RTL Code Generation Model - Enhancing Hardware Design Performance with Reinforcement Learning

VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb

Developed by Nellyw888

VeriReason is a Verilog RTL code generation method that combines reinforcement learning with testbench feedback to improve the functional correctness of pre-trained models on hardware design tasks.

Large Language Model

Transformers

#Verilog code generation #Reinforcement learning optimization #Testbench feedback

Downloads 1,483

Release Time : 5/13/2025

Model Overview

This model focuses on Verilog RTL code generation. It combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning to produce functionally correct RTL from natural-language specifications.

Model Features

Reinforcement learning optimization

Uses GRPO (Guided Reward Proximal Optimization) reinforcement learning, with testbench feedback as the reward signal, to optimize model performance

High functional correctness rate

Achieves 83.1% functional correctness on the VerilogEval Machine benchmark, outperforming comparable models

Explicit reasoning ability

Combines explicit reasoning with reinforcement learning, improving first-attempt functional correctness by up to 2.8× over baseline methods

Strong generalization ability

Generalizes robustly to unseen designs

Model Capabilities

Verilog code generation

Hardware design automation

RTL synthesis

Design verification

Use Cases

Hardware design

8-bit comparator design

Automatically generate Verilog code for an 8-bit comparator according to specification requirements

Generate a functionally correct Verilog implementation

Complex circuit design

Automatically generate RTL code for complex digital circuits

Improve design efficiency and correctness rate

🚀 VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb

This project, VeriReason, presents a novel approach to enhance pre-trained models for Verilog RTL code generation. It uses reinforcement learning with testbench feedback, achieving high functional correctness and outperforming comparable models.

🚀 Quick Start

For implementation details, visit our GitHub repository, VeriReason (https://github.com/NellyW8/VeriReason), and our page.

Check out our paper: VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation

✨ Features

Novel Approach: Utilizes reinforcement learning with testbench feedback to enhance pre-trained models for Verilog RTL code generation.
Combined Techniques: Combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning.
High Performance: Achieves 83.1% functional correctness on the VerilogEval Machine benchmark, outperforming comparable models and large commercial systems.
Explicit Reasoning: Integrates explicit reasoning capabilities with reinforcement learning for Verilog generation.

📦 Installation

To install this project, follow these steps:

Clone the repository: git clone https://github.com/NellyW8/VeriReason.git
Navigate to the project directory: cd VeriReason
Install the dependencies as specified in the repository

💻 Usage Examples

Basic Usage

You can use the model with the transformers library:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Nellyw888/VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

prompt = """
Please act as a professional verilog designer. Develop a module that implements a 8-bit comparator. The module should have two 8-bit inputs and one output. If the first input is greater than the second input, the output should be high. Otherwise, the output should be low. First, think through the design approach, considering the functionality, inputs, outputs, and implementation details. Then provide the complete Verilog code implementation. Respond in the following format: <think>
...
</think>
<answer>
```verilog
...```
</answer>
"""

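# Tokenize the prompt; passing the full encoding also supplies the attention mask
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature and top_p to take effect
with torch.no_grad():
    outputs = model.generate(**inputs, max_length=1024, do_sample=True, temperature=0.2, top_p=0.95)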
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
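
The decoded result contains the prompt followed by the model's reply, so the `<answer>` template from the prompt appears before the real answer. The following is a minimal sketch of pulling the generated Verilog out of that text. The helper name `extract_verilog` is hypothetical and not part of the VeriReason repository; it assumes the reply follows the `<think>`/`<answer>` format requested in the prompt.

```python
import re
from typing import Optional

FENCE = "`" * 3  # Markdown code fence, built this way so the snippet embeds cleanly
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
CODE_RE = re.compile(FENCE + r"(?:verilog)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_verilog(response: str) -> Optional[str]:
    """Return the Verilog source from the last <answer> block, or None.

    The last block is used because the decoded text also echoes the prompt,
    whose format template contains an <answer> block of its own.
    """
    answers = ANSWER_RE.findall(response)
    if not answers:
        return None
    blocks = CODE_RE.findall(answers[-1])
    return blocks[-1].strip() if blocks else None
```

If generation is cut off before the model closes its own `<answer>` block, the last match is the template from the prompt. Decoding only the tokens generated after the prompt avoids that case.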

📚 Documentation

Update Log

2025.05.17: Initial release of VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb

Project Description

This study introduces VeriReason, a novel approach utilizing reinforcement learning with testbench feedback to enhance the performance of pre-trained models for Verilog RTL code generation. VeriReason combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, specifically tailored for RTL code generation. Using our curated high-quality training examples alongside a feedback-driven reward model, VeriReason achieves 83.1% functional correctness on the VerilogEval Machine benchmark, substantially outperforming both comparable-sized models and much larger commercial systems like GPT-4 Turbo.
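
To make the idea of a feedback-driven reward concrete, here is a minimal sketch of how testbench results could be turned into a scalar reward. It is not the project's `verilog_rewards_tb.py`, whose contents are not shown here. The function name `score_completion`, the `simulate` callback, and the three reward values are all assumptions chosen for illustration.

```python
import re
from typing import Callable, Tuple

FENCE = "`" * 3  # Markdown code fence
ANSWER_CODE_RE = re.compile(
    r"<answer>.*?" + FENCE + r"verilog[ \t]*\n(.*?)" + FENCE + r".*?</answer>",
    re.DOTALL,
)

# Hypothetical reward tiers; the values used by the project are not given here.
FORMAT_REWARD = 0.1   # well-formed answer, but the code does not compile
COMPILE_REWARD = 0.3  # compiles, but fails the testbench
PASS_REWARD = 1.0     # compiles and passes the testbench


def score_completion(completion: str,
                     simulate: Callable[[str], Tuple[bool, bool]]) -> float:
    """Score one completion. `simulate(code)` returns (compiled, passed)."""
    match = ANSWER_CODE_RE.search(completion)
    if match is None:
        return 0.0  # no parsable answer block
    compiled, passed = simulate(match.group(1))
    if not compiled:
        return FORMAT_REWARD
    return PASS_REWARD if passed else COMPILE_REWARD
```

Passing the simulator in as a callback keeps the scoring logic independent of any particular Verilog toolchain. In practice `simulate` would wrap a simulator run of the candidate module against its testbench.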

The model integrates explicit reasoning capabilities with reinforcement learning for Verilog generation, establishing a new state-of-the-art for automated RTL synthesis. Our 7B parameter model based on Code Llama demonstrates up to a 2.8× increase in first-attempt functional correctness compared to baseline methods and exhibits robust generalization to unseen designs.
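
This page does not say how functional correctness is computed. Code generation benchmarks of this kind commonly report pass@k, under which first-attempt correctness corresponds to pass@1. Assuming that convention applies here, the standard unbiased estimator can be sketched as follows, where `n` is the number of samples generated per problem and `c` is the number that pass the testbench.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: the probability that at least one of k
    samples, drawn without replacement from n generated samples of which
    c are correct, passes the testbench."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For k = 1 the estimate reduces to c / n, the fraction of samples that pass. The benchmark-level score is the mean of this value over all problems.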

Training

GRPO training is based on the OpenR1 framework. To train with GRPO:

Move the necessary files to the OpenR1 directory:

mv verilog_rewards_tb.py verilog_train_tb.py src/open_r1/

Create a directory for the Verilog recipe:

mkdir verilog_recipe
mv verilog_grpo_tb.yaml verilog_recipe/

Run training:

NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL CUDA_VISIBLE_DEVICES=0,1,2 ACCELERATE_USE_NCCL=1 \
  accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=3 \
  src/open_r1/verilog_train_rtlcoder.py --config verilog_recipe/verilog_grpo_tb.yaml --use_vllm=false
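
For intuition about what the training run does with the rewards: GRPO as implemented in the TRL trainer that OpenR1 builds on samples several completions per prompt and scores each one relative to the others in its group. The sketch below shows that normalization step only. The function name, the epsilon value, and the use of the population standard deviation are simplifying assumptions, not the trainer's exact code.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-4) -> List[float]:
    """Normalize the rewards of one prompt's sampled completions.

    Each advantage is the reward's distance from the group mean, scaled by
    the group's standard deviation; eps avoids division by zero when all
    completions receive the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one prompt, one of which passes the testbench
advantages = group_advantages([1.0, 0.3, 0.1, 0.0])
```

Completions that beat their group's average get a positive advantage and are reinforced, while a group in which every completion scores the same contributes no learning signal.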

📄 License

No license information provided in the original document.

📖 Citation

Please cite our paper if you use our model or dataset:

@misc{wang2025verireason,
      title={VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation}, 
      author={Yiting Wang and Guoheng Sun and Wanghao Ye and Gang Qu and Ang Li},
      year={2025},
      eprint={2505.11849},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.11849}, 
}

🙏 Acknowledgement

This repo benefits from OpenR1 and LLaMA-Factory.
