🚀 ReasonGen-R1: Chain-of-Thought Reasoning for Autoregressive Image Generation
ReasonGen-R1 is an autoregressive image generation model that incorporates chain-of-thought reasoning. This is the official checkpoint for the paper "ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL".
Model Information
| Property | Details |
| --- | --- |
| Base Model | deepseek-ai/Janus-Pro-7B |
| Datasets | Franklin0/ReasonGen-R1-RL-Geneval-12k, Franklin0/ReasonGen-R1-RL-DPG-5k, Franklin0/ReasonGen-R1-RL-T2I-11k |
| Library Name | transformers |
| License | apache-2.0 |
| Pipeline Tag | text-to-image |
Links
📥 Model Download | 🚀 Quick Start | 🙏 Acknowledgement | 📑 Citation
📄 arXiv Link
✨ Features
Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. ReasonGen-R1 is a two-stage framework. First, it imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales. Then, it refines its outputs using Group Relative Policy Optimization (GRPO).
To enable the model to reason through text before generating images, a corpus of model-crafted rationales paired with visual prompts is automatically generated and released. This enables controlled planning of object layouts, styles, and scene compositions. The GRPO algorithm uses reward signals from a pretrained vision–language model to assess overall visual quality, optimizing the policy in each update.
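The group-relative part of GRPO can be illustrated with a minimal sketch. It assumes the commonly described formulation, in which several images are sampled for the same prompt, each receives a scalar reward, and every reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, the use of the population standard deviation, and the reward values are all illustrative assumptions, not the repository's implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled group against that group.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every sample gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four images sampled from the same prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Samples scored above their group's mean receive a positive advantage and are reinforced, while those below the mean are discouraged. Because the baseline comes from the group itself, no separate value model is needed in this formulation.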
Evaluations on Geneval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. The generated reasoning dataset and training code will be open-sourced to accelerate further advances in text-based reasoning–driven image generation.
📦 Installation
Huggingface
Environment Installation
You can install the necessary dependencies by running the following commands:
cd ~
mkdir -p project
cd project
conda create -n image_rl python=3.12 -y
conda activate image_rl
pip3 install torch==2.6.0 torchvision --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
git clone https://github.com/Franklin-Zhang0/ReasonGen-R1.git
cd ReasonGen-R1
pip install -r requirements.txt
pip install -e .
pip install -e ./Janus
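After the install finishes, a quick check like the following can confirm that the key packages are visible to the active interpreter. This is a minimal sketch: the import names are assumptions inferred from the commands above (`flash-attn` imports as `flash_attn`, and the editable Janus install is assumed to expose a `janus` package), and `missing_packages` is a hypothetical helper, not part of the repository.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names; adjust them if the installed packages differ.
required = ["torch", "torchvision", "flash_attn", "janus"]
missing = missing_packages(required)
print("missing:", missing if missing else "none")
```

Run it inside the activated `image_rl` environment. Any name it lists points to the install step that needs to be repeated.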
Evaluation Environment Installation (Optional)
If you want to run the evaluation code, you can install the evaluation environment by running the following commands:
# Geneval
cd ~
mkdir -p project
cd project
git clone https://github.com/djghosh13/geneval.git
cd geneval
conda deactivate
conda create -n geneval python=3.9 -y
conda activate geneval
pip install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1
pip install mmcv-full==1.7.0 -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13/index.html
pip install mmengine==0.7.3
pip install pandas
pip install numpy==1.23.1
pip install open-clip-torch
pip install clip-benchmark
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection; git checkout 2.x
pip install -v -e .
cd ../
bash ./evaluation/download_models.sh "./models"
# DPG
cd ~
cd project
git clone https://github.com/TencentQQGYLab/ELLA.git
cd ELLA
cp ~/project/ReasonGen-R1/benchmark/requirements-for-dpg_bench.txt .
conda deactivate
conda create -n dpg_test python=3.9 -y
conda activate dpg_test
conda install conda-forge::fairseq -y
pip install -r requirements-for-dpg_bench.txt
Once the evaluation environment is set up, you can use the following commands to run the evaluation:
bash -i benchmark/geneval.sh
bash -i benchmark/dpg_eval.sh
💻 Usage Examples
Inference
To run inference with the ReasonGen-R1 model, you can use the following command:
python ReasonGen-R1/Janus/cot_generate_inference.py
SFT Training
To train the SFT model from the Janus-Pro-7B model on the ReasonGen-R1-SFT-200k dataset, you can use the following command:
bash ReasonGen-R1/examples/janus_sft.sh
RL Training
To train the RL model from the ReasonGen-R1-SFT model, you can use the following command:
bash ReasonGen-R1/examples/janus_rl.sh
📄 License
This project is licensed under the Apache-2.0 license.
🙏 Acknowledgements
We would like to thank Verl, upon which our repo is built.
📑 Citation
@misc{zhang2025reasongenr1cotautoregressiveimage,
title={ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL},
author={Yu Zhang and Yunqi Li and Yifan Yang and Rui Wang and Yuqing Yang and Dai Qi and Jianmin Bao and Dongdong Chen and Chong Luo and Lili Qiu},
year={2025},
eprint={2505.24875},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.24875},
}