ReasonGen-R1: Open-Source Image Generation Model That Improves the Logic and Quality of Image Generation through Chain-of-Thought Reasoning

ReasonGen-R1

Developed by Franklin0

ReasonGen-R1 is an autoregressive image generation model that integrates chain-of-thought reasoning. It improves the logic and quality of generated images through supervised fine-tuning (SFT) and reinforcement learning (RL).

Text-to-Image

Transformers

Open Source License: Apache-2.0

#Chain-of-Thought Image Generation #Autoregressive Inference #Reinforcement Learning Optimization

Downloads 142

Release Time: 5/27/2025

Model Overview

ReasonGen-R1 is a two-stage framework. First, supervised fine-tuning (SFT) gives the model an explicit, text-based "thinking" ability. Then, Group Relative Policy Optimization (GRPO) refines its outputs. Because the model reasons in text before generating an image, it can plan object layout, style, and scene composition in a controllable way.
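The reason-then-generate flow described above can be pictured as two chained decoding phases. The following is a minimal sketch, not the repository's API: `reason_fn` and `image_fn` are hypothetical stand-ins for the model's text-decoding and image-token-decoding steps, and the toy implementations exist only so the example runs without a checkpoint.

```python
from typing import Callable, List, Tuple


def generate_with_reasoning(
    prompt: str,
    reason_fn: Callable[[str], str],
    image_fn: Callable[[str], List[int]],
) -> Tuple[str, List[int]]:
    """Phase 1 writes a text rationale; phase 2 decodes image tokens
    conditioned on the prompt plus that rationale."""
    rationale = reason_fn(prompt)
    context = f"{prompt}\n{rationale}"
    image_tokens = image_fn(context)
    return rationale, image_tokens


# Toy stand-ins (hypothetical) so the sketch runs without a model.
def toy_reason(prompt: str) -> str:
    return f"Plan: place the main subject of '{prompt}' in the center."


def toy_image(context: str) -> List[int]:
    # Pretend each character maps to one discrete image token id.
    return [ord(ch) % 16 for ch in context][:8]


rationale, tokens = generate_with_reasoning(
    "a red cube on a table", toy_reason, toy_image
)
```

The point of the sketch is the data flow: the image tokens depend on the rationale, so changing the plan changes the image.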

Model Features

Chain-of-Thought Reasoning

Plans image generation explicitly through text reasoning, improving logical consistency and controllability

Two-Stage Training Framework

Supervised fine-tuning (SFT) first teaches the model to reason; reinforcement learning (RL) then optimizes generation quality

Group Relative Policy Optimization (GRPO)

Uses reward signals from a pretrained vision-language model to evaluate and optimize generation quality

Controllable Image Generation

Plans and controls object layout, style, and scene composition

Model Capabilities

Text-to-Image Generation

Reasoning-Based Image Planning

Controllable Image Synthesis

Multi-Style Image Generation

Use Cases

Creative Design

Concept Art Generation

Generate high-quality concept artworks based on detailed text descriptions

Produces concept art that is logically consistent and rich in detail

Advertising Design

Automatically generate advertising images based on product descriptions

Advertising images with consistent styles that meet marketing needs

Education

Teaching Material Generation

Automatically generate illustrations based on course content

Visual materials that accurately express abstract concepts

🚀 ReasonGen-R1: Chain-of-Thought Reasoning for Autoregressive Image Generation

ReasonGen-R1 is an autoregressive image generation model that incorporates chain-of-thought reasoning. This is the official checkpoint for the paper "ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL".

Model Information

Property	Details
Base Model	deepseek-ai/Janus-Pro-7B
Datasets	Franklin0/ReasonGen-R1-RL-Geneval-12k, Franklin0/ReasonGen-R1-RL-DPG-5k, Franklin0/ReasonGen-R1-RL-T2I-11k
Library Name	transformers
License	apache-2.0
Pipeline Tag	text-to-image

✨ Features

Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. ReasonGen-R1 is a two-stage framework. First, it imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales. Then, it refines its outputs using Group Relative Policy Optimization (GRPO).

To enable the model to reason through text before generating images, a corpus of model-crafted rationales paired with visual prompts is automatically generated and released. This enables controlled planning of object layouts, styles, and scene compositions. The GRPO algorithm uses reward signals from a pretrained vision–language model to assess overall visual quality, optimizing the policy in each update.
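To make the GRPO step concrete, here is a minimal sketch of the group-relative advantage computation commonly associated with GRPO: several images are sampled for the same prompt, each is scored by the reward model, and each reward is normalized against the group's mean and standard deviation. The reward values, the `eps` constant, and the use of population standard deviation are illustrative assumptions, not settings taken from the ReasonGen-R1 training code.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own sampling group.

    Samples that score above the group mean get a positive advantage,
    samples below it get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four images sampled from one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group mean, no separate value network is needed; the policy update then weights each sample's log-probability by its advantage.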

Evaluations on Geneval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. The generated reasoning dataset and training code will be open-sourced to accelerate further advances in text-based reasoning–driven image generation.

📦 Installation

Hugging Face

Model	Download
ReasonGen-R1	🤗 Hugging Face
ReasonGen-R1-SFT-Only	🤗 Hugging Face

Dataset	Download
ReasonGen-R1-Datasets	🤗 Hugging Face
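As a convenience, the dataset IDs listed in the Model Information table can be pulled with the Hugging Face `datasets` library. This is a hedged sketch: the `RL_DATASETS` mapping and the `load_rl_dataset` helper are hypothetical names introduced here, the `datasets` package is assumed to be installed, and the split and column layout of each dataset is not assumed.

```python
from typing import Dict

# Dataset IDs as listed in the Model Information table.
RL_DATASETS: Dict[str, str] = {
    "geneval": "Franklin0/ReasonGen-R1-RL-Geneval-12k",
    "dpg": "Franklin0/ReasonGen-R1-RL-DPG-5k",
    "t2i": "Franklin0/ReasonGen-R1-RL-T2I-11k",
}


def load_rl_dataset(name: str):
    """Download one dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the module loads even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(RL_DATASETS[name])


# Example (not executed here because it downloads data):
# ds = load_rl_dataset("geneval")
# print(ds)
```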

Environment Installation

You can install the necessary dependencies by running the following commands:

cd ~
mkdir project
cd project
conda create -n image_rl python==3.12 -y
conda activate image_rl
pip3 install torch==2.6.0 torchvision --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
git clone https://github.com/Franklin-Zhang0/ReasonGen-R1.git
cd ReasonGen-R1
pip install -r requirements.txt
pip install -e .
pip install -e ./Janus
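After the commands above finish, a quick sanity check can confirm that the environment looks as expected. The script below is a sketch, not part of the repository: `check_environment` is a hypothetical helper that only verifies the Python version, whether `torch` and `flash_attn` are importable, and whether CUDA is visible to PyTorch.

```python
import importlib.util
import sys
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether the main pieces of the install are present."""
    report = {
        "python_3_12": sys.version_info[:2] == (3, 12),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "flash_attn_installed": importlib.util.find_spec("flash_attn") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


report = check_environment()
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

Run it inside the activated `image_rl` environment; any `missing` entry points to the install step that needs to be repeated.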

Evaluation Environment Installation (Optional)

If you want to run the evaluation code, you can install the evaluation environment by running the following commands:

# Geneval
cd ~
mkdir project
cd project
git clone https://github.com/djghosh13/geneval.git
cd geneval
conda deactivate
conda create -n geneval python=3.9 -y
conda activate geneval
pip install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1
pip install mmcv-full==1.7.0 -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13/index.html
pip install mmengine==0.7.3

pip install pandas
pip install numpy==1.23.1

pip install open-clip-torch
pip install clip-benchmark

git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection; git checkout 2.x
pip install -v -e .

cd ../
bash ./evaluation/download_models.sh "./models"

# DPG
cd ~
cd project
git clone https://github.com/TencentQQGYLab/ELLA.git
cd ELLA
cp ~/project/ReasonGen-R1/benchmark/requirements-for-dpg_bench.txt .
conda deactivate
conda create -n dpg_test python=3.9 -y
conda activate dpg_test
conda install conda-forge::fairseq -y
pip install -r requirements-for-dpg_bench.txt

Once the evaluation environment is set up, you can use the following commands to run the evaluation:

bash -i benchmark/geneval.sh
bash -i benchmark/dpg_eval.sh

💻 Usage Examples

Inference

To run inference with the ReasonGen-R1 model, you can use the following command:

python ReasonGen-R1/Janus/cot_generate_inference.py

SFT Training

To train the SFT model from the Janus-Pro-7B model on the ReasonGen-R1-SFT-200k dataset, you can use the following command:

bash ReasonGen-R1/examples/janus_sft.sh

RL Training

To train the RL model from the ReasonGen-R1-SFT model, you can use the following command:

bash ReasonGen-R1/Janus/janus_rl.py

📄 License

This project is licensed under the Apache-2.0 license.

🙏 Acknowledgements

We would like to thank Verl, upon which our repo is built.

📑 Citation

@misc{zhang2025reasongenr1cotautoregressiveimage,
      title={ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL}, 
      author={Yu Zhang and Yunqi Li and Yifan Yang and Rui Wang and Yuqing Yang and Dai Qi and Jianmin Bao and Dongdong Chen and Chong Luo and Lili Qiu},
      year={2025},
      eprint={2505.24875},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.24875}, 
}
