ReasonGen-R1-SFT: Open-Source Text-to-Image Model, Free to Use, Generates Images with Text-Based Reasoning

ReasonGen-R1-SFT

Developed by Franklin0

ReasonGen-R1 is a text-to-image model trained through supervised fine-tuning (SFT) on image prompts paired with written rationales, which gives it an explicit, text-based "thinking" ability.

Text-to-Image

Transformers

Open Source License: Apache-2.0 #Chain-of-thought image generation #Autoregressive reasoning optimization #Controllable scenario planning

Downloads 312

Release Date: 5/27/2025

Model Overview

ReasonGen-R1 is a two-stage framework. First, the autoregressive image generator is equipped with text-based reasoning ability through supervised fine-tuning, and then the output is optimized using Group Relative Policy Optimization (GRPO).

Model Features

Chain-of-thought reasoning ability

Through supervised fine-tuning, the model gains an explicit, text-based "thinking" ability and can plan object layout, style, and scene composition in a controllable way.

Two-stage optimization framework

The model is first trained with supervised fine-tuning (SFT); its outputs are then refined with Group Relative Policy Optimization (GRPO).

Automatically generated rationale corpus

The authors release a corpus of model-generated rationales paired with visual prompts to support controllable image-generation planning.

Model Capabilities

Text-to-image generation

Text-based reasoning

Controllable image planning

Use Cases

Creative design

Scene design

Generate complex scene layouts according to text descriptions

Generate detailed scene images that are consistent with the text rationale

Stylized image generation

Generate images in a specific artistic style based on style descriptions

Generate artwork in a consistent style that matches the description

Education

Visual teaching material generation

Generate supporting visual materials according to teaching needs

Generate images highly relevant to the teaching content

🚀 ReasonGen-R1 (SFT Only) Model Card

ReasonGen-R1 (SFT Only) is a text-to-image model that addresses the integration of chain-of-thought reasoning and reinforcement learning into generative vision models. It generates high-quality images from text prompts and text-based reasoning.

🚀 Quick Start

This section provides an overview of the ReasonGen-R1 model. For more detailed information, please refer to the subsequent sections.

✨ Features

Innovative Framework: ReasonGen-R1 is a two-stage framework. First, it endows an autoregressive image generator with explicit text-based "thinking" skills through supervised fine-tuning (SFT) on a newly generated reasoning dataset. Then, it refines the outputs using Group Relative Policy Optimization (GRPO).
Reasoning-Driven Image Generation: By automatically generating and releasing a corpus of model-crafted rationales paired with visual prompts, the model can reason through text before generating images, enabling controlled planning of object layouts, styles, and scene compositions.
High Performance: Evaluations on Geneval, DPG, and the T2I benchmark show that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models.

📦 Installation

The library_name for this model is transformers. Install transformers and its dependencies by following the official transformers documentation before using the model.

📚 Documentation

Model Information

Property	Details
Base Model	deepseek-ai/Janus-Pro-7B
Datasets	Franklin0/ReasonGen-R1-SFT-230k
Library Name	transformers
License	apache-2.0
Pipeline Tag	text-to-image
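
As a minimal sketch, the identifiers in the table can be collected in one place and used to open the SFT corpus with the Hugging Face datasets library. The helper name load_sft_dataset and the split name "train" are assumptions made for illustration; the source does not describe the dataset's splits or columns.

```python
# Identifiers copied from the Model Information table.
MODEL_INFO = {
    "base_model": "deepseek-ai/Janus-Pro-7B",
    "dataset": "Franklin0/ReasonGen-R1-SFT-230k",
    "library_name": "transformers",
    "license": "apache-2.0",
    "pipeline_tag": "text-to-image",
}


def load_sft_dataset(split="train", streaming=True):
    """Open the SFT corpus lazily (hypothetical helper).

    Requires `pip install datasets` and network access.
    The split name "train" is an assumption, not confirmed by the source.
    """
    # Imported inside the function so this module loads without the library.
    from datasets import load_dataset

    return load_dataset(MODEL_INFO["dataset"], split=split, streaming=streaming)
```

Streaming avoids downloading the whole corpus just to inspect a few records; check the dataset page for the actual splits and fields before relying on them.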

Introduction

Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales, and then refines its outputs using Group Relative Policy Optimization (GRPO).

To enable the model to reason through text before generating images, we automatically generate and release a corpus of model-crafted rationales paired with visual prompts, enabling controlled planning of object layouts, styles, and scene compositions.

Our GRPO algorithm uses reward signals from a pretrained vision–language model to assess overall visual quality, optimizing the policy in each update.
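
The "group relative" part of GRPO can be illustrated with a small sketch. It follows the commonly described formulation, in which several outputs are sampled for one prompt and each reward is normalized against the mean and standard deviation of its own group. The source does not give the exact normalization used for ReasonGen-R1, so the population standard deviation, the eps constant, and the example reward values are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and spread.

    `rewards` holds one scalar score per image sampled for the same prompt.
    `eps` (an assumed constant) guards against division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a choice made for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical vision-language-model scores for four images from one prompt.
rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group average receive a positive advantage and are reinforced; those below receive a negative one. Because the baseline comes from the group itself, no separate value network is needed.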

Evaluations on Geneval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. We will open-source our generated reasoning dataset and training code to accelerate further advances in text-based reasoning–driven image generation.

Acknowledgements

We would like to thank Verl, upon which our repo is built.

📄 License

This model is released under the Apache-2.0 license.
