PixelFlow-Class2Image: Open-Source Class-Conditional Image Generation Model


PixelFlow-Class2Image

Developed by ShoufaChen

PixelFlow is a flow-based pixel-space generative model focused on image generation tasks.

Image Generation

PyTorch

Open Source License: MIT #Pixel-level image generation #Class-conditional generation #Flow-based model


Release Time: 4/8/2025

Model Overview

PixelFlow is a flow-based generative model that operates directly in pixel space, supporting class-conditional image generation.

Model Features

Flow-based generative model

Adopts a flow-based architecture for direct image generation in pixel space.

Class-conditional generation

Generates images conditioned on a class label (ImageNet classes for this checkpoint).

Pixel-space operation

Operates directly on raw pixels, without a pre-trained VAE to encode images into a latent space.
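To make "flow-based generation in pixel space" concrete, here is a minimal sketch of sampling by integrating a velocity field with fixed Euler steps, from Gaussian noise at t = 0 to an image at t = 1. It assumes a rectified-flow convention (a straight path between noise and image) and replaces the trained network with a hypothetical oracle, `make_toy_velocity`, that knows a single target image. PixelFlow's actual time schedule, ODE solver, and network are not described on this page.

```python
import random
from typing import Callable, List

Vector = List[float]  # a flattened image: one float per pixel value


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (image) with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Hypothetical stand-in for a trained network: the velocity that moves x
    in a straight line so that it reaches `target` exactly at t=1."""
    def velocity(x: Vector, t: float) -> Vector:
        return [(y - xi) / (1.0 - t) for xi, y in zip(x, target)]
    return velocity


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.2, -0.5, 0.9, 0.0]                 # a tiny 4-value "image"
    noise = [rng.gauss(0.0, 1.0) for _ in target]  # starting point at t=0
    sample = euler_sample(make_toy_velocity(target), noise, num_steps=10)
    print(max(abs(s - y) for s, y in zip(sample, target)))  # close to 0
```

In a real class-conditional model, `velocity` would be the network's prediction given the current pixels, the time, and the class label; the oracle only shows that integrating the right velocity field carries noise to the data.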

Model Capabilities

Image generation

Class-conditional image generation

Use Cases

Image generation

Class-conditional image generation

Generates images of a specified class.
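Class-conditional diffusion and flow models are commonly sampled with classifier-free guidance (CFG), which extrapolates from an unconditional prediction (usually obtained with a reserved "null" label) toward the class-conditional one. Whether PixelFlow uses CFG, and at what scale, is not stated on this page; the sketch below only shows the standard combination rule, and `cfg_velocity` is a hypothetical helper name.

```python
from typing import List


def cfg_velocity(v_cond: List[float], v_uncond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: v = v_uncond + scale * (v_cond - v_uncond).

    scale = 1.0 returns the purely conditional prediction; scale > 1.0
    pushes samples further toward the requested class.
    """
    if len(v_cond) != len(v_uncond):
        raise ValueError("predictions must have the same shape")
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


if __name__ == "__main__":
    print(cfg_velocity([1.0, 2.0], [0.0, 1.0], scale=4.0))  # [4.0, 5.0]
```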

🚀 PixelFlow: Pixel-Space Generative Models with Flow

PixelFlow is a family of image generation models that operate directly in the raw pixel space. Removing the pre-trained VAE simplifies the generation pipeline, and cascade flow modeling keeps the computation cost affordable while delivering strong image quality (FID 1.98 on 256x256 ImageNet class-conditional generation).

🚀 Quick Start

Setup

1. Create Environment

conda create -n pixelflow python=3.12
conda activate pixelflow

2. Install Dependencies

PyTorch 2.6.0: install it according to your system configuration (CUDA version, etc.).
flash-attention v2.7.4.post1: optional, required only for training.
Other packages: pip3 install -r requirements.txt
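To confirm the environment matches the versions listed above, you can query installed package metadata from Python. This sketch uses only the standard library; the distribution names `torch` and `flash-attn` are the usual PyPI names and are assumptions here, since requirements.txt is not reproduced on this page. `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Versions named in the setup notes; flash-attn matters only for training.
EXPECTED = {"torch": "2.6.0", "flash-attn": "2.7.4.post1"}


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, got in installed_versions(EXPECTED).items():
        want = EXPECTED[name]
        if got is None:
            print(f"{name}: not installed (expected {want})")
        elif not got.startswith(want):  # tolerates local tags such as 2.6.0+cu124
            print(f"{name}: found {got}, expected {want}")
        else:
            print(f"{name}: {got} OK")
```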

Demo

We provide an online Gradio demo for class-to-image generation.

You can also run both the class-to-image and text-to-image demos locally:

python app.py --checkpoint /path/to/checkpoint --class_cond  # for class-to-image

python app.py --checkpoint /path/to/checkpoint  # for text-to-image
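When launching the demo from a script or job runner, it can help to build the command as an argument list. This sketch uses only the two flags shown above, `--checkpoint` and `--class_cond`; the helper name `demo_command` is hypothetical, and the resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.

```python
import shlex
from typing import List


def demo_command(checkpoint: str, class_cond: bool) -> List[str]:
    """Build the app.py invocation for the class-to-image or text-to-image demo."""
    cmd = ["python", "app.py", "--checkpoint", checkpoint]
    if class_cond:
        cmd.append("--class_cond")  # class-to-image; omit for text-to-image
    return cmd


if __name__ == "__main__":
    print(shlex.join(demo_command("/path/to/checkpoint", class_cond=True)))
    # python app.py --checkpoint /path/to/checkpoint --class_cond
```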

✨ Features

Operates directly in the raw pixel space, eliminating the need for a pre-trained VAE and enabling end-to-end training.
Achieves affordable computation cost in pixel space through efficient cascade flow modeling.
Achieves an FID of 1.98 on the 256x256 ImageNet class-conditional image generation benchmark.
Excels in image quality, artistry, and semantic control in qualitative text-to-image results.
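The cost argument behind cascade flow modeling can be illustrated with simple arithmetic: if the early, noisier part of the flow runs at low resolution and only the final part runs at full resolution, far fewer pixels are processed per step on average. The four stages, the resolution doubling, and the equal time split below are illustrative assumptions, not PixelFlow's published configuration. The count is in pixels; because transformer attention grows faster than linearly with sequence length, real savings would differ.

```python
from typing import List, Tuple

Stage = Tuple[float, float, int]  # (t_start, t_end, resolution)


def cascade_stages(final_res: int, num_stages: int) -> List[Stage]:
    """Hypothetical schedule: equal time intervals over [0, 1], resolution
    doubling each stage until the last stage reaches final_res."""
    stages = []
    for k in range(num_stages):
        res = final_res // 2 ** (num_stages - 1 - k)
        stages.append((k / num_stages, (k + 1) / num_stages, res))
    return stages


def relative_pixel_cost(stages: List[Stage], final_res: int) -> float:
    """Time-weighted pixel count, relative to running every step at final_res."""
    total = sum((t1 - t0) * res * res for t0, t1, res in stages)
    return total / (final_res * final_res)


if __name__ == "__main__":
    stages = cascade_stages(256, 4)
    for t0, t1, res in stages:
        print(f"t in [{t0:.2f}, {t1:.2f}): {res}x{res}")
    print(f"relative cost: {relative_pixel_cost(stages, 256):.3f}")  # 0.332
```

Under these assumptions the cascade touches about a third of the pixels that a full-resolution-only sampler would.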

📦 Installation

The installation steps mainly involve creating a conda environment and installing dependencies. Please refer to the "Setup" section above.

💻 Usage Examples

Basic Usage

To run the class-to-image demo:

python app.py --checkpoint /path/to/checkpoint --class_cond

Advanced Usage

To run the text-to-image demo:

python app.py --checkpoint /path/to/checkpoint

📚 Documentation

Model Zoo

Model	Task	Params	FID (ImageNet 256x256)	Checkpoint
PixelFlow	class-to-image	677M	1.98	[🤗](https://huggingface.co/ShoufaChen/PixelFlow-Class2Image)
PixelFlow	text-to-image	882M	N/A	[🤗](https://huggingface.co/ShoufaChen/PixelFlow-Text2Image)
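The checkpoints in the table live on the Hugging Face Hub, so they can be fetched with `huggingface_hub.snapshot_download`. This is a sketch: the repo ids come from the table, but whether `--checkpoint` expects the downloaded directory or a specific file inside it is not stated on this page, and the helper names `repo_for` and `download_checkpoint` are hypothetical.

```python
from typing import Dict

# Hugging Face repo ids taken from the Model Zoo table.
CHECKPOINT_REPOS: Dict[str, str] = {
    "class-to-image": "ShoufaChen/PixelFlow-Class2Image",
    "text-to-image": "ShoufaChen/PixelFlow-Text2Image",
}


def repo_for(task: str) -> str:
    """Look up the Hub repo id for a task name used in the Model Zoo table."""
    try:
        return CHECKPOINT_REPOS[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(CHECKPOINT_REPOS)}"
        ) from None


def download_checkpoint(task: str) -> str:
    """Download the repo snapshot and return its local directory.

    Requires `pip install huggingface_hub`; imported lazily so that the
    lookup above works without it.
    """
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_for(task))
```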

Introduction

We present PixelFlow, a family of image generation models that operate directly in the raw pixel space, in contrast to the predominant latent-space models. This approach simplifies the image generation process by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow keeps the computation cost of pixel-space generation affordable. It achieves an FID of 1.98 on the 256x256 ImageNet class-conditional image generation benchmark. Qualitative text-to-image results show that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models.

Training

1. ImageNet Preparation

Download the ImageNet dataset from [http://www.image-net.org/](http://www.image-net.org/).
Use the extract_ILSVRC.sh script to extract the training and validation images and organize them into one labeled subfolder per class.
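After extraction it is worth checking that each split really has one subfolder per class; ImageNet-1k has 1,000 classes, and the folders are typically named by WordNet synset id (for example `n01440764`). This standard-library sketch counts images per class folder; the helper name and the accepted file extensions are assumptions.

```python
import os
import sys
from typing import Dict, Tuple


def count_images_per_class(root: str,
                           exts: Tuple[str, ...] = (".jpeg", ".jpg", ".png")) -> Dict[str, int]:
    """Map each class subfolder of `root` to the number of image files it holds.
    Files sitting directly in `root` are ignored."""
    counts: Dict[str, int] = {}
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if entry.is_dir():
            counts[entry.name] = sum(
                1 for f in os.scandir(entry.path)
                if f.is_file() and f.name.lower().endswith(exts)
            )
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = count_images_per_class(sys.argv[1])  # e.g. the extracted train/ folder
    print(f"{len(counts)} class folders, {sum(counts.values())} images")
    if len(counts) != 1000:
        print("warning: ImageNet-1k should have 1000 class folders")
```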

2. Training Command

torchrun --nnodes=1 --nproc_per_node=8 train.py configs/pixelflow_xl_c2i.yaml
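`torchrun` starts `nproc_per_node` worker processes on each of `nnodes` machines, so a single-node launch with 8 processes per node runs 8 workers (typically one per GPU). Its flags take the form `--name=value` with no spaces around `=`; with spaces, the shell passes `=` and the value as separate arguments. The hypothetical helper below builds such a command programmatically using only these two flags; multi-node runs also need rendezvous settings, which are omitted here.

```python
import shlex
from typing import List, Sequence


def torchrun_command(script: str, script_args: Sequence[str],
                     nnodes: int = 1, nproc_per_node: int = 8) -> List[str]:
    """Build a torchrun launch as an argument list, using --name=value flags."""
    return ["torchrun", f"--nnodes={nnodes}", f"--nproc_per_node={nproc_per_node}",
            script, *script_args]


if __name__ == "__main__":
    cmd = torchrun_command("train.py", ["configs/pixelflow_xl_c2i.yaml"])
    print(shlex.join(cmd))
    # torchrun --nnodes=1 --nproc_per_node=8 train.py configs/pixelflow_xl_c2i.yaml
```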

Evaluation (FID, Inception Score, etc.)

We provide a sample_ddp.py script, adapted from DiT, for generating sample images and saving them both as a folder and as a .npz file. The .npz file is compatible with ADM's TensorFlow evaluation suite, allowing direct computation of FID, Inception Score, and other metrics.

torchrun --nnodes=1 --nproc_per_node=8 sample_ddp.py --pretrained /path/to/checkpoint
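For reference, FID compares Gaussian fits to Inception-v3 pool features (2048-dimensional) of real and generated images, with means $\mu_r, \mu_g$ and covariances $\Sigma_r, \Sigma_g$. Lower is better, so the 1.98 reported for PixelFlow indicates generated-image statistics close to those of ImageNet. The standard definition is:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```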

🔧 Technical Details

PixelFlow operates directly in the raw pixel space, which simplifies the image generation process. It uses efficient cascade flow modeling to keep the computation cost in pixel space affordable. Because no pre-trained VAE is needed, the whole model can be trained end-to-end.
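Without a VAE, a pixel-space flow model can compute its training loss directly on image pixels. The sketch below shows a generic rectified-flow (flow-matching) objective under that assumption: interpolate between noise and a training image, then regress the predicted velocity toward `image - noise`. The `model` callable, its `(x_t, t, label)` signature, and the uniform sampling of t are hypothetical; this page does not specify PixelFlow's exact loss, time sampling, or how the cascade stages enter training.

```python
import random
from typing import Callable, List

Vector = List[float]  # a flattened image


def flow_matching_loss(model: Callable[[Vector, float, int], Vector],
                       image: Vector, label: int, rng: random.Random) -> float:
    """Mean squared error between predicted and target velocity at a random t."""
    t = rng.random()                                # t ~ U[0, 1)
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    x_t = [(1 - t) * n + t * x for n, x in zip(noise, image)]  # straight-line path
    target = [x - n for n, x in zip(noise, image)]  # d x_t / d t along that path
    pred = model(x_t, t, label)
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(image)
```

Since the loss is computed on pixels, its gradients reach every parameter of the generator directly, with no frozen encoder in front.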

📄 License

The project is licensed under the MIT license.

BibTeX

@article{chen2025pixelflow,
  title={PixelFlow: Pixel-Space Generative Models with Flow},
  author={Chen, Shoufa and Ge, Chongjian and Zhang, Shilong and Sun, Peize and Luo, Ping},
  journal={arXiv preprint arXiv:2504.07963},
  year={2025}
}
