PixelFlow: Pixel-Space Generative Models with Flow
PixelFlow is a family of image generation models that operate directly in the raw pixel space. By removing the need for a pre-trained VAE, it simplifies the image generation pipeline while keeping computation cost affordable and image quality high.
Quick Start
Setup
1. Create Environment
conda create -n pixelflow python=3.12
conda activate pixelflow
2. Install Dependencies:
- PyTorch 2.6.0: install it according to your system configuration (CUDA version, etc.).
- flash-attention v2.7.4.post1: optional, required only for training.
- Other packages:
pip3 install -r requirements.txt
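After installing, a quick check can confirm which versions ended up in the environment. The script below is a hypothetical helper (not shipped with PixelFlow): it only reads installed package metadata, treats a missing flash-attention as acceptable because it is needed only for training, and assumes flash-attention is installed under its PyPI distribution name `flash-attn`.

```python
"""Sanity check of the PixelFlow environment (hypothetical helper, not part of the repo)."""
import importlib.metadata
import importlib.util
import sys


def package_version(name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return None


def check_env():
    """Collect the versions relevant to the setup steps."""
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "torch": package_version("torch"),
        # flash-attention is published as "flash-attn" (assumed); it is optional outside training.
        "flash_attn": package_version("flash-attn"),
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    report = check_env()
    for key, value in report.items():
        print(f"{key}: {value}")
    # The setup instructions pin PyTorch 2.6.0; flag anything else.
    if report["torch"] is not None and not report["torch"].startswith("2.6.0"):
        print("warning: PyTorch version differs from the documented 2.6.0")
```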
Demo
We provide an online Gradio demo for class-to-image generation.
You can also run both demos locally. For class-to-image generation:
python app.py --checkpoint /path/to/checkpoint --class_cond
For text-to-image generation, omit --class_cond:
python app.py --checkpoint /path/to/checkpoint
⨠Features
- Operates directly in the raw pixel space, eliminating the need for a pre-trained VAE and enabling end-to-end training.
- Keeps computation cost in pixel space affordable through efficient cascade flow modeling.
- Achieves an FID of 1.98 on the 256×256 ImageNet class-conditional image generation benchmark.
- Excels in image quality, artistry, and semantic control in qualitative text-to-image results.
Installation
The installation steps mainly involve creating a conda environment and installing dependencies. Please refer to the "Setup" section above.
Usage Examples
Basic Usage
To run the class-to-image demo:
python app.py --checkpoint /path/to/checkpoint --class_cond
Advanced Usage
To run the text-to-image demo:
python app.py --checkpoint /path/to/checkpoint
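If you prefer to start the demos from Python, note that the two commands differ only in the `--class_cond` flag. The wrapper below is a hypothetical sketch (`demo_command` and `launch_demo` are illustrative names, not part of the repo) and assumes it is run from the repository root, where `app.py` lives.

```python
"""Launch the local Gradio demo from Python (hypothetical wrapper around app.py)."""
import subprocess
import sys


def demo_command(checkpoint, class_cond):
    """Build the app.py invocation: --class_cond selects class-to-image, omitting it selects text-to-image."""
    cmd = [sys.executable, "app.py", "--checkpoint", str(checkpoint)]
    if class_cond:
        cmd.append("--class_cond")
    return cmd


def launch_demo(checkpoint, class_cond=False):
    """Run the demo in the foreground; raises CalledProcessError if app.py exits with an error."""
    subprocess.run(demo_command(checkpoint, class_cond), check=True)
```

For example, `launch_demo("/path/to/checkpoint", class_cond=True)` starts the class-to-image demo. Using `sys.executable` keeps the child process on the same interpreter, and therefore the same conda environment, as the caller.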
Documentation
Model Zoo
| Model | Task | Params | FID (ImageNet 256×256) | Checkpoint |
|---|---|---|---|---|
| PixelFlow | class-to-image | 677M | 1.98 | [🤗](https://huggingface.co/ShoufaChen/PixelFlow-Class2Image) |
| PixelFlow | text-to-image | 882M | N/A | [🤗](https://huggingface.co/ShoufaChen/PixelFlow-Text2Image) |
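The checkpoints in the Model Zoo are hosted on the Hugging Face Hub. A minimal download sketch, assuming the `huggingface_hub` package is installed; `download_checkpoint` is a hypothetical helper. Whether `app.py --checkpoint` expects the downloaded folder itself or a specific file inside it is not stated here, so inspect the folder before launching.

```python
"""Fetch a PixelFlow checkpoint from the Hugging Face Hub (sketch; assumes huggingface_hub is installed)."""

# Repository IDs from the Model Zoo table.
REPOS = {
    "class-to-image": "ShoufaChen/PixelFlow-Class2Image",
    "text-to-image": "ShoufaChen/PixelFlow-Text2Image",
}


def download_checkpoint(task, local_dir=None):
    """Download every file in the model repo for `task` and return the local folder path."""
    if task not in REPOS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(REPOS)}")
    # Imported lazily so the mapping above is usable without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPOS[task], local_dir=local_dir)
```

For example, `path = download_checkpoint("class-to-image", local_dir="checkpoints/c2i")` places the class-to-image weights under a local folder (the folder name is a placeholder).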
Introduction
We present PixelFlow, a family of image generation models that operate directly in the raw pixel space, in contrast to the predominant latent-space models. This approach simplifies the image generation process by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow keeps computation cost in pixel space affordable. It achieves an FID of 1.98 on the 256×256 ImageNet class-conditional image generation benchmark. Qualitative text-to-image results show that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models.
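PixelFlow is a flow-based model, but this README does not spell out its exact parameterization. As a point of reference, the sketch below shows generic flow matching with a straight-line path between Gaussian noise and data: a network is trained to regress the velocity `x1 - x0`, and sampling integrates that velocity with Euler steps. All names are illustrative, the time convention (t=0 noise, t=1 data) is an assumption, and NumPy arrays stand in for images and for a trained model.

```python
"""Generic flow-matching sketch (straight-line path); not PixelFlow's exact formulation."""
import numpy as np


def interpolate(x0, x1, t):
    """Point at time t on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def velocity_target(x0, x1):
    """Time derivative of the straight path; the same at every t."""
    return x1 - x0


def flow_matching_loss(pred_velocity, x0, x1):
    """Mean squared error between a predicted velocity and the target velocity."""
    return float(np.mean((pred_velocity - velocity_target(x0, x1)) ** 2))


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # stand-in for an RGB image scaled to [-1, 1]
noise = rng.standard_normal(size=image.shape)

# One training example: pick a time, build the network input, and compute the target.
t = rng.uniform()
x_t = interpolate(noise, image, t)
target = velocity_target(noise, image)


def oracle(x, t):
    """A perfect velocity model for this single (noise, image) pair, for illustration only."""
    return velocity_target(noise, image)


sample = euler_sample(oracle, noise, num_steps=10)
```

With the oracle velocity, Euler integration lands on the data up to floating-point error because the path is a straight line; a learned velocity field only approximates this, which is why the number of sampling steps matters in practice.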
Training
1. ImageNet Preparation
- Download the ImageNet dataset from [http://www.image-net.org/](http://www.image-net.org/).
- Use the extract_ILSVRC.sh script to extract the training and validation images and organize them into per-class subfolders.
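After extraction, each split should be a directory with one subfolder per class. A hypothetical helper to sanity-check that layout; the `/data/imagenet/...` path in the usage note is a placeholder, and the exact directory names produced by `extract_ILSVRC.sh` are an assumption.

```python
"""Sanity-check an ImageFolder-style layout: one subfolder per class (hypothetical helper)."""
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def summarize_image_folder(root):
    """Return {class_name: image_count} for every immediate subdirectory of `root`."""
    counts = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Count only files with an image extension (case-insensitive, so .JPEG matches).
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

For example, `counts = summarize_image_folder("/data/imagenet/train")`. For ILSVRC-2012, `len(counts)` should be 1000 for both splits, with about 1.28M training images and 50,000 validation images in total.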
2. Training Command
torchrun --nnodes=1 --nproc_per_node=8 train.py configs/pixelflow_xl_c2i.yaml
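`torchrun --nnodes=1 --nproc_per_node=8` starts eight worker processes on one machine and tells each one its identity through environment variables such as `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. The sketch below shows how a training script can read them and split a global batch across workers; `dist_env`, `per_gpu_batch_size`, and the batch size of 256 are illustrative, not taken from `train.py` or `configs/pixelflow_xl_c2i.yaml`.

```python
"""Reading torchrun's per-worker environment (illustrative sketch, not PixelFlow's train.py)."""
import os


def dist_env():
    """Read the variables torchrun sets for each worker; defaults cover a plain single-process run."""
    return {
        "rank": int(os.environ.get("RANK", 0)),
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),
    }


def per_gpu_batch_size(global_batch_size, world_size):
    """Split a global batch evenly across data-parallel workers."""
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size must be divisible by the number of processes")
    return global_batch_size // world_size
```

On this single-node launch, `WORLD_SIZE` is 8 and `LOCAL_RANK` runs from 0 to 7, so each worker would typically bind to GPU `LOCAL_RANK` (for example with `torch.cuda.set_device`) and, under the illustrative global batch of 256, process `per_gpu_batch_size(256, 8) == 32` images per step.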
Evaluation (FID, Inception Score, etc.)
We provide a sample_ddp.py script, adapted from DiT, that generates sample images and saves them both as a folder of image files and as a single .npz file. The .npz file is compatible with ADM's TensorFlow evaluation suite, so FID, Inception Score, and other metrics can be computed directly.
torchrun --nnodes=1 --nproc_per_node=8 sample_ddp.py --pretrained /path/to/checkpoint
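Before running ADM's evaluator, it can help to confirm that the .npz file has the expected layout. This sketch assumes the DiT convention, where samples are stored under the key `arr_0` as a `uint8` array of shape `(N, H, W, 3)`; `check_sample_npz` is a hypothetical helper and the file name in the usage note is a placeholder.

```python
"""Validate a sample .npz before evaluation (sketch, assuming the DiT/ADM `arr_0` convention)."""
import numpy as np


def check_sample_npz(path, expected_count=None):
    """Check that `arr_0` holds uint8 images shaped (N, H, W, 3); return the array shape."""
    with np.load(path) as data:
        if "arr_0" not in data.files:
            raise ValueError(f"{path} has no 'arr_0' array; found {data.files}")
        samples = data["arr_0"]  # loaded into memory, so it stays valid after the file closes
    if samples.dtype != np.uint8 or samples.ndim != 4 or samples.shape[-1] != 3:
        raise ValueError(f"expected uint8 (N, H, W, 3), got {samples.dtype} {samples.shape}")
    if expected_count is not None and samples.shape[0] != expected_count:
        raise ValueError(f"expected {expected_count} samples, got {samples.shape[0]}")
    return samples.shape
```

For example, `check_sample_npz("samples.npz", expected_count=50000)` checks a file for the common FID-50K protocol; whether sample_ddp.py generates 50,000 samples by default is not stated here.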
Technical Details
PixelFlow operates directly in the raw pixel space, which simplifies the image generation process. It uses efficient cascade flow modeling to achieve affordable computation cost in pixel space. By eliminating the need for a pre-trained VAE, the whole model can be trained end-to-end.
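Why a cascade saves compute can be seen with simple arithmetic: steps taken at lower resolution touch far fewer pixels. The sketch below compares a coarse-to-fine schedule with running every step at full resolution. The four-stage schedule, the equal step counts, and the cost model (per-step cost proportional to pixel count raised to `exponent`) are all assumptions for illustration, not PixelFlow's actual configuration.

```python
"""Back-of-the-envelope cost of a cascade schedule (hypothetical schedule and cost model)."""


def relative_cost(stages, final_resolution, exponent=1):
    """Cost of a cascade schedule divided by the cost of running all its steps at full resolution.

    stages: list of (resolution, num_steps) pairs, coarse to fine.
    exponent: per-step cost is modeled as (pixel count) ** exponent.
    """
    cascade = sum((res * res) ** exponent * steps for res, steps in stages)
    total_steps = sum(steps for _, steps in stages)
    full = (final_resolution * final_resolution) ** exponent * total_steps
    return cascade / full


# Illustrative schedule: resolution doubles each stage, 10 sampling steps per stage.
schedule = [(32, 10), (64, 10), (128, 10), (256, 10)]
linear_ratio = relative_cost(schedule, final_resolution=256)
quadratic_ratio = relative_cost(schedule, final_resolution=256, exponent=2)
```

With per-step cost linear in pixel count, this schedule costs 85/256 (about 33%) of the full-resolution baseline; with `exponent=2`, a rough proxy for self-attention over all pixels or patches, it drops to 4369/16384 (about 27%).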
License
The project is licensed under the MIT license.
BibTeX
@article{chen2025pixelflow,
title={PixelFlow: Pixel-Space Generative Models with Flow},
author={Chen, Shoufa and Ge, Chongjian and Zhang, Shilong and Sun, Peize and Luo, Ping},
journal={arXiv preprint arXiv:2504.07963},
year={2025}
}