Autoregressive Image Generation without Vector Quantization
This model (MAR) presents a novel method for autoregressive image generation that eliminates the need for vector quantization. It operates in a continuous-valued space, enabling efficient, high-quality image generation.
Quick Start
You can load the model through the Hugging Face DiffusionPipeline and optionally customize parameters such as the model type, number of steps, and class labels.
Features
MAR introduces a novel approach to autoregressive image generation that eliminates the need for vector quantization. Instead of relying on discrete tokens, the model operates in a continuous-valued space and uses a diffusion process to model the per-token probability distribution. By training with a Diffusion Loss function, the model achieves efficient, high-quality image generation while keeping the speed advantages of autoregressive sequence modeling. The approach simplifies the generation process and applies to continuous-valued domains beyond image synthesis. It is based on the paper Autoregressive Image Generation without Vector Quantization (arXiv:2406.11838).
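For readers who want the objective spelled out, the Diffusion Loss mentioned above can be sketched as a standard denoising objective conditioned on the autoregressive model's output. The symbols are chosen here for illustration and are not defined in this README: $x$ is the continuous-valued token to predict, $z$ is the conditioning vector produced by the autoregressive model for that token, $\varepsilon_\theta$ is a small denoising network, $t$ is a diffusion time step, and $\bar{\alpha}_t$ is the noise schedule.

```latex
\mathcal{L}(z, x) = \mathbb{E}_{\varepsilon, t}\left[ \left\| \varepsilon - \varepsilon_\theta(x_t \mid t, z) \right\|^2 \right],
\qquad
x_t = \sqrt{\bar{\alpha}_t}\, x + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon,
\quad
\varepsilon \sim \mathcal{N}(0, I)
```

Under this reading, the loss trains the denoiser to recover the noise added to each token, and sampling a token at inference time amounts to running the reverse diffusion process conditioned on $z$.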
Usage Examples
Basic Usage
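from diffusers import DiffusionPipeline

# Load the custom MAR pipeline from the Hub.
# trust_remote_code=True is required because the pipeline code lives in the model repo.
pipeline = DiffusionPipeline.from_pretrained(
    "jadechoghari/mar",
    trust_remote_code=True,
    custom_pipeline="jadechoghari/mar",
)

# Generate images for the requested class labels.
generated_image = pipeline(
    model_type="mar_huge",
    seed=42,
    num_ar_steps=64,
    class_labels=[207, 360, 388],
    cfg_scale=4,
    output_dir="./images",
    cfg_schedule="constant",
)

generated_image.show()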
This code loads the model, configures it for image generation, and saves the output to a specified directory.
Advanced Usage
We offer three pre-trained MAR models in safetensors format:
mar-base.safetensors
mar-large.safetensors
mar-huge.safetensors
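To switch between the three checkpoints programmatically, one option is a small lookup helper. The sketch below is hypothetical: the name `mar_checkpoint` is invented for illustration, and the `model_type` strings `mar_base` and `mar_large` are assumed by analogy with `mar_huge`, the only value that appears in the Basic Usage example.

```python
# Hypothetical helper: maps a model_type string to one of the checkpoint
# files listed above. Only "mar_huge" is confirmed by the Basic Usage
# example; "mar_base" and "mar_large" are assumed by analogy.
MAR_CHECKPOINTS = {
    "mar_base": "mar-base.safetensors",
    "mar_large": "mar-large.safetensors",
    "mar_huge": "mar-huge.safetensors",
}


def mar_checkpoint(model_type: str) -> str:
    """Return the checkpoint filename for model_type, or raise ValueError."""
    try:
        return MAR_CHECKPOINTS[model_type]
    except KeyError:
        valid = ", ".join(sorted(MAR_CHECKPOINTS))
        raise ValueError(
            f"unknown model_type {model_type!r}; expected one of: {valid}"
        ) from None


print(mar_checkpoint("mar_huge"))  # mar-huge.safetensors
```

Checking `model_type` against a fixed table before calling the pipeline surfaces a typo as an immediate, readable error.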
Documentation
This is a Hugging Face Diffusers/GPU implementation of the paper Autoregressive Image Generation without Vector Quantization.
The official PyTorch implementation is released in this repository.
@article{li2024autoregressive,
title={Autoregressive Image Generation without Vector Quantization},
author={Li, Tianhong and Tian, Yonglong and Li, He and Deng, Mingyang and He, Kaiming},
journal={arXiv preprint arXiv:2406.11838},
year={2024}
}
License
This project is under the MIT license.
Acknowledgements
We thank Congyue Deng and Xinlei Chen for helpful discussion. We thank Google TPU Research Cloud (TRC) for granting us access to TPUs, and Google Cloud Platform for supporting GPU resources.
A large portion of the code in this repo is based on MAE, MAGE, and DiT.
Contact
If you have any questions, feel free to contact me through email (tianhong@mit.edu). Enjoy!