MAR Open-Source Image Generation Model: Vector-Quantization-Free, Generating High-Quality Images in Continuous Space

MAR

Developed by jadechoghari

An autoregressive image generation approach that synthesizes high-quality images in a continuous-valued space, eliminating the need for vector quantization.

Image Generation | Open Source | License: MIT | #Continuous Value Space Generation #Diffusion Autoregressive #High-Resolution Image Synthesis

Downloads 1,027

Release Time: 9/7/2024

Model Overview

This model proposes a vector-quantization-free autoregressive image generation method. It models the probability distribution of each token with a diffusion process, which yields high-quality images while retaining the speed advantage of autoregressive sequence modeling.
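For reference, the per-token Diffusion Loss described in the paper can be written as follows. Here x is a ground-truth continuous token, z is the conditioning vector the autoregressive network produces for that token, ε is Gaussian noise, and ᾱ_t is set by the noise schedule:

```latex
\mathcal{L}(z, x) = \mathbb{E}_{\varepsilon, t}\left[\left\lVert \varepsilon - \varepsilon_\theta(x_t \mid t, z) \right\rVert^2\right],
\qquad
x_t = \sqrt{\bar{\alpha}_t}\, x + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon,
\quad \varepsilon \sim \mathcal{N}(0, I)
```

The denoiser ε_θ is a small MLP. At sampling time, each token is drawn by running the reverse diffusion process conditioned on z, so no discrete codebook is needed.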

Model Features

Vector Quantization-Free

Operates in a continuous-valued space, removing the reliance of traditional autoregressive methods on discrete tokens.
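To make the contrast with discrete tokens concrete, here is a minimal NumPy sketch of the nearest-neighbor lookup a VQ tokenizer performs. The codebook size, token dimension, and the `vector_quantize` helper are illustrative assumptions, not part of MAR's code.

```python
import numpy as np

def vector_quantize(z: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each continuous vector in z (N, D) to its nearest codebook entry (K, D)."""
    # Squared Euclidean distance between every vector and every code: shape (N, K)
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return codebook[dists.argmin(axis=1)]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # hypothetical 16-entry codebook of 4-d codes
tokens = rng.normal(size=(8, 4))     # hypothetical continuous tokens from an encoder

quantized = vector_quantize(tokens, codebook)
vq_error = np.mean((tokens - quantized) ** 2)

# A VQ-based model can only emit one of the 16 code vectors. A continuous-valued
# model such as MAR keeps `tokens` as-is, so this quantization error never arises.
print(f"mean squared quantization error: {vq_error:.4f}")
```

The point of the sketch is the information loss in the `argmin` step: every token is snapped to a codebook row, whereas a continuous-valued model predicts the vector itself.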

Efficient Generation

Combines the speed of autoregressive sequence modeling with the generation quality of diffusion models.
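Part of the speed comes from predicting a set of tokens at each autoregressive step rather than one token at a time. The `num_ar_steps` argument in the usage example presumably sets the number of such steps. The sketch below counts how many tokens would be predicted per step under a MaskGIT-style cosine masking schedule. Both the cosine schedule and the 256-token sequence (a 16×16 token grid) are assumptions for illustration; the official implementation may differ in details.

```python
import math

def tokens_per_step(seq_len: int, num_iter: int) -> list[int]:
    """Tokens predicted at each step under an assumed cosine masking schedule.

    Assumes seq_len is comfortably larger than num_iter.
    """
    remaining = seq_len  # every token starts masked
    counts = []
    for step in range(num_iter):
        if step == num_iter - 1:
            counts.append(remaining)  # the final step fills in whatever is left
            break
        # Fraction of tokens that should still be masked after this step
        mask_ratio = math.cos(math.pi / 2 * (step + 1) / num_iter)
        still_masked = math.floor(seq_len * mask_ratio)
        # Keep at least one token masked for later steps and predict at least one now
        still_masked = max(1, min(remaining - 1, still_masked))
        counts.append(remaining - still_masked)
        remaining = still_masked
    return counts

counts = tokens_per_step(seq_len=256, num_iter=64)
print(counts[:4], counts[-4:], sum(counts))
```

Early steps predict only a few tokens, when there is little context to condition on; later steps predict many tokens at once. This is how a small number of steps can cover the whole image.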

Multiple Model Sizes

Offers three pre-trained model sizes: base, large, and huge.

Model Capabilities

Class-Conditional Image Generation (ImageNet classes)

High-Quality Image Synthesis

Continuous Value Space Modeling

Use Cases

Creative Image Generation

Art Creation

Generate original images with artistic styles

Can produce diverse high-quality images

Design Assistance

Provide designers with creative inspiration and materials

🚀 Autoregressive Image Generation without Vector Quantization

This model (MAR) presents a novel method for autoregressive image generation that eliminates the need for vector quantization. It operates in a continuous-valued space, enabling efficient and high-quality image generation.

🚀 Quick Start

You can load it through the Hugging Face DiffusionPipeline and optionally customize parameters such as the model type, the number of autoregressive steps, and the class labels.

✨ Features

This model (MAR) introduces a novel approach to autoregressive image generation that eliminates the need for vector quantization. Instead of relying on discrete tokens, the model operates in a continuous-valued space and uses a diffusion process to model the per-token probability distribution. Training with a Diffusion Loss function lets the model generate high-quality images efficiently while keeping the speed advantage of autoregressive sequence modeling. This approach simplifies the generation process and makes it applicable to continuous-valued domains beyond image synthesis. It is based on the paper Autoregressive Image Generation without Vector Quantization (Li et al., 2024).
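The sketch below shows how a Diffusion Loss could be estimated for a batch of continuous tokens in NumPy. Each token is noised at a random timestep and a denoiser is asked to predict the noise, conditioned on a vector z from the autoregressive backbone. The linear β schedule, the dimensions, and the random linear `toy_eps_model` are illustrative assumptions. The paper's own noise schedule and its MLP denoiser may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)    # assumed linear DDPM noise schedule
alphas_bar = np.cumprod(1.0 - betas)  # cumulative product, i.e. alpha-bar_t

def diffusion_loss(x, z, eps_model):
    """Monte-Carlo estimate of E_{eps,t} ||eps - eps_model(x_t, t, z)||^2.

    x: (N, D) ground-truth continuous tokens
    z: (N, C) conditioning vectors from the autoregressive backbone
    eps_model: any callable that predicts the added noise
    """
    n = x.shape[0]
    t = rng.integers(0, T, size=n)  # one random timestep per token
    eps = rng.normal(size=x.shape)  # Gaussian noise
    a = alphas_bar[t][:, None]
    x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps  # forward (noising) process
    eps_pred = eps_model(x_t, t, z)
    # Squared L2 norm per token, averaged over the batch
    return np.mean(np.sum((eps - eps_pred) ** 2, axis=-1))

D, C = 16, 32
W = rng.normal(scale=0.1, size=(D + C, D))

def toy_eps_model(x_t, t, z):
    """Hypothetical stand-in for the denoiser: a random linear map on [x_t, z]."""
    return np.concatenate([x_t, z], axis=-1) @ W

x = rng.normal(size=(8, D))
z = rng.normal(size=(8, C))
loss = diffusion_loss(x, z, toy_eps_model)
print(loss)
```

In real training, this loss would be backpropagated through both the denoiser and the autoregressive network that produced z.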

💻 Usage Examples

Basic Usage

from diffusers import DiffusionPipeline

# load the pretrained model
pipeline = DiffusionPipeline.from_pretrained("jadechoghari/mar", trust_remote_code=True, custom_pipeline="jadechoghari/mar")

# generate an image with the model
generated_image = pipeline(
    model_type="mar_huge",  # choose from 'mar_base', 'mar_large', or 'mar_huge'
    seed=42,                # set a seed for reproducibility
    num_ar_steps=64,        # number of autoregressive steps
    class_labels=[207, 360, 388],  # provide valid ImageNet class labels
    cfg_scale=4,            # classifier-free guidance scale
    output_dir="./images",   # directory to save generated images
    cfg_schedule="constant",  # choose between 'constant' (suggested) and 'linear'
)

# display the generated image
generated_image.show()

This code loads the model, generates images for the given class labels, saves the output to the directory set by output_dir, and displays the result.
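The `cfg_scale` and `cfg_schedule` arguments control classifier-free guidance. The sketch below shows the standard guidance formula and one plausible reading of the two schedules. With 'constant', the scale stays at `cfg_scale`. With 'linear', it is assumed to ramp from 1 up to `cfg_scale` as more tokens are generated. The helper names, the 256-token sequence length, and the exact ramp are assumptions, not the pipeline's code.

```python
import numpy as np

def cfg_scale_at(cfg: float, schedule: str, num_generated: int, seq_len: int) -> float:
    """Guidance scale at a step, given how many tokens exist so far (assumed form)."""
    if schedule == "constant":
        return cfg
    if schedule == "linear":
        # Ramp from 1 (no guidance) toward `cfg` as more tokens are generated
        return 1.0 + (cfg - 1.0) * num_generated / seq_len
    raise ValueError(f"unknown cfg_schedule: {schedule!r}")

def guided_noise(eps_cond: np.ndarray, eps_uncond: np.ndarray, scale: float) -> np.ndarray:
    """Standard classifier-free guidance: extrapolate away from the unconditional prediction."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

seq_len = 256  # hypothetical token count
for generated in (0, 128, 255):
    print(generated, cfg_scale_at(4.0, "linear", generated, seq_len))
```

A scale of 1 reduces `guided_noise` to the plain conditional prediction. Larger scales push samples harder toward the requested class, usually at some cost in diversity.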

Advanced Usage

We offer three pre-trained MAR models in safetensors format:

mar-base.safetensors
mar-large.safetensors
mar-huge.safetensors
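If you manage checkpoints yourself, a small lookup can keep the `model_type` option and the checkpoint file in sync. It can also reject class labels outside the 1,000 ImageNet classes before calling the pipeline. This is a hypothetical helper: the mapping is inferred from the option names and file names listed above and is not part of the jadechoghari/mar pipeline's API.

```python
# Hypothetical mapping inferred from the model_type options and file names above;
# not an API of the jadechoghari/mar pipeline.
CHECKPOINTS = {
    "mar_base": "mar-base.safetensors",
    "mar_large": "mar-large.safetensors",
    "mar_huge": "mar-huge.safetensors",
}
NUM_IMAGENET_CLASSES = 1000

def resolve_checkpoint(model_type: str, class_labels: list[int]) -> str:
    """Return the checkpoint file for `model_type` after validating the class labels."""
    if model_type not in CHECKPOINTS:
        raise ValueError(f"model_type must be one of {sorted(CHECKPOINTS)}, got {model_type!r}")
    bad = [c for c in class_labels if not 0 <= c < NUM_IMAGENET_CLASSES]
    if bad:
        raise ValueError(f"ImageNet class labels must be in [0, 999], got {bad}")
    return CHECKPOINTS[model_type]

print(resolve_checkpoint("mar_huge", [207, 360, 388]))
```

To inspect a downloaded checkpoint outside the pipeline, `safetensors.torch.load_file(path)` returns a dictionary mapping parameter names to tensors.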

📚 Documentation

This is a Hugging Face Diffusers (GPU) implementation of the paper Autoregressive Image Generation without Vector Quantization.

The official PyTorch implementation is released in this repository.

@article{li2024autoregressive,
  title={Autoregressive Image Generation without Vector Quantization},
  author={Li, Tianhong and Tian, Yonglong and Li, He and Deng, Mingyang and He, Kaiming},
  journal={arXiv preprint arXiv:2406.11838},
  year={2024}
}

📄 License

This project is under the MIT license.

Acknowledgements

We thank Congyue Deng and Xinlei Chen for helpful discussion. We thank Google TPU Research Cloud (TRC) for granting us access to TPUs, and Google Cloud Platform for supporting GPU resources.

A large portion of the code in this repository is based on MAE, MAGE, and DiT.

Contact

If you have any questions, feel free to contact me through email (tianhong@mit.edu). Enjoy!
