svdq-int4-flux.1-schnell: Efficient 4-bit Text-to-Image Generation

Developed by mit-han-lab

INT4 quantized version of FLUX.1-schnell, enabling efficient text-to-image generation with SVDQuant technology

Task: Text-to-Image
Language: English
Open Source License: Apache-2.0
Tags: #4-bit quantized diffusion #Efficient image generation #Low VRAM requirement

Downloads 20.14k

Release Time: 11/25/2024

Model Overview

This model is a 4-bit quantized version of FLUX.1-schnell for text-to-image generation. It is quantized with SVDQuant, which reduces memory usage and speeds up inference while maintaining visual quality.

Model Features

Efficient quantization technology

Utilizes SVDQuant technology to achieve 4-bit weight and activation quantization, significantly reducing memory usage and improving inference speed.

Optimized inference engine

Enhances computational efficiency through kernel fusion optimization in the Nunchaku engine, reducing data movement overhead.

High visual fidelity

Maintains high-quality image generation under 4-bit quantization. On PixArt-Σ, the authors report better visual quality than other W4A4 and even W4A8 baselines.

Model Capabilities

Text-to-image generation

Efficient inference

Low memory footprint

Use Cases

Creative design

Rapid concept visualization

Quickly generates high-quality images from text descriptions for creative design and concept validation.

Produces clear images at 1024x1024 resolution with just 4 inference steps.

Education and research

Quantization technology research

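Serves as a worked example of post-training 4-bit quantization for computer vision and machine learning research.

The authors report a 3.6× memory reduction on the 12B FLUX.1-dev relative to the BF16 model, and an 8.7× speedup over the 16-bit model on a 16GB laptop 4090 GPU, where quantization eliminates CPU offloading.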

🚀 SVDQuant: 4-bit Quantization for Text-to-Image Models

SVDQuant is a post-training quantization technique for 4-bit weights and activations that effectively maintains visual fidelity. It offers significant memory reduction and speedup in text-to-image generation tasks.

🚀 Quick Start

To start using the svdq-int4-flux.1-schnell model, set up the environment as described in [mit-han-lab/nunchaku](https://github.com/mit-han-lab/nunchaku).

Diffusers

import torch
from diffusers import FluxPipeline

from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel

# Load the INT4 (W4A4) transformer and swap it into the standard FLUX.1-schnell pipeline.
transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-schnell")
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
# Generate a 1024x1024 image in 4 inference steps with guidance disabled.
image = pipeline(
    "A cat holding a sign that says hello world", width=1024, height=1024, num_inference_steps=4, guidance_scale=0
).images[0]
image.save("flux.1-schnell-int4.png")

Comfy UI

![comfyui](https://github.com/mit-han-lab/nunchaku/blob/main/assets/comfyui.jpg?raw=true)

For usage details, please refer to comfyui/README.md.

✨ Features

Memory Reduction: On 12B FLUX.1-dev, SVDQuant achieves 3.6× memory reduction compared to the BF16 model.
Speedup: By eliminating CPU offloading, it offers 8.7× speedup over the 16-bit model on a 16GB laptop 4090 GPU, 3× faster than the NF4 W4A16 baseline.
Visual Quality: On PixArt-Σ, it demonstrates significantly superior visual quality over other W4A4 or even W4A8 baselines.
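
The memory figure can be sanity-checked with rough arithmetic. The sketch below assumes about 12 billion parameters at 2 bytes each in BF16 and uses the 6.64GB checkpoint size listed under Model Description, read as decimal gigabytes. That size belongs to this FLUX.1-schnell checkpoint, while the 3.6× figure is reported on FLUX.1-dev, so the comparison is only indicative.

```python
# Back-of-the-envelope check of the reported memory reduction.
# Assumptions (not measurements): ~12e9 parameters, 2 bytes per parameter in
# BF16, and the 6.64 GB checkpoint size from the Model Description table.
PARAMS = 12e9
BF16_BYTES_PER_PARAM = 2
INT4_CHECKPOINT_GB = 6.64

bf16_gb = PARAMS * BF16_BYTES_PER_PARAM / 1e9   # BF16 weight footprint
ideal_int4_gb = PARAMS * 0.5 / 1e9              # if every weight took exactly 4 bits
reduction = bf16_gb / INT4_CHECKPOINT_GB

print(f"BF16 weights: {bf16_gb:.1f} GB")
print(f"Ideal 4-bit:  {ideal_int4_gb:.1f} GB")
print(f"Reduction:    {reduction:.2f}x")
```

The listed checkpoint is somewhat larger than the ideal 4-bit size, which is consistent with part of the model (such as the 16-bit low-rank branch described under Method) staying at higher precision.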

📦 Installation

Please follow the instructions in [mit-han-lab/nunchaku](https://github.com/mit-han-lab/nunchaku) to set up the environment.

📚 Documentation

Method

Quantization Method -- SVDQuant

![intuition](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/intuition.gif) Overview of SVDQuant. Stage 1: Originally, both the activation X and weights W contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from activations to weights, resulting in the updated activation and weight. While the activation becomes easier to quantize, the weight now becomes more difficult. Stage 3: SVDQuant further decomposes the weight into a low-rank component and a residual with SVD. Thus, the quantization difficulty is alleviated by the low-rank branch, which runs at 16-bit precision.
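
The three stages can be illustrated with a small NumPy simulation. This is a minimal sketch, not the Nunchaku implementation: the per-channel smoothing factor (each channel's maximum magnitude), the per-tensor symmetric quantizer, the rank of 8, and the synthetic data are all assumptions chosen to keep the example short.

```python
import numpy as np

def fake_quant_int4(t):
    """Simulate symmetric per-tensor INT4 quantization (integer levels -8..7)."""
    scale = max(np.abs(t).max(), 1e-12) / 7.0
    return np.clip(np.round(t / scale), -8, 7) * scale

def svdquant_sketch(X, W, rank=8):
    # Stage 2: migrate activation outliers into the weights, one factor per input channel.
    s = np.maximum(np.abs(X).max(axis=0), 1e-8)
    X_hat = X / s                  # activations become easier to quantize
    W_hat = W * s[:, None]         # weights become harder to quantize

    # Stage 3: split W_hat into a low-rank branch and a residual via SVD.
    U, S, Vt = np.linalg.svd(W_hat, full_matrices=False)
    L1 = U[:, :rank] * S[:rank]    # down projection, kept in high precision
    L2 = Vt[:rank]                 # up projection, kept in high precision
    R = W_hat - L1 @ L2            # residual, quantized to 4 bits

    low_rank = (X_hat @ L1) @ L2
    residual = fake_quant_int4(X_hat) @ fake_quant_int4(R)
    return low_rank + residual

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 128))
X[:, :4] *= 50.0                   # Stage 1: a few outlier activation channels
W = rng.standard_normal((128, 64))

Y_ref = X @ W
Y_naive = fake_quant_int4(X) @ fake_quant_int4(W)
Y_svdq = svdquant_sketch(X, W)

def rel_err(Y):
    return np.linalg.norm(Y - Y_ref) / np.linalg.norm(Y_ref)

print(f"naive W4A4 relative error:     {rel_err(Y_naive):.3f}")
print(f"SVDQuant-style relative error: {rel_err(Y_svdq):.3f}")
```

On this synthetic input the outlier channels force a coarse quantization step in the naive path, while the low-rank branch absorbs the rows of the smoothed weight that carry the migrated outliers, so the residual spans a much narrower range.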

Nunchaku Engine Design

![engine](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/engine.jpg) (a) Naïvely running the low-rank branch with rank 32 introduces 57% latency overhead due to the extra read of 16-bit inputs in Down Projection and the extra write of 16-bit outputs in Up Projection. Nunchaku optimizes this overhead with kernel fusion. (b) The Down Projection and Quantize kernels use the same input, while the Up Projection and 4-Bit Compute kernels share the same output. To reduce data movement overhead, we fuse the first two and the latter two kernels together.
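
The grouping of kernels described in (b) can be sketched structurally in NumPy. NumPy cannot express GPU kernel fusion, so the functions below are hypothetical stand-ins that only show which operations share an input or an output; the names, shapes, and the per-tensor quantizer are assumptions for illustration.

```python
import numpy as np

def quantize_int4(t):
    """Symmetric per-tensor INT4 simulation: returns integer codes and the scale."""
    scale = max(np.abs(t).max(), 1e-12) / 7.0
    return np.clip(np.round(t / scale), -8, 7), scale

def forward_unfused(x, L1, L2, Rq, r_scale):
    down = x @ L1                          # kernel 1: reads x
    xq, x_scale = quantize_int4(x)         # kernel 2: reads x again
    y = (xq @ Rq) * (x_scale * r_scale)    # kernel 3: writes y
    return y + down @ L2                   # kernel 4: reads y back and writes it again

def quantize_and_down(x, L1):
    """Stand-in for a fused Quantize + Down Projection kernel (one pass over x)."""
    xq, x_scale = quantize_int4(x)
    return xq, x_scale, x @ L1

def compute_and_up(xq, x_scale, Rq, r_scale, down, L2):
    """Stand-in for a fused 4-bit Compute + Up Projection kernel (one write of y)."""
    return (xq @ Rq) * (x_scale * r_scale) + down @ L2

def forward_fused(x, L1, L2, Rq, r_scale):
    xq, x_scale, down = quantize_and_down(x, L1)
    return compute_and_up(xq, x_scale, Rq, r_scale, down, L2)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 256))
L1 = rng.standard_normal((256, 32))        # rank 32, as in the figure
L2 = rng.standard_normal((32, 128))
Rq, r_scale = quantize_int4(rng.standard_normal((256, 128)))

y_unfused = forward_unfused(x, L1, L2, Rq, r_scale)
y_fused = forward_fused(x, L1, L2, Rq, r_scale)
print(np.allclose(y_unfused, y_fused))
```

Both paths compute the same result; the fused grouping only changes how often the 16-bit input and output would have to travel through memory on a real GPU.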

Model Description

Property	Details
Developed by	MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU and Pika Labs
Model Type	INT W4A4 model
Model Size	6.64GB
Model Resolution	The number of pixels needs to be a multiple of 65,536.
License	Apache-2.0
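
The Model Resolution constraint can be checked before calling the pipeline. The helper below is a hypothetical convenience function, not part of Nunchaku or Diffusers, and it reads the table row literally: width times height must be a multiple of 65,536. The pipeline may impose further constraints on individual dimensions that this check does not cover.

```python
PIXEL_MULTIPLE = 65_536  # 256 * 256, from the "Model Resolution" row

def is_supported_resolution(width: int, height: int) -> bool:
    """Return True if the total pixel count is a positive multiple of 65,536."""
    return width > 0 and height > 0 and (width * height) % PIXEL_MULTIPLE == 0

for w, h in [(1024, 1024), (512, 512), (768, 512), (1000, 1000)]:
    print(f"{w}x{h}: {is_supported_resolution(w, h)}")
```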

🔧 Technical Details

The SVDQuant quantization method and Nunchaku engine design are the key technical components. SVDQuant addresses the challenge of 4-bit quantization by migrating outliers and decomposing weights. The Nunchaku engine optimizes the low-rank branch to reduce latency overhead.

📄 License

This project is licensed under the Apache-2.0 license.

Citation

If you find this model useful or relevant to your research, please cite:

@inproceedings{
  li2024svdquant,
  title={SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models},
  author={Li*, Muyang and Lin*, Yujun and Zhang*, Zhekai and Cai, Tianle and Li, Xiuyu and Guo, Junxian and Xie, Enze and Meng, Chenlin and Zhu, Jun-Yan and Han, Song},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}

⚠️ Important Note

The model is only runnable on NVIDIA GPUs with architectures sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100). See this [issue](https://github.com/mit-han-lab/nunchaku/issues/1) for more details.

Generated images may differ slightly in fine detail from those of the BF16 model.
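
The GPU requirement above can be checked before loading the model. The sketch below hard-codes the three architectures named in the note as compute-capability pairs; the helper names are hypothetical, and the set reflects only what this page states, so consult the linked issue for the current list. The device query uses PyTorch's `torch.cuda.get_device_capability()` and falls back to `False` when PyTorch or a CUDA device is unavailable.

```python
# Compute capabilities named in the note: sm_80, sm_86, sm_89.
SUPPORTED_CAPABILITIES = {(8, 0), (8, 6), (8, 9)}

def is_supported_gpu(major: int, minor: int) -> bool:
    """Return True if (major, minor) is one of the architectures listed above."""
    return (major, minor) in SUPPORTED_CAPABILITIES

def check_current_gpu() -> bool:
    """Query the active CUDA device via PyTorch; False if PyTorch or CUDA is missing."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    return is_supported_gpu(*torch.cuda.get_device_capability())

print(is_supported_gpu(8, 9))   # Ada, e.g. RTX 4090
print(is_supported_gpu(7, 5))   # an architecture not in the list
```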
