Open-source svdq-int4-flux.1-depth-dev model - Draw images according to text descriptions and follow the image structure, saving memory and improving speed

Svdq Int4 Flux.1 Depth Dev

Developed by mit-han-lab

INT4 quantized version of FLUX.1-Depth-dev, capable of generating images from text descriptions while adhering to the structure of the input image. Compared to the original BF16 model, this version saves approximately 4x memory and improves runtime speed by 2-3x.

Image Generation | English | Open Source License: Other | #INT4 Quantization #Depth-to-Image Conversion #ControlNet Control

Downloads 9,085

Release Time: 2/4/2025

Model Overview

This model is an INT4 quantized version based on FLUX.1-Depth-dev, primarily used for image generation tasks. It can generate images from text descriptions while preserving the structure of the input image.

Model Features

Efficient Quantization

Utilizes SVDQuant method for INT4 quantization, significantly reducing memory usage and improving runtime speed.

Structure Preservation

Generates images from text descriptions while preserving the structure of the input image.

High-Performance Inference

Optimized via the Nunchaku engine to reduce data movement overhead and enhance inference efficiency.

Model Capabilities

Text-to-Image Generation

Depth-to-Image Conversion

Image Structure Preservation

Efficient Quantized Inference

Use Cases

Creative Design

Concept Art Generation

Generates concept art from text descriptions while preserving the structure of the input image.

Produces high-quality concept art with rich details and accurate structure.

Image Editing

Image Style Transfer

Transforms input images into different styles while maintaining the original structure.

Style-transferred images retain the original structure with diverse and natural styles.

🚀 svdq-int4-flux.1-depth-dev

svdq-int4-flux.1-depth-dev is an INT4-quantized version of the FLUX.1-Depth-dev model. It can generate images based on text descriptions while following the structure of the input image. This quantized model offers about 4× memory savings and runs 2-3× faster than the original BF16 model.
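
The roughly 4× memory figure is what one would expect from moving 16-bit weights to 4-bit weights. The arithmetic below is a back-of-the-envelope sketch only: the parameter count of about 12 billion is an assumption about the FLUX.1 transformer, not a number stated on this page, and `weight_size_gb` is a hypothetical helper.

```python
def weight_size_gb(num_params: float, bits_per_weight: int) -> float:
    """Storage for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 12e9  # assumption: approximate parameter count of the transformer

bf16_gb = weight_size_gb(NUM_PARAMS, 16)  # 16 bits per weight
int4_gb = weight_size_gb(NUM_PARAMS, 4)   # 4 bits per weight

print(f"BF16 weights: {bf16_gb:.1f} GB")
print(f"INT4 weights: {int4_gb:.1f} GB")
print(f"ratio: {bf16_gb / int4_gb:.1f}x")
```

The model size listed in the Model Description table (6.64GB) is somewhat larger than this pure 4-bit estimate. A plausible reason is that the low-rank branch and the quantization scales are stored at higher precision, but this page does not break the size down.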

✨ Features

Quantization: Utilizes INT4 quantization to reduce memory usage and accelerate inference.
Image Generation: Capable of generating images from text descriptions while adhering to the input image's structure.
Low-Rank Decomposition: The SVDQuant method decomposes weights into low-rank components and residuals for easier quantization.
Optimized Engine: The Nunchaku engine optimizes the low-rank branch to reduce latency overhead.

📦 Installation

Diffusers

Please follow the instructions in [mit-han-lab/nunchaku](https://github.com/mit-han-lab/nunchaku) to set up the environment. Also, install some ControlNet dependencies:

pip install git+https://github.com/asomoza/image_gen_aux.git
pip install controlnet_aux mediapipe

💻 Usage Examples

Basic Usage

import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor

from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel

# Load the INT4-quantized transformer that replaces the BF16 one in the pipeline.
transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-depth-dev")

# Build the FLUX.1-Depth-dev pipeline around the quantized transformer.
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

# Estimate a depth map from the input image; it serves as the structural control.
processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")

# 1024 x 1024 pixels is a multiple of 65,536, as the model requires.
image = pipe(
    prompt=prompt, control_image=control_image, height=1024, width=1024, num_inference_steps=30, guidance_scale=10.0
).images[0]
image.save("flux.1-depth-dev.png")

Comfy UI

Work in progress. Stay tuned!

📚 Documentation

Method

Quantization Method -- SVDQuant

[Figure: intuition] Overview of SVDQuant. Stage 1: Originally, both the activation X and weights W contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from activations to weights, resulting in the updated activation and weight. While the activation becomes easier to quantize, the weight now becomes more difficult. Stage 3: SVDQuant further decomposes the weight into a low-rank component and a residual with SVD. Thus, the quantization difficulty is alleviated by the low-rank branch, which runs at 16-bit precision.
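
The three stages can be sketched on a toy matrix. This is a minimal NumPy illustration, not the DeepCompressor implementation: the smoothing factor `s`, the rank of 4, the matrix sizes, and the per-tensor quantizer `fake_quant_int4` are all assumptions chosen to keep the example short.

```python
import numpy as np


def fake_quant_int4(x):
    """Symmetric per-tensor 4-bit quantization, returned in dequantized form."""
    scale = np.abs(x).max() / 7.0  # map the largest magnitude to level 7
    return np.clip(np.round(x / scale), -8, 7) * scale


rng = np.random.default_rng(0)

# Stage 1: toy activations (tokens x channels) with one outlier channel.
X = rng.normal(size=(16, 64))
X[:, 3] *= 50.0
W = rng.normal(size=(64, 64))  # toy weights (in x out)

# Stage 2: migrate the outliers from activations to weights with a
# per-channel scale. This particular choice of s is an assumption.
s = np.sqrt(np.abs(X).max(axis=0))
X_hat = X / s
W_hat = W * s[:, None]  # X_hat @ W_hat equals X @ W

# Stage 3: split the weight into a rank-r branch and a residual via SVD.
rank = 4
U, S, Vt = np.linalg.svd(W_hat, full_matrices=False)
L1 = U[:, :rank] * S[:rank]  # (in x r), kept in high precision
L2 = Vt[:rank]               # (r x out), kept in high precision
R = W_hat - L1 @ L2          # residual, the only part quantized to 4 bits

W_direct = fake_quant_int4(W_hat)      # quantize the whole weight
W_svdq = L1 @ L2 + fake_quant_int4(R)  # low-rank branch + 4-bit residual

err_direct = np.linalg.norm(W_hat - W_direct)
err_svdq = np.linalg.norm(W_hat - W_svdq)

print(f"activation range before/after smoothing: {np.abs(X).max():.1f} / {np.abs(X_hat).max():.1f}")
print(f"weight error, direct 4-bit:              {err_direct:.2f}")
print(f"weight error, low-rank + 4-bit residual: {err_svdq:.2f}")
```

In this toy setup the outlier row that smoothing pushes into the weight is absorbed by the low-rank branch, so the residual has a much smaller range and loses less when rounded to 16 levels.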

Nunchaku Engine Design

[Figure: engine] (a) Naïvely running the low-rank branch with rank 32 introduces 57% latency overhead due to the extra read of 16-bit inputs in Down Projection and the extra write of 16-bit outputs in Up Projection. Nunchaku removes this overhead with kernel fusion. (b) The Down Projection and Quantize kernels use the same input, while the Up Projection and 4-Bit Compute kernels share the same output. To reduce data movement overhead, we fuse the first two and the latter two kernels together.
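
The grouping in (b) can be written down as a dataflow sketch. NumPy cannot express kernel fusion itself, so the code below only marks which operations would share a kernel and checks that the regrouping leaves the result unchanged. All function names, shapes, and the per-tensor quantizer are hypothetical and are not the Nunchaku API.

```python
import numpy as np


def quantize(x):
    """Symmetric per-tensor 4-bit quantization: integer codes plus a scale."""
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -8, 7), scale


def unfused_layer(x, L1, L2, r_codes, r_scale):
    """Four separate steps: x is read twice and the output is written twice."""
    h = x @ L1                                     # Down Projection reads x
    x_codes, x_scale = quantize(x)                 # Quantize reads x again
    y = (x_codes @ r_codes) * (x_scale * r_scale)  # 4-Bit Compute writes y
    y = y + h @ L2                                 # Up Projection rewrites y
    return y


def quantize_and_down(x, L1):
    """Fused step 1: one pass over x yields both the codes and x @ L1."""
    x_codes, x_scale = quantize(x)
    return x_codes, x_scale, x @ L1


def compute_and_up(x_codes, x_scale, h, r_codes, r_scale, L2):
    """Fused step 2: the 4-bit product and low-rank term are summed before one write."""
    return (x_codes @ r_codes) * (x_scale * r_scale) + h @ L2


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 64))                            # activations
L1 = rng.normal(size=(64, 4))                           # down projection (rank 4)
L2 = rng.normal(size=(4, 32))                           # up projection
r_codes, r_scale = quantize(rng.normal(size=(64, 32)))  # 4-bit residual weight

y_unfused = unfused_layer(x, L1, L2, r_codes, r_scale)
y_fused = compute_and_up(*quantize_and_down(x, L1), r_codes, r_scale, L2)
print(np.allclose(y_unfused, y_fused))
```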

Model Description

Property	Details
Developed by	MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU and Pika Labs
Model Type	INT W4A4 model
Model Size	6.64GB
Model Resolution	The number of pixels needs to be a multiple of 65,536.
License	Apache-2.0
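
The resolution constraint in the table can be checked before calling the pipeline. The helper below is a hypothetical convenience function, not part of Nunchaku or Diffusers; it only tests that `height * width` is a multiple of 65,536 (256 × 256).

```python
PIXEL_MULTIPLE = 65_536  # from the Model Resolution row (256 * 256)


def is_supported_resolution(height: int, width: int) -> bool:
    """Return True if the total pixel count is a positive multiple of 65,536."""
    return height > 0 and width > 0 and (height * width) % PIXEL_MULTIPLE == 0


for height, width in [(1024, 1024), (768, 768), (1024, 768), (1000, 1000)]:
    print(height, width, is_supported_resolution(height, width))
```

Of the four sizes tried here, only 1000 × 1000 fails the check.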

🔧 Technical Details

The model is based on the FLUX.1-Depth-dev model and is quantized to INT4.
It uses the SVDQuant method for quantization and the Nunchaku engine for inference optimization.

📄 License

The model is licensed under the Apache-2.0 license.

Citation

If you find this model useful or relevant to your research, please cite

@inproceedings{
  li2024svdquant,
  title={SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models},
  author={Li*, Muyang and Lin*, Yujun and Zhang*, Zhekai and Cai, Tianle and Li, Xiuyu and Guo, Junxian and Xie, Enze and Meng, Chenlin and Zhu, Jun-Yan and Han, Song},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}

Quantization Library: DeepCompressor
Inference Engine: Nunchaku


⚠️ Important Note

The model is only runnable on NVIDIA GPUs with architectures sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100). See this issue for more details.

You may observe some slight differences from the BF16 models in detail.
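
A quick way to compare a local GPU against the architecture list in the note above is to look at its CUDA compute capability. This is a sketch under assumptions: the `SUPPORTED_ARCHS` table simply copies the three architectures named on this page (support may differ in other Nunchaku releases), and `arch_name` is a hypothetical helper. `torch.cuda.get_device_capability` is the standard PyTorch call that returns a `(major, minor)` tuple.

```python
# Architectures copied from the note above; not an authoritative list.
SUPPORTED_ARCHS = {(8, 0): "sm_80", (8, 6): "sm_86", (8, 9): "sm_89"}


def arch_name(capability):
    """Return the 'sm_XY' name if the capability is listed above, else None."""
    return SUPPORTED_ARCHS.get(tuple(capability))


try:
    import torch
except ImportError:
    torch = None  # PyTorch is optional for this check

if torch is not None and torch.cuda.is_available():
    capability = torch.cuda.get_device_capability(0)  # e.g. (8, 9) on an RTX 4090
    name = arch_name(capability)
    if name is None:
        print(f"Compute capability {capability} is not in the list on this page")
    else:
        print(f"GPU architecture {name} is in the list on this page")
else:
    print("No CUDA device detected; skipping the check")
```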
