The open-source model svdq-int4-flux.1-canny-dev - Generate images based on text descriptions and follow the Canny edges

Svdq Int4 Flux.1 Canny Dev

Developed by mit-han-lab

INT4 quantized version of FLUX.1-Canny-dev, capable of generating images based on text descriptions while adhering to the Canny edges of the input image.

Image Generation · English · Open Source · License: Other · #INT4 Quantized Diffusion Model #Canny Edge Control Generation #High Memory Efficiency

Downloads 18.30k

Release Time : 2/4/2025

Model Overview

This model is the INT4 quantized version of FLUX.1-Canny-dev, primarily used for image generation tasks, especially text-to-image and Canny edge-controlled image generation.

Model Features

INT4 Quantization

Uses the SVDQuant method for INT4 quantization, providing approximately 4× memory savings.
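The ~4× figure follows from storing each weight in 4 bits instead of 16. The back-of-the-envelope sketch below works through that arithmetic under stated assumptions: a 12-billion-parameter transformer (the published size of FLUX.1) and, for the per-layer overhead of the 16-bit low-rank branch, a hypothetical 3072×3072 linear layer with rank 32. It ignores quantization scales and any layers left unquantized, which is presumably part of why the reported checkpoint size (6.64GB) is somewhat larger than the bare 4-bit payload.

```python
# Back-of-the-envelope weight-memory estimate, 4-bit vs 16-bit.
# Assumptions (illustrative, not from the model card): 12e9 parameters,
# a 3072x3072 linear layer, and a rank-32 low-rank branch kept in 16 bits.
# Quantization scales and unquantized layers are ignored.

BITS_BF16 = 16
BITS_INT4 = 4


def weight_bytes(num_params: int, bits: int) -> int:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits // 8


def low_rank_overhead(m: int, n: int, rank: int) -> float:
    """Size of a 16-bit rank-`rank` branch (L1: m x rank, L2: rank x n)
    relative to the m x n residual weight stored in 4 bits."""
    low_rank = weight_bytes(rank * (m + n), BITS_BF16)
    residual = weight_bytes(m * n, BITS_INT4)
    return low_rank / residual


if __name__ == "__main__":
    params = 12_000_000_000
    bf16 = weight_bytes(params, BITS_BF16)
    int4 = weight_bytes(params, BITS_INT4)
    print(f"BF16: {bf16 / 1e9:.1f} GB, INT4: {int4 / 1e9:.1f} GB, ratio {bf16 / int4:.1f}x")
    print(f"low-rank overhead per layer: {low_rank_overhead(3072, 3072, 32):.1%}")
```

Under these assumptions the script reports 24.0 GB versus 6.0 GB (a 4.0× ratio), plus about 8.3% extra per layer for the rank-32 branch.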

Efficient Inference

Runs 2-3 times faster than the original BF16 model.

Canny Edge Control

Generates images based on the Canny edges of the input image, preserving edge structures.

Nunchaku Engine Optimization

Optimizes computation for low-rank branches through kernel fusion, reducing data movement overhead.

Model Capabilities

Text-to-Image Generation

Image-to-Image Generation

Canny Edge-Controlled Image Generation

Use Cases

Creative Design

Concept Art Generation

Generates concept art images based on text descriptions while maintaining specific edge structures.

Produces artistic-style images with clear edges.

Education

Educational Material Generation

Quickly generates images tailored to specific teaching needs.

Produces high-quality images for educational purposes.

🚀 svdq-int4-flux.1-canny-dev

svdq-int4-flux.1-canny-dev is an INT4-quantized image-to-image model. It generates images based on text descriptions while following the Canny edges of a given input image. The model offers approximately 4× memory savings and runs 2–3× faster than the original BF16 model.

Quantization Library: DeepCompressor
Inference Engine: Nunchaku



✨ Features

Quantization Advantage: This is an INT4-quantized version of FLUX.1-Canny-dev, offering significant memory savings and faster inference.
Image Generation: It generates images based on text descriptions while following the Canny edges of the input image.

📦 Installation

Please follow the instructions in mit-han-lab/nunchaku to set up the environment. Then install the additional ControlNet dependencies:

pip install git+https://github.com/asomoza/image_gen_aux.git
pip install controlnet_aux mediapipe

💻 Usage Examples

Basic Usage

import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel

# Load the SVDQuant INT4 transformer and plug it into the standard FLUX.1-Canny-dev
# pipeline; the remaining components (text encoders, VAE) are loaded in BF16.
transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-canny-dev")
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Canny-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

# Extract a Canny edge map from the input image; it is the structural condition.
processor = CannyDetector()
control_image = processor(
    control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024
)

# Generate a 1024x1024 image that follows both the prompt and the edge map.
image = pipe(
    prompt=prompt, control_image=control_image, height=1024, width=1024, num_inference_steps=50, guidance_scale=30.0
).images[0]
image.save("flux.1-canny-dev.png")
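The low_threshold and high_threshold arguments above control the final double-threshold (hysteresis) step of Canny edge detection: pixels whose gradient magnitude reaches the high threshold become edges, pixels below the low threshold are discarded, and pixels in between survive only if they connect to a strong edge. The pure-Python sketch below illustrates that step on a precomputed gradient-magnitude grid. It is not the controlnet_aux or OpenCV implementation, which also performs smoothing, gradient computation, and non-maximum suppression; the function name hysteresis and the toy grid are illustrative.

```python
from collections import deque
from typing import List


def hysteresis(mag: List[List[float]], low: float, high: float) -> List[List[bool]]:
    """Double thresholding with edge tracking, as in the last stage of Canny.

    Pixels with magnitude >= high seed the edge map; pixels with
    magnitude >= low are added if they are 8-connected to an edge pixel.
    """
    rows, cols = len(mag), len(mag[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Seed the edge map with strong pixels.
    for i in range(rows):
        for j in range(cols):
            if mag[i][j] >= high:
                edges[i][j] = True
                queue.append((i, j))

    # Grow edges into connected weak pixels (breadth-first search).
    while queue:
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < rows and 0 <= nj < cols
                        and not edges[ni][nj] and mag[ni][nj] >= low):
                    edges[ni][nj] = True
                    queue.append((ni, nj))
    return edges


if __name__ == "__main__":
    # Same thresholds as the usage example: low=50, high=200.
    grid = [
        [0, 60, 0, 0, 60],
        [0, 250, 100, 0, 0],
        [0, 0, 0, 0, 120],
    ]
    for row in hysteresis(grid, low=50, high=200):
        print("".join("#" if e else "." for e in row))
```

The weak pixels (60 and 100) touching the strong 250 are kept, while the isolated 60 and 120 on the right are dropped, so the script prints `.#...`, `.##..`, and `.....`. Raising low_threshold or high_threshold therefore yields a sparser edge map for the pipeline to follow.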

ComfyUI

Work in progress. Stay tuned!

🔧 Technical Details

Quantization Method -- SVDQuant

Overview of SVDQuant. Stage 1: Originally, both the activation X and the weights W contain outliers, making 4-bit quantization challenging. Stage 2: We migrate the outliers from the activations to the weights, resulting in an updated activation and weight. While the activation becomes easier to quantize, the weight becomes more difficult. Stage 3: SVDQuant further decomposes the weight into a low-rank component and a residual using SVD. The low-rank branch, which runs at 16-bit precision, thus alleviates the quantization difficulty.
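The three stages can be illustrated numerically. The NumPy sketch below is a simplified toy, not the DeepCompressor implementation: for Stage 2 it uses SmoothQuant-style per-channel scaling with an assumed migration strength alpha=0.5, for Stage 3 a truncated SVD with an assumed rank of 8, and throughout a naive per-tensor symmetric 4-bit fake quantizer. The helper names, tensor shapes, and injected outlier channel are hypothetical.

```python
import numpy as np


def fake_quant_int4(t: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor 4-bit fake quantization (integer levels -8..7)."""
    scale = np.abs(t).max() / 7.0
    return np.clip(np.round(t / scale), -8, 7) * scale


def smooth(x: np.ndarray, w: np.ndarray, alpha: float = 0.5):
    """Stage 2: migrate outliers from activations to weights.

    Dividing activation channel j by lam[j] and multiplying weight row j
    by lam[j] leaves x @ w unchanged.
    """
    lam = np.abs(x).max(axis=0) ** alpha / np.abs(w).max(axis=1) ** (1 - alpha)
    return x / lam, w * lam[:, None]


def svd_split(w: np.ndarray, rank: int):
    """Stage 3: w = l1 @ l2 + r, where l1 @ l2 is the top-`rank` SVD part."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    l1 = u[:, :rank] * s[:rank]  # (m, rank)
    l2 = vt[:rank]               # (rank, n)
    return l1, l2, w - l1 @ l2


rng = np.random.default_rng(0)
x = rng.standard_normal((64, 128))
x[:, 5] *= 50.0                          # Stage 1: one outlier activation channel
w = 0.05 * rng.standard_normal((128, 128))

xs, ws = smooth(x, w)                    # xs @ ws == x @ w; ws now has an outlier row
l1, l2, r = svd_split(ws, rank=8)

# Weight error: quantize ws directly vs. keep l1 @ l2 in full precision and quantize r.
err_direct = np.linalg.norm(ws - fake_quant_int4(ws))
err_svd = np.linalg.norm(ws - (l1 @ l2 + fake_quant_int4(r)))

# Layer output: full-precision low-rank branch plus 4-bit residual branch.
y_approx = xs @ l1 @ l2 + fake_quant_int4(xs) @ fake_quant_int4(r)

print(f"weight error, direct INT4: {err_direct:.3f}")
print(f"weight error, low-rank + INT4 residual: {err_svd:.3f}")
print(f"relative output error: {np.linalg.norm(y_approx - x @ w) / np.linalg.norm(x @ w):.3f}")
```

Because the smoothed weight's outlier row is captured almost entirely by the top singular directions, the residual r has a much smaller range than ws, so its 4-bit quantization step (and error) is correspondingly smaller.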

Nunchaku Engine Design

(a) Naïvely running a low-rank branch with rank 32 introduces 57% latency overhead due to the extra read of 16-bit inputs in Down Projection and the extra write of 16-bit outputs in Up Projection. Nunchaku reduces this overhead with kernel fusion. (b) The Down Projection and Quantize kernels use the same input, while the Up Projection and 4-Bit Compute kernels share the same output. To reduce data movement overhead, we fuse the first two kernels together and the latter two kernels together.
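Why fusion helps can be seen from a rough count of activation memory traffic. The sketch below tallies the activation reads and writes of the four kernels in (a), first unfused and then fused pairwise as in (b). It is a simplified, hypothetical model rather than Nunchaku's cost model: weights are ignored, and the Up Projection's update of the shared output is counted as one extra 16-bit write, following the description above. The example shape (4,096 tokens, as for a 1024×1024 FLUX image, and 3,072 channels) is an assumption; rank 32 comes from the text.

```python
# Rough activation-traffic model (in bits) for one SVDQuant linear layer.
# b = tokens, m = input channels, n = output channels, r = low-rank rank.
# Weights are ignored; this is an illustrative accounting, not a profiler.

BF16, INT4 = 16, 4


def unfused_bits(b: int, m: int, n: int, r: int) -> int:
    quantize = b * m * BF16 + b * m * INT4   # read X, write Q(X)
    down = b * m * BF16 + b * r * BF16       # read X again, write X @ L1
    gemm4 = b * m * INT4 + b * n * BF16      # read Q(X), write Y
    up = b * r * BF16 + b * n * BF16         # read X @ L1, write Y again
    return quantize + down + gemm4 + up


def fused_bits(b: int, m: int, n: int, r: int) -> int:
    quant_down = b * m * BF16 + b * m * INT4 + b * r * BF16  # X is read once
    gemm4_up = b * m * INT4 + b * r * BF16 + b * n * BF16    # Y is written once
    return quant_down + gemm4_up


if __name__ == "__main__":
    b, m, n, r = 4096, 3072, 3072, 32
    u, f = unfused_bits(b, m, n, r), fused_bits(b, m, n, r)
    print(f"unfused: {u / 8 / 2**20:.1f} MiB, fused: {f / 8 / 2**20:.1f} MiB")
    print(f"saved: {(u - f) / 8 / 2**20:.1f} MiB")
```

For the example shape this gives 108.5 MiB unfused versus 60.5 MiB fused; the 48.0 MiB difference is exactly one 16-bit read of X plus one 16-bit write of Y, the two redundant transfers that fusion removes.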

📚 Documentation

Model Description

Property	Details
Developed by	MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU and Pika Labs
Model Type	INT4 W4A4 model (4-bit weights, 4-bit activations)
Model Size	6.64GB
Model Resolution	The number of pixels needs to be a multiple of 65,536.
License	Apache-2.0
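A quick way to check a target size against the resolution constraint: 65,536 = 256 × 256, so the 1024 × 1024 output in the usage example (1,048,576 pixels, or 16 × 65,536) qualifies. The helpers below are hypothetical conveniences, not part of Nunchaku. One tests the constraint; the other shows one simple way to satisfy it (sufficient, not necessary) by rounding each side to a multiple of 256.

```python
PIXEL_MULTIPLE = 65_536  # constraint stated in the model description


def resolution_ok(height: int, width: int) -> bool:
    """True if height * width is a multiple of 65,536."""
    return (height * width) % PIXEL_MULTIPLE == 0


def snap_to_256(size: int) -> int:
    """Round a side length to the nearest positive multiple of 256.

    If both sides are multiples of 256, their product is a multiple of
    256 * 256 = 65,536, so the constraint is satisfied.
    """
    return max(256, round(size / 256) * 256)


if __name__ == "__main__":
    for h, w in [(1024, 1024), (768, 1360), (1000, 1500)]:
        sh, sw = snap_to_256(h), snap_to_256(w)
        print(f"{h}x{w}: ok={resolution_ok(h, w)} -> {sh}x{sw}: ok={resolution_ok(sh, sw)}")
```

For example, 768×1360 fails the check and snaps to 768×1280, while 1000×1500 snaps to 1024×1536; the snapped sizes can then be passed as height and width to the pipeline call.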

Limitations

The model is only runnable on NVIDIA GPUs with architectures sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100). See this issue for more details.
Outputs may differ slightly in fine detail from those of the BF16 model.
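Before loading the model, it can help to check the GPU's compute capability against the architectures listed above. A minimal sketch, assuming PyTorch is installed: torch.cuda.get_device_capability() returns a (major, minor) tuple such as (8, 6) for sm_86. The helper names are illustrative, and the hard-coded set simply mirrors the list above, which may change as Nunchaku adds support.

```python
from typing import Tuple

# Architectures listed as supported: sm_80 (A100), sm_86 (Ampere), sm_89 (Ada).
SUPPORTED_CAPABILITIES = {(8, 0), (8, 6), (8, 9)}


def is_supported(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability is in the supported list."""
    return tuple(capability) in SUPPORTED_CAPABILITIES


def check_current_gpu() -> bool:
    """Query CUDA device 0 through PyTorch and check its compute capability."""
    try:
        import torch  # imported lazily so is_supported() works without PyTorch
    except ImportError:
        return False  # PyTorch not installed
    if not torch.cuda.is_available():
        return False  # no CUDA device visible
    return is_supported(torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    print("supported GPU:", check_current_gpu())
```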

Citation

If you find this model useful or relevant to your research, please cite:

@inproceedings{li2024svdquant,
  title={SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models},
  author={Li*, Muyang and Lin*, Yujun and Zhang*, Zhekai and Cai, Tianle and Li, Xiuyu and Guo, Junxian and Xie, Enze and Meng, Chenlin and Zhu, Jun-Yan and Han, Song},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}

📄 License

This model is under the Apache-2.0 license.
