🚀 DFloat11 Compressed Model: black-forest-labs/FLUX.1-Canny-dev
This is a losslessly compressed version of the black-forest-labs/FLUX.1-Canny-dev model. By leveraging the custom DFloat11 format, it significantly reduces GPU memory usage while producing outputs identical to the original model.
✨ Features
- Lossless Compression: The outputs of the compressed model are bit-for-bit identical to the original BFloat16 model.
- Reduced Memory Consumption: Cuts GPU memory consumption by approximately 30%.
- Efficient Decompression: Uses Huffman coding and hardware-aware algorithmic design for on-the-fly decompression directly on the GPU.
- GPU-Only Operation: All operations are handled entirely on the GPU, with no CPU decompression and no host-device data transfer.
📦 Installation
- Install or upgrade the DFloat11 pip package (this installs the CUDA kernel automatically; a CUDA-compatible GPU and an existing PyTorch installation are required):

```bash
pip install -U dfloat11[cuda12]
```

- Install or upgrade the diffusers and controlnet_aux packages:

```bash
pip install -U diffusers controlnet_aux
```
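After installing, a quick optional sanity check (a minimal sketch, not an official setup step) can confirm that the packages import and that PyTorch sees a CUDA device, which DFloat11 requires:

```python
# Optional sanity check (sketch): confirm imports and CUDA visibility.
import torch
import diffusers
import controlnet_aux
import dfloat11

assert torch.cuda.is_available(), "DFloat11 decompression runs on the GPU and needs a CUDA device"
print("torch", torch.__version__, "| diffusers", diffusers.__version__, "| GPU:", torch.cuda.get_device_name(0))
```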
💻 Usage Examples
Basic Usage
To use the DFloat11 model, run the following example code in Python:
```python
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from dfloat11 import DFloat11Model

# Load the BFloat16 pipeline and offload idle components to the CPU.
pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

# Swap the transformer's weights for the losslessly compressed DFloat11 weights.
DFloat11Model.from_pretrained('DFloat11/FLUX.1-Canny-dev-DF11', device='cpu', bfloat16_model=pipe.transformer)

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

# Turn the reference image into a Canny edge map that conditions generation.
processor = CannyDetector()
control_image = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=30.0,
).images[0]
image.save("output.png")
```
📚 Documentation
How It Works
DFloat11 compresses model weights using Huffman coding of BFloat16 exponent bits, combined with hardware-aware algorithmic designs that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are decompressed just before matrix multiplications, then immediately discarded after use to minimize memory footprint.
Key benefits:
- No CPU decompression or host-device data transfer: all operations are handled entirely on the GPU.
- DFloat11 is much faster than CPU-offloading approaches, enabling practical deployment in memory-constrained environments.
- The compression is fully lossless, guaranteeing that the model's outputs are bit-for-bit identical to those of the original model.
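To make the exponent-coding idea concrete, the following minimal sketch (not part of the DFloat11 package; the helper name and the random stand-in tensor are illustrative) estimates how many bits per value would remain if the 8-bit BFloat16 exponent field were entropy-coded while the sign and mantissa bits are stored unchanged. An average near 11 bits per value corresponds to the roughly 30% reduction quoted above; the real figure depends on the actual FLUX weight distribution, not on random data.

```python
import torch

def exponent_entropy_bits(weight: torch.Tensor) -> float:
    """Empirical entropy (bits/value) of the 8-bit exponent field of a bfloat16 tensor."""
    assert weight.dtype == torch.bfloat16
    bits = weight.flatten().view(torch.int16)           # reinterpret the raw 16 bits
    exponents = torch.bitwise_and(bits, 0x7F80) >> 7    # bits 7..14 hold the exponent
    counts = torch.bincount(exponents.to(torch.int64), minlength=256).float()
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * probs.log2()).sum())

# Random stand-in for a weight matrix; actual model weights determine the real ratio.
w = torch.randn(4096, 4096).to(torch.bfloat16)
h = exponent_entropy_bits(w)
est_bits_per_value = 1 + 7 + h                          # sign + mantissa kept as-is, exponent entropy-coded
print(f"exponent entropy ≈ {h:.2f} bits -> ≈ {est_bits_per_value:.1f} bits/value vs 16 for BF16")
```

Huffman coding approaches this entropy bound in practice, which is why compressing only the exponent field already yields the reported savings without touching the sign or mantissa bits.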
Learn More
📄 Information Table
| Property | Details |
|----------|---------|
| Base Model | black-forest-labs/FLUX.1-Canny-dev |
| Base Model Relation | quantized |
| Pipeline Tag | text-to-image |
| Tags | dfloat11, df11, lossless compression, 70% size, 100% accuracy |