# DFloat11 Compressed Model: black-forest-labs/FLUX.1-dev
This project presents a losslessly compressed version of the black-forest-labs/FLUX.1-dev model. By leveraging the custom DFloat11 format, it reduces GPU memory consumption by roughly 30% while producing outputs that are bit-for-bit identical to those of the original model.
## Model Information

| Property | Details |
| --- | --- |
| Base Model | black-forest-labs/FLUX.1-dev |
| Base Model Relation | Quantized |
| Pipeline Tag | Text-to-image |
| Tags | dfloat11, df11, lossless compression, 70% size, 100% accuracy |
## Quick Start
This is a losslessly compressed version of black-forest-labs/FLUX.1-dev
using our custom DFloat11 format. The outputs of this compressed model are bit-for-bit identical to the original BFloat16 model, while reducing GPU memory consumption by approximately 30%.
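The ~30% figure can be sanity-checked from the bit widths: DFloat11 stores, on average, roughly 11 bits per weight instead of BFloat16's 16 bits. A minimal back-of-the-envelope calculation (the 11-bit figure is approximate and varies slightly from model to model):

```python
# Approximate per-weight storage (assumed average; not an exact constant)
bf16_bits = 16          # BFloat16: 1 sign + 8 exponent + 7 mantissa bits
df11_bits = 11          # DFloat11: exponent bits Huffman-coded down to ~3 bits
savings = 1 - df11_bits / bf16_bits
print(f"memory reduction: {savings:.0%}")
```
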
## Features

- No CPU decompression or host-device data transfer: all operations are handled entirely on the GPU.
- High speed: DFloat11 is much faster than CPU-offloading approaches, enabling practical deployment in memory-constrained environments.
- Lossless compression: the compression is fully lossless, guaranteeing that the model's outputs are bit-for-bit identical to those of the original model.
## Installation

Install the DFloat11 pip package (this installs the CUDA kernel automatically; a CUDA-compatible GPU and an existing PyTorch installation are required):

```bash
pip install dfloat11[cuda12]
```
## Usage Examples

### Basic Usage

```python
import torch
from diffusers import FluxPipeline
from dfloat11 import DFloat11Model

# Load the original pipeline in BFloat16, then swap in the DFloat11 weights
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

DFloat11Model.from_pretrained('DFloat11/FLUX.1-dev-DF11', device='cpu', bfloat16_model=pipe.transformer)

prompt = "A futuristic cityscape at sunset, with flying cars, neon lights, and reflective water canals"
image = pipe(
    prompt,
    width=1920,
    height=1440,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator(device="cuda").manual_seed(0),
).images[0]

image.save("image.png")
```
## Documentation

### How It Works
DFloat11 compresses model weights using Huffman coding of BFloat16 exponent bits, combined with hardware-aware algorithmic designs that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are decompressed just before matrix multiplications, then immediately discarded after use to minimize memory footprint.
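To illustrate why the exponent bits compress so well, the sketch below Huffman-codes the 8 exponent bits of synthetic BFloat16 weights using only the Python standard library. This is a toy model of the idea, not the DFloat11 implementation (which adds the GPU-side decompression machinery); the Gaussian weight distribution and the helper names are simplifying assumptions.

```python
import heapq
import random
import struct
from collections import Counter

def bf16_exponent(x: float) -> int:
    """Extract the 8 exponent bits of the BFloat16 encoding of x
    (BFloat16 is the top 16 bits of the float32 encoding)."""
    bits16 = struct.unpack(">I", struct.pack(">f", x))[0] >> 16
    return (bits16 >> 7) & 0xFF

def huffman_code_lengths(freq):
    """Return {symbol: Huffman code length in bits} for a frequency table."""
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (total frequency, unique tiebreaker, {symbol: depth})
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (fa + fb, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Synthetic "weights": small Gaussian values, as is typical for trained nets.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

freq = Counter(bf16_exponent(w) for w in weights)
lengths = huffman_code_lengths(freq)
n = sum(freq.values())
avg_exp_bits = sum(freq[s] * lengths[s] for s in freq) / n

# Exponents cluster in a narrow band, so far fewer than 8 bits are needed
# on average; total ≈ 1 sign + avg exponent bits + 7 mantissa bits.
print(f"average exponent bits: {avg_exp_bits:.2f}")
print(f"bits per weight:       {1 + avg_exp_bits + 7:.2f} (vs 16 for BFloat16)")
```

Because the Huffman code is a prefix code built from the exact symbol frequencies, decoding recovers every exponent byte exactly, which is what makes the scheme lossless.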
### How to Use
- Install the DFloat11 pip package as described in the installation section.
- Use the provided Python code example to run the DFloat11 model.