FLUX.1-Fill-dev-DF11 open-source model: reduce VRAM usage by about 30% with bit-for-bit identical outputs

FLUX.1 Fill Dev DF11

Developed by DFloat11

FLUX.1-Fill-dev model using DFloat11 lossless compression format, reducing VRAM usage by 30% while maintaining bit-level accuracy

#Image Generation #Lossless Compression #VRAM Optimization #GPU Real-time Decompression #Text-to-Image Acceleration

Downloads 938

Release Time: 5/8/2025

Model Overview

A compressed version of the original BFloat16 model. The custom DFloat11 format makes the compression lossless, and decompression is optimized for GPU inference.

Model Features

DFloat11 Lossless Compression

Uses dynamic-length floating-point encoding to shrink the model to about 70% of its original size (a reduction of roughly 30%) while keeping outputs bit-for-bit identical

GPU Native Decompression

Weights are decompressed in real-time directly on the GPU, avoiding CPU decompression or host-device data transfer overhead

VRAM Optimization

Compressed weights persistently reside in VRAM during inference, reducing VRAM usage by 30% compared to the original model

Model Capabilities

Image Inpainting

Text-prompted Image Filling

High-resolution Image Generation

Use Cases

Image Editing

Object Removal and Replacement

Mark areas to be modified with masks and generate new content based on text prompts

Example shows the effect of replacing a mug with a white paper cup

Creative Design

Concept Visualization

Generate or modify design elements based on textual descriptions

🚀 DFloat11 Compressed Model: `black-forest-labs/FLUX.1-Fill-dev`

This is a losslessly compressed text-to-image model using the custom DFloat11 format, reducing GPU memory consumption by about 30% while maintaining bit-for-bit identical outputs to the original model.

📋 Model Information

| Property | Details |
|---|---|
| Base Model | black-forest-labs/FLUX.1-Fill-dev |
| Base Model Relation | quantized |
| Pipeline Tag | text-to-image |
| Tags | dfloat11, df11, lossless compression, 70% size, 100% accuracy |

🚀 Quick Start

This is a losslessly compressed version of black-forest-labs/FLUX.1-Fill-dev using our custom DFloat11 format. The outputs of this compressed model are bit-for-bit identical to the original BFloat16 model, while reducing GPU memory consumption by approximately 30%.
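The "approximately 30%" figure can be sanity-checked with simple arithmetic. The sketch below is only an estimate under stated assumptions: the parameter count (12 billion) is an assumed round number for the transformer, and "about 11 bits per weight" is inferred from the format's name rather than measured. The helper `weight_gib` is hypothetical and not part of any library.

```python
def weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


params = 12e9                      # assumption: approximate transformer size
bf16 = weight_gib(params, 16)      # BFloat16 stores 16 bits per weight
df11 = weight_gib(params, 11)      # assumption: ~11 bits per weight on average
saving = 1 - df11 / bf16           # fraction of weight memory saved

print(f"BF16: {bf16:.1f} GiB, ~11-bit: {df11:.1f} GiB, saving: {saving:.1%}")
```

The saving depends only on the ratio 11/16, so it comes out near 31% regardless of the assumed parameter count. Activations and the other pipeline components are not compressed, so the saving observed end to end may differ.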

✨ Features

How It Works

DFloat11 compresses model weights using Huffman coding of BFloat16 exponent bits, combined with hardware-aware algorithmic designs that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are decompressed just before matrix multiplications, then immediately discarded after use to minimize memory footprint.

Key benefits:

- No CPU decompression or host-device data transfer: all operations are handled entirely on the GPU.
- DFloat11 is much faster than CPU-offloading approaches, enabling practical deployment in memory-constrained environments.
- The compression is fully lossless, guaranteeing that the model's outputs are bit-for-bit identical to those of the original model.
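To make the exponent-coding idea from "How It Works" concrete, here is a small standard-library sketch. It is not the DFloat11 implementation: the weights are synthetic Gaussian samples (an assumption standing in for real model weights), BFloat16 is approximated by truncating a float32 to its upper 16 bits, and only Huffman code lengths are computed, not the GPU-friendly bitstream layout. It estimates how many bits per weight remain when the 8 exponent bits are Huffman-coded and the sign and 7 mantissa bits are stored as-is.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_bits(x: float) -> int:
    """Upper 16 bits of the float32 encoding (truncation, for illustration)."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def exponent(bits: int) -> int:
    """The 8 exponent bits of a BFloat16 value (1 sign, 8 exponent, 7 mantissa)."""
    return (bits >> 7) & 0xFF


def huffman_code_lengths(freqs: dict) -> dict:
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries are (count, tiebreak, {symbol: depth}); the unique
    # tiebreak keeps the dicts from ever being compared.
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(20_000)]  # synthetic stand-in
exps = [exponent(bf16_bits(w)) for w in weights]
freqs = Counter(exps)
lengths = huffman_code_lengths(freqs)

avg_exp_bits = sum(freqs[s] * lengths[s] for s in freqs) / len(exps)
avg_bits_per_weight = 1 + 7 + avg_exp_bits  # sign + mantissa + coded exponent

print(f"distinct exponents: {len(freqs)}")
print(f"average bits per weight: {avg_bits_per_weight:.2f} (BFloat16 uses 16)")
```

Because weights cluster in a narrow range of magnitudes, only a few exponent values are common, and a variable-length code spends far fewer than 8 bits on them on average. Since Huffman coding is exactly invertible, decoding recovers every original exponent, which is what makes the scheme lossless.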

📦 Installation

Install or upgrade the DFloat11 pip package (installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed):
```
pip install -U dfloat11[cuda12]
# or if you have CUDA version 11:
# pip install -U dfloat11[cuda11]
```
Install or upgrade the diffusers package.
```
pip install -U diffusers
```

💻 Usage Examples

Basic Usage

```python
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image
from dfloat11 import DFloat11Model

# Input image and the mask marking the region to repaint
image = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup_mask.png")

# Build the original BFloat16 pipeline and let diffusers offload idle components to the CPU
pipe = FluxFillPipeline.from_pretrained("black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

# Load the DFloat11-compressed weights into the pipeline's transformer
DFloat11Model.from_pretrained('DFloat11/FLUX.1-Fill-dev-DF11', device='cpu', bfloat16_model=pipe.transformer)

# Fill the masked region according to the text prompt
image = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    guidance_scale=30,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),  # fixed seed for reproducible output
).images[0]
image.save("flux-fill-dev.png")
```
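To check the memory saving on your own hardware, one option is to record PyTorch's peak allocated GPU memory around a pipeline call. The helper below is a hypothetical sketch, not part of DFloat11 or diffusers; it relies only on `torch.cuda.reset_peak_memory_stats`, `torch.cuda.synchronize`, and `torch.cuda.max_memory_allocated`, and returns `None` when PyTorch or CUDA is unavailable.

```python
try:
    import torch
except ImportError:  # lets the sketch load on machines without PyTorch
    torch = None


def peak_vram_gib(run):
    """Call run() and return peak allocated GPU memory in GiB, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        run()
        return None
    torch.cuda.reset_peak_memory_stats()  # start a fresh peak measurement
    run()
    torch.cuda.synchronize()              # wait for queued GPU work to finish
    return torch.cuda.max_memory_allocated() / 2**30
```

For example, wrapping the generation call as `peak_vram_gib(lambda: pipe(prompt="a white paper cup", image=image, mask_image=mask))` once with and once without the DFloat11 weights would give two comparable numbers. This counts memory allocated through PyTorch's allocator on the current device, so it may differ from what `nvidia-smi` reports.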

📚 Documentation

Paper: 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
GitHub: https://github.com/LeanModels/DFloat11
HuggingFace: https://huggingface.co/DFloat11
