Shakker-Labs FLUX.1-dev-ControlNet-Union-Pro-2.0-fp8 Open-Source Model

FLUX.1 Dev ControlNet Union Pro 2.0 Fp8

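Developed by ABDALLALSWAITI

This is an FP8-quantized version of the Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0 model, quantized from the original BFloat16 format using PyTorch's native FP8 support to optimize inference performance.

Image generation | English | License: other | #FP8 quantization #multi-condition control #image generation optimization

Downloads: 2,023

Released: 4/19/2025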

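Model overview

This is a ControlNet model for text-to-image generation. It supports multiple control modes, including canny, soft edge, depth, and pose, and can be used together with other ControlNets.

Model features

FP8 quantization

Quantized from the original BFloat16 format using PyTorch's native FP8 support, in the E4M3 format (4 exponent bits, 3 mantissa bits). Model size is reduced by about 50%.

Multiple control modes

Supports canny, soft edge, depth, pose, gray, and other control modes, and can be used together with other ControlNets.

Improved control and aesthetics

Compared with the previous model, canny and pose are improved, with better control and aesthetics, and soft edge support has been added.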

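Model capabilities

Text-to-image generation

Image-controlled generation

Multi-condition joint control

Use cases

Creative design

Art creation

Generate artwork from a text description and a control image.

Produces artistic images that preserve the structure of the control image.

Portrait generation

Portrait pose control

Generate a portrait that matches a pose image.

Produces high-quality portraits in the specified pose.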

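🚀 FLUX.1-dev-ControlNet-Union-Pro-2.0 (FP8-quantized version)

This repository contains an FP8-quantized version of the Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0 model. It is not a fine-tuned model: the original BFloat16 model was quantized directly to FP8 to optimize inference performance. An online demo is available.

✨ Key features

Quantized model: the original BFloat16 model is quantized directly to FP8 to optimize inference performance.
Resource savings: memory use is about 50% lower than with BFloat16/FP16 models, and the model file is correspondingly smaller.
Faster inference: inference speed may improve on hardware that supports FP8.
Quality preservation: the quantization process was carefully calibrated, so the loss in output quality is minimal.
Full functionality: all capabilities of the original model are retained, with no fine-tuning or additional training.
Multiple modes: supports several control modes, including Canny, soft edge, depth, pose, and gray.
Joint use: can be used together with other ControlNets.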

💻 Usage examples

Basic usage

import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model_union_fp8 = 'ABDALLALSWAITI/FLUX.1-dev-ControlNet-Union-Pro-2.0-fp8'

# Load using FP8 data type
controlnet = FluxControlNetModel.from_pretrained(controlnet_model_union_fp8, torch_dtype=torch.float8_e4m3fn)
pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Replace with another condition image (soft edge, depth, pose, gray)
control_image = load_image("./conds/canny.png")
width, height = control_image.size

prompt = "A young girl stands gracefully at the edge of a serene beach, her long, flowing hair gently tousled by the sea breeze. She wears a soft, pastel-colored dress that complements the tranquil blues and greens of the coastal scenery. The golden hues of the setting sun cast a warm glow on her face, highlighting her serene expression. The background features a vast, azure ocean with gentle waves lapping at the shore, surrounded by distant cliffs and a clear, cloudless sky. The composition emphasizes the girl's serene presence amidst the natural beauty, with a balanced blend of warm and cool tones."

image = pipe(
    prompt, 
    control_image=control_image,
    width=width,
    height=height,
    controlnet_conditioning_scale=0.7,
    control_guidance_end=0.8,
    num_inference_steps=30, 
    guidance_scale=3.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]

Advanced usage

import torch
from diffusers.utils import load_image

# For now, import the pipeline and the model from local files
from pipeline_flux_controlnet import FluxControlNetPipeline
from controlnet_flux import FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model_union_fp8 = 'ABDALLALSWAITI/FLUX.1-dev-ControlNet-Union-Pro-2.0-fp8'

# Load using FP8 data type
controlnet = FluxControlNetModel.from_pretrained(controlnet_model_union_fp8, torch_dtype=torch.float8_e4m3fn)
# Wrap the ControlNet in a list to enable multi-ControlNet mode
pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=[controlnet], torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Replace with other condition images (soft edge, depth, pose, gray)
control_image = load_image("./conds/canny.png")
width, height = control_image.size

prompt = "A young girl stands gracefully at the edge of a serene beach, her long, flowing hair gently tousled by the sea breeze. She wears a soft, pastel-colored dress that complements the tranquil blues and greens of the coastal scenery. The golden hues of the setting sun cast a warm glow on her face, highlighting her serene expression. The background features a vast, azure ocean with gentle waves lapping at the shore, surrounded by distant cliffs and a clear, cloudless sky. The composition emphasizes the girl's serene presence amidst the natural beauty, with a balanced blend of warm and cool tones."

image = pipe(
    prompt, 
    control_image=[control_image, control_image],  # try different condition pairs, e.g. canny + depth or pose + depth
    width=width,
    height=height,
    controlnet_conditioning_scale=[0.35, 0.35],
    control_guidance_end=[0.8, 0.8],
    num_inference_steps=30, 
    guidance_scale=3.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
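
The multi-condition call above passes one list entry per condition for control_image, controlnet_conditioning_scale, and control_guidance_end. A minimal sketch of a helper that builds those lists follows. The helper name multi_condition_kwargs is hypothetical, and splitting a total scale evenly across conditions is an assumption inferred from the example's [0.35, 0.35]; the source does not state such a rule.

```python
def multi_condition_kwargs(control_images, total_scale=0.7, guidance_end=0.8):
    """Build list-valued pipeline kwargs, one entry per condition image.

    Assumption (not from the source): the total conditioning scale is
    split evenly across conditions, matching the [0.35, 0.35] example.
    """
    n = len(control_images)
    if n == 0:
        raise ValueError("at least one control image is required")
    return {
        "control_image": list(control_images),
        "controlnet_conditioning_scale": [total_scale / n] * n,
        "control_guidance_end": [guidance_end] * n,
    }


# Placeholder strings stand in for PIL images loaded with load_image().
kwargs = multi_condition_kwargs(["canny.png", "depth.png"])
print(kwargs["controlnet_conditioning_scale"])
```

The resulting dictionary could be unpacked into the pipeline call, for example pipe(prompt, **kwargs, ...), with the per-condition values then adjusted by hand.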

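📚 Detailed documentation

Quantization details

The model was quantized from the original BFloat16 format to FP8 using PyTorch's native FP8 support. Details:

Quantization technique: native FP8 quantization
Precision: E4M3 format (4 exponent bits, 3 mantissa bits)
Library: PyTorch's built-in FP8 support
Data type: torch.float8_e4m3fn
Original model: BFloat16 format (Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0)
Model size reduction: about 50% smaller than the original model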
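
To make the E4M3 layout concrete, here is a small pure-Python sketch that decodes one FP8 byte into a float. It assumes the common E4M3 "fn" convention (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, a single NaN pattern per sign). The function name is illustrative and this is not PyTorch's implementation.

```python
def decode_e4m3fn(byte):
    """Decode one E4M3 (fn variant) byte into a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF   # 4 exponent bits, bias 7
    mantissa = byte & 0x7          # 3 mantissa bits
    if exponent == 0xF and mantissa == 0x7:
        return float("nan")        # the only NaN pattern; no infinities
    if exponent == 0:
        # Subnormal numbers: no implicit leading 1
        return sign * (mantissa / 8) * 2.0 ** -6
    return sign * (1 + mantissa / 8) * 2.0 ** (exponent - 7)


print(decode_e4m3fn(0x38))  # sign 0, exponent 0111, mantissa 000
print(decode_e4m3fn(0x7E))  # largest finite value
print(decode_e4m3fn(0x01))  # smallest positive subnormal
```

Under these assumptions the format covers magnitudes from 2^-9 up to 448, with eight mantissa steps per power of two.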

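The benefits of FP8 quantization include:

Lower memory use: the model is about 50% smaller than in BFloat16/FP16.
Faster inference: a potential speedup, especially on hardware that supports FP8.
Minimal quality loss: the quantization process was carefully calibrated to preserve output quality.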
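
The roughly 50% figure follows from storage width alone: BFloat16 and FP16 use 2 bytes per weight, FP8 uses 1. A back-of-the-envelope sketch follows; the parameter count is a made-up placeholder, not the actual size of this ControlNet.

```python
BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float8_e4m3fn": 1}


def weights_size_gib(num_params, dtype):
    """Approximate size of the raw weights in GiB (ignores file metadata)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


num_params = 2_000_000_000  # hypothetical parameter count, for illustration only
bf16 = weights_size_gib(num_params, "bfloat16")
fp8 = weights_size_gib(num_params, "float8_e4m3fn")
print(f"BF16: {bf16:.2f} GiB, FP8: {fp8:.2f} GiB, ratio: {fp8 / bf16:.0%}")
```

Runtime memory also includes activations and the base model, so the end-to-end saving is smaller than the saving on the ControlNet weights themselves.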

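Key points

Compared with Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro:

Mode embedding removed: the model is smaller.
Improved Canny and pose: better control and aesthetics.
Soft edge support added; Tile support removed.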

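Model card

Architecture: this ControlNet consists of 6 double blocks and 0 single blocks, with the mode embedding removed.
Training: we trained the model from scratch for 300k steps on a dataset of 20M high-quality general and human images. Training was done at 512x512 resolution in BFloat16, with batch size 128 and learning rate 2e-5; the guidance value was sampled uniformly from [1, 7]. The text drop ratio was set to 0.20.
Supported modes: the model supports multiple control modes, including Canny, soft edge, depth, pose, and gray. You can use it just like a normal ControlNet.
Joint use: the model can be used together with other ControlNets.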
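
The training figures above imply some simple arithmetic, sketched below along with one way the guidance sampling and text dropping might look. The constants come from the model card; the sample_conditioning function is hypothetical, and replacing a dropped prompt with an empty string is an assumption, since the source does not describe how dropping was implemented.

```python
import random

TRAIN_STEPS = 300_000
BATCH_SIZE = 128
DATASET_SIZE = 20_000_000
TEXT_DROP_RATE = 0.20
GUIDANCE_RANGE = (1.0, 7.0)

# Total samples processed and approximate number of passes over the data
samples_seen = TRAIN_STEPS * BATCH_SIZE
passes = samples_seen / DATASET_SIZE
print(f"{samples_seen:,} samples seen, about {passes:.2f} passes over the dataset")


def sample_conditioning(prompt, rng):
    """Draw per-sample conditioning: uniform guidance and random text drop."""
    guidance = rng.uniform(*GUIDANCE_RANGE)
    if rng.random() < TEXT_DROP_RATE:
        prompt = ""  # assumption: a dropped prompt becomes the empty string
    return prompt, guidance


rng = random.Random(0)
print(sample_conditioning("a portrait of a dancer", rng))
```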

Showcase

[Example images for each mode: Canny, soft edge, pose, depth, gray]

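Inference

Single-condition and multi-condition inference code is given in the usage examples above. The table below lists the preprocessing tool and the controlnet_conditioning_scale and control_guidance_end values for each mode.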

Mode	Tool	controlnet_conditioning_scale	control_guidance_end
Canny	cv2.Canny	0.7	0.8
Soft edge	AnylineDetector	0.7	0.8
Depth	depth-anything	0.8	0.8
Pose	DWPose	0.9	0.65
Gray	cv2.cvtColor	0.9	0.8
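
The per-mode settings in the table above can be kept in a small lookup so they are not retyped for each run. This is only a convenience sketch: the values are copied from the table, while MODE_SETTINGS and control_kwargs are hypothetical names, and the values are presumably starting points rather than fixed requirements.

```python
# (tool, controlnet_conditioning_scale, control_guidance_end) per mode,
# copied from the settings table.
MODE_SETTINGS = {
    "canny": ("cv2.Canny", 0.7, 0.8),
    "soft edge": ("AnylineDetector", 0.7, 0.8),
    "depth": ("depth-anything", 0.8, 0.8),
    "pose": ("DWPose", 0.9, 0.65),
    "gray": ("cv2.cvtColor", 0.9, 0.8),
}


def control_kwargs(mode):
    """Return pipeline keyword arguments for one control mode."""
    try:
        _tool, scale, guidance_end = MODE_SETTINGS[mode.strip().lower()]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(MODE_SETTINGS)}"
        ) from None
    return {
        "controlnet_conditioning_scale": scale,
        "control_guidance_end": guidance_end,
    }


print(control_kwargs("pose"))
```

The returned dictionary could be unpacked into the pipeline call in place of the hard-coded controlnet_conditioning_scale and control_guidance_end arguments.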

Using the FP8 model

This repository contains the FP8-quantized version of the model. To use it, you need a PyTorch build with FP8 support:

import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model_union_fp8 = 'ABDALLALSWAITI/FLUX.1-dev-ControlNet-Union-Pro-2.0-fp8'

# Load using FP8 data type
controlnet = FluxControlNetModel.from_pretrained(controlnet_model_union_fp8, torch_dtype=torch.float8_e4m3fn)
pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# The rest of the code is the same as with the original model

For a complete example, see fp8_inference_example.py.
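
Before loading the model, it can help to confirm that the installed PyTorch exposes the torch.float8_e4m3fn dtype used in the loading code. The check below is a minimal sketch with a hypothetical function name. It only tests whether the dtype exists in the installed build; it says nothing about whether the GPU accelerates FP8 computation.

```python
import importlib.util


def fp8_dtype_available():
    """Return True if PyTorch is installed and exposes torch.float8_e4m3fn."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed at all
    import torch

    return hasattr(torch, "float8_e4m3fn")


print("FP8 (E4M3) dtype available:", fp8_dtype_available())
```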

Acknowledgements

This model was developed by Shakker Labs. The original idea was inspired by xinsir/controlnet-union-sdxl-1.0. All rights reserved.

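🔧 Technical details

The model is quantized with PyTorch's native FP8 support, converting the original BFloat16 weights to the FP8 E4M3 format. This approach relies on functionality built into PyTorch and reduces model size and memory use while preserving the model's behavior as far as possible. The quantization parameters were carefully tuned to keep quality loss during inference minimal. On hardware that supports FP8, inference may also run faster.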
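
To get a feel for how much precision E4M3 keeps, the pure-Python sketch below snaps a few sample values to the nearest representable E4M3 magnitude and reports the relative error. It is a simulation under stated assumptions (bias 7, no infinities, nearest-value rounding, clamping to the largest finite value), not a description of how PyTorch or the author's quantization handles ties or overflow.

```python
import math


def e4m3fn_grid():
    """All finite non-negative magnitudes representable in E4M3 (fn variant)."""
    values = []
    for byte in range(0x7F):  # 0x7F encodes NaN, so stop before it
        exponent, mantissa = byte >> 3, byte & 0b111
        if exponent == 0:
            values.append(mantissa / 8 * 2.0 ** -6)  # subnormal
        else:
            values.append((1 + mantissa / 8) * 2.0 ** (exponent - 7))
    return values


GRID = e4m3fn_grid()


def round_to_e4m3(x):
    """Snap x to the nearest grid magnitude, keeping its sign.

    Assumption: out-of-range inputs clamp to the largest finite value.
    """
    magnitude = min(GRID, key=lambda v: abs(v - abs(x)))
    return math.copysign(magnitude, x)


for w in (0.1, 0.3, 1.0, 3.14159):
    q = round_to_e4m3(w)
    print(f"{w} -> {q} (relative error {abs(q - w) / w:.2%})")
```

With three mantissa bits, the relative rounding error for values in the normal range stays within about 6.25% (half of one eighth), which gives a rough sense of the per-weight error that such a cast introduces.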