FLUX.1 Dev

Adapter repository: bghira/flux-controlnet-lora-test

This is a ControlNet PEFT LoRA adapter for black-forest-labs/flux.1-dev, intended for text-to-image generation.

Category: Image Generation | License: Other | Tags: ControlNet Fine-tuning, BF16 Efficient Inference, Low-resolution Image Generation

Downloads 309

Release Time: 6/17/2025

Model Overview

This adapter was trained for 250 steps on a 29-image dataset at 256x256 resolution. Loaded on top of black-forest-labs/flux.1-dev, it generates images from text prompts.

Model Features

Based on ControlNet PEFT LoRA Technology

The adapter is a ControlNet PEFT LoRA (rank 64) derived from the black-forest-labs/flux.1-dev model. Training only the low-rank adapter weights, rather than the full base model, keeps training cost low.

High-quality Image Generation

With the validation settings listed below (16 steps, CFG 4.0, 256x256), the model generates photorealistic images from text prompts.

Text Encoder Reuse

The text encoder is not trained and can directly reuse the text encoder of the base model for inference, reducing training costs and computational resource consumption.

Model Capabilities

Text-to-Image Generation

Image-to-Image Conversion

Use Cases

Creative Design

Generate Photorealistic Images

Generate high-quality photorealistic images based on text prompts, such as photos of cats.

Generate images with a resolution of 256x256.

🚀 flux-controlnet-lora-test

This is a ControlNet PEFT LoRA derived from black-forest-labs/flux.1-dev, designed for text-to-image and image-to-image tasks.

🚀 Quick Start

The main validation prompt used during training was:

A photo-realistic image of a cat

✨ Features

Supports text-to-image and image-to-image tasks.
Combines ControlNet conditioning with a LoRA adapter (rank 64, alpha 64.0) on top of the base model.

📦 Installation

Installation consists of setting up a Python environment with the libraries that the inference code imports: torch and diffusers, plus optimum.quanto if you quantise the model. The base model and adapter weights are fetched by from_pretrained and load_lora_weights in the usage example below.
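Before running the inference code, it can help to confirm that the packages it imports are present. The check below is a minimal sketch: the helper name `missing_packages` is hypothetical, and listing `peft` is an assumption (the usage example does not import it directly, but diffusers relies on it for PEFT-format LoRA adapters).

```python
import importlib.util

# Top-level packages used by the usage example.
# "peft" is an assumption: it is not imported directly in the example.
REQUIRED = ["torch", "diffusers", "peft", "optimum"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Anything reported as missing needs to be installed before the example will run.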

💻 Usage Examples

Basic Usage

import torch
from diffusers import DiffusionPipeline

model_id = 'black-forest-labs/flux.1-dev'
adapter_id = 'bghira/flux-controlnet-lora-test'

# Pick the best available device once and reuse it below.
device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

# Load the base model directly in bf16, then attach the LoRA adapter.
pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipeline.load_lora_weights(adapter_id)

prompt = "A photo-realistic image of a cat"

# Optional: quantise the transformer to int8 to save on VRAM.
# Note: the model was quantised during training, so doing the same at inference time is recommended.
from optimum.quanto import quantize, freeze, qint8
quantize(pipeline.transformer, weights=qint8)
freeze(pipeline.transformer)

pipeline.to(device)  # the pipeline is already at its target precision
model_output = pipeline(
    prompt=prompt,
    num_inference_steps=16,
    generator=torch.Generator(device=device).manual_seed(42),  # fixed seed for reproducibility
    width=256,
    height=256,
    guidance_scale=4.0,
).images[0]

model_output.save("output.png", format="PNG")

📚 Documentation

Validation settings

Property	Details
CFG	`4.0`
CFG Rescale	`0.0`
Steps	`16`
Sampler	`FlowMatchEulerDiscreteScheduler`
Seed	`42`
Resolution	`256x256`
Skip-layer guidance	-

Note: The validation settings are not necessarily the same as the training settings.
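The validation settings in the table correspond to the arguments used in the usage example. The sketch below makes that mapping explicit; the helper name `to_pipeline_kwargs` and the dictionary layout are illustrative assumptions, not part of the model card.

```python
# Validation settings copied from the table above.
VALIDATION_SETTINGS = {
    "CFG": 4.0,
    "Steps": 16,
    "Seed": 42,
    "Resolution": "256x256",
}


def to_pipeline_kwargs(settings):
    """Translate the table labels into keyword arguments for the pipeline call.

    The seed is left out on purpose: the usage example passes it through
    torch.Generator(...).manual_seed(seed) rather than as a plain argument.
    """
    width, height = (int(side) for side in settings["Resolution"].split("x"))
    return {
        "guidance_scale": settings["CFG"],
        "num_inference_steps": settings["Steps"],
        "width": width,
        "height": height,
    }


kwargs = to_pipeline_kwargs(VALIDATION_SETTINGS)
print(kwargs)
```

The resulting dictionary could be unpacked into the pipeline call as `pipeline(prompt=prompt, generator=..., **kwargs)`.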

The text encoder was not trained. You may reuse the base model text encoder for inference.

Training settings

Property	Details
Training epochs	8
Training steps	250
Learning rate	0.0001
Learning rate schedule	constant
Warmup steps	500
Max grad value	2.0
Effective batch size	1
Micro-batch size	1
Gradient accumulation steps	1
Number of GPUs	1
Gradient checkpointing	True
Prediction type	flow_matching (extra parameters=['shift=3.0', 'flux_guidance_mode=constant', 'flux_guidance_value=1.0', 'flux_lora_target=controlnet'])
Optimizer	adamw_bf16
Trainable parameter precision	Pure BF16
Base model precision	`int8-quanto`
Caption dropout probability	0.0%
LoRA Rank	64
LoRA Alpha	64.0
LoRA Dropout	0.1
LoRA initialisation style	default
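Two of the values in this table follow from the others. The sketch below recomputes them, assuming the usual definitions: effective batch size as the product of micro-batch size, gradient accumulation steps, and GPU count, and the standard LoRA scaling factor of alpha divided by rank. Whether the training tool applies exactly this scaling is an assumption.

```python
# Values copied from the training settings table.
micro_batch_size = 1
gradient_accumulation_steps = 1
num_gpus = 1
lora_rank = 64
lora_alpha = 64.0

# Samples that contribute to each optimizer step.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus

# Standard LoRA scaling: the low-rank update is multiplied by alpha / rank.
lora_scale = lora_alpha / lora_rank

print("effective batch size:", effective_batch_size)
print("LoRA scale:", lora_scale)
```

With alpha equal to rank, the scale is 1.0, so the adapter update is applied without extra amplification or damping.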

Datasets - antelope-data-256

Property	Details
Repeats	0
Total number of images	29
Total number of aspect buckets	1
Resolution	0.065536 megapixels
Cropped	True
Crop style	center
Crop aspect	square
Used for regularisation data	No
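The dataset figures can be cross-checked against the training settings. The sketch below assumes the images are 256x256 pixels (consistent with the dataset name and the validation resolution) and that a Repeats value of 0 means each image is seen once per epoch; both are assumptions rather than statements from the model card.

```python
import math

# Values copied from the dataset and training settings tables.
total_images = 29
repeats = 0
effective_batch_size = 1
training_steps = 250

# Assumption: square images with a 256-pixel side.
side = 256
megapixels = side * side / 1_000_000

# Assumption: repeats=0 means one pass over each image per epoch.
samples_per_epoch = total_images * (repeats + 1)
steps_per_epoch = math.ceil(samples_per_epoch / effective_batch_size)

# Completed epochs and the steps left over in the unfinished one.
full_epochs, leftover_steps = divmod(training_steps, steps_per_epoch)

print("megapixels:", megapixels)
print("steps per epoch:", steps_per_epoch)
print("full epochs:", full_epochs, "leftover steps:", leftover_steps)
```

Under these assumptions, 250 steps cover 8 complete epochs plus 18 steps of a ninth, which agrees with the 8 training epochs reported above.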

🔧 Technical Details

The model is a ControlNet PEFT LoRA (rank 64, alpha 64.0, dropout 0.1) derived from black-forest-labs/flux.1-dev. Training ran for 250 steps (8 epochs) at a constant learning rate of 0.0001 with the adamw_bf16 optimizer and an effective batch size of 1, on a base model quantised to int8-quanto. The dataset, antelope-data-256, contains 29 center-cropped square images at 0.065536 megapixels. The text encoder was not trained, so the base model text encoder can be reused for inference.

📄 License

License: other
