hidream-controlnet-lora-test
This project provides a ControlNet PEFT LoRA derived from HiDream-ai/HiDream-I1-Full. It supports text-to-image and image-to-image generation using diffusers and LoRA.
Quick Start
Prerequisites
Before using this project, make sure the required libraries are installed. You can install them with pip or another package manager.
Inference
The following is an example of how to perform inference with this model:
import torch
from diffusers import DiffusionPipeline
from optimum.quanto import quantize, freeze, qint8

model_id = 'HiDream-ai/HiDream-I1-Full'
adapter_id = 'bghira/hidream-controlnet-lora-test'

# Load the base model in bfloat16, then attach the LoRA adapter.
pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipeline.load_lora_weights(adapter_id)

prompt = "A photo-realistic image of a cat"
negative_prompt = 'ugly, cropped, blurry, low-quality, mediocre average'

# Quantize the transformer weights to int8 to reduce memory use.
quantize(pipeline.transformer, weights=qint8)
freeze(pipeline.transformer)

# Use CUDA if available, then Apple MPS, otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
pipeline.to(device)

model_output = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=16,
    generator=torch.Generator(device=device).manual_seed(42),
    width=256,
    height=256,
    guidance_scale=4.0,
).images[0]
model_output.save("output.png", format="PNG")
Features
- Based on ControlNet PEFT LoRA: Derived from HiDream-ai/HiDream-I1-Full, it provides enhanced control over image generation.
- Multiple Generation Modes: Supports text-to-image and image-to-image generation.
- Flexible Configuration: Allows users to adjust various parameters during training and inference.
Installation
The installation process mainly involves installing the necessary Python libraries. You can use the following command to install the required libraries:
pip install diffusers transformers peft torch optimum-quanto
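Before running the inference example, it can help to confirm that the imports it relies on resolve in the current environment. The following is a minimal sketch using only the standard library; the helper name `missing_modules` and the module list are illustrative assumptions, not part of this project.

```python
import importlib.util

# Top-level modules imported by the Quick Start example.
# "optimum" is the namespace package under which optimum.quanto lives.
REQUIRED_MODULES = ["torch", "diffusers", "optimum"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when everything is importable. Otherwise it lists the modules that still need to be installed.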
Usage Examples
Basic Usage
The code in the Quick Start section demonstrates the basic usage of this model for text-to-image generation.
Advanced Usage
You can adjust parameters such as prompt, negative_prompt, and num_inference_steps to generate different images according to your needs.
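One way to explore these parameters systematically is to build a small grid of settings and run the pipeline once per combination. The sketch below only constructs the grid. The helper name `build_sweep` and the extra values (28 steps, a guidance scale of 3.0) are illustrative assumptions rather than recommended settings.

```python
from itertools import product


def build_sweep(prompts, step_counts, guidance_scales, seeds):
    """Return one dict of settings per parameter combination."""
    return [
        {
            "prompt": prompt,
            "num_inference_steps": steps,
            "guidance_scale": scale,
            "seed": seed,
        }
        for prompt, steps, scale, seed in product(
            prompts, step_counts, guidance_scales, seeds
        )
    ]


sweep = build_sweep(
    prompts=["A photo-realistic image of a cat"],
    step_counts=[16, 28],        # 28 is an arbitrary comparison value
    guidance_scales=[3.0, 4.0],  # 3.0 is an arbitrary comparison value
    seeds=[42],
)

# Each entry could then be fed to the pipeline from the Quick Start, e.g.:
#   seed = cfg.pop("seed")
#   image = pipeline(**cfg, generator=torch.Generator().manual_seed(seed)).images[0]
```

Keeping the seed fixed while varying one parameter at a time makes it easier to attribute differences between images to that parameter.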
Documentation
Validation settings
- CFG: 4.0
- CFG Rescale: 0.0
- Steps: 16
- Sampler: FlowMatchEulerDiscreteScheduler
- Seed: 42
- Resolution: 256x256
Note: The validation settings are not necessarily the same as the training settings.
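To reproduce the validation setup at inference time, the settings above can be translated into the keyword arguments used in the Quick Start call. This is a minimal sketch; the dictionary keys and the helper `to_pipeline_kwargs` are hypothetical names chosen for illustration. CFG Rescale and the sampler are left out because the Quick Start call does not pass them, and the seed is supplied separately through a `torch.Generator`.

```python
# Validation settings as listed above (key names are illustrative).
validation = {
    "cfg": 4.0,
    "cfg_rescale": 0.0,
    "steps": 16,
    "seed": 42,
    "resolution": "256x256",
}


def to_pipeline_kwargs(settings):
    """Map the validation settings onto the arguments used in the Quick Start."""
    width, height = (int(value) for value in settings["resolution"].split("x"))
    return {
        "num_inference_steps": settings["steps"],
        "guidance_scale": settings["cfg"],
        "width": width,
        "height": height,
    }


kwargs = to_pipeline_kwargs(validation)
```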
You can find some example images in the following gallery:
The text encoder was not trained. You may reuse the base model text encoder for inference.
Training settings
| Property | Details |
|---|---|
| Training epochs | 0 |
| Training steps | 2 |
| Learning rate | 0.0001 |
| Learning rate schedule | constant |
| Warmup steps | 500 |
| Max grad value | 2.0 |
| Effective batch size | 1 |
| Micro-batch size | 1 |
| Gradient accumulation steps | 1 |
| Number of GPUs | 1 |
| Gradient checkpointing | True |
| Prediction type | flow_matching (extra parameters=['shift=3.0']) |
| Optimizer | adamw_bf16 |
| Trainable parameter precision | Pure BF16 |
| Base model precision | int8-quanto |
| Caption dropout probability | 0.0% |
| LoRA Rank | 1 |
| LoRA Alpha | 1.0 |
| LoRA Dropout | 0.1 |
| LoRA initialisation style | default |
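Several of these values are related by simple arithmetic. The sketch below assumes the usual conventions: the effective batch size is the product of the micro-batch size, the gradient accumulation steps, and the number of GPUs, and the LoRA update is scaled by alpha divided by rank. Whether the trainer used for this adapter follows exactly these conventions is an assumption.

```python
def effective_batch_size(micro_batch, grad_accum_steps, num_gpus):
    """Samples contributing to one optimizer step (conventional definition)."""
    return micro_batch * grad_accum_steps * num_gpus


def lora_scale(alpha, rank):
    """Conventional LoRA scaling factor applied to the low-rank update."""
    return alpha / rank


# Values taken from the training settings table.
batch = effective_batch_size(micro_batch=1, grad_accum_steps=1, num_gpus=1)
scale = lora_scale(alpha=1.0, rank=1)
```

With the reported settings both values come out to 1, which matches the effective batch size listed in the table.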
Datasets
antelope-data-256
| Property | Details |
|---|---|
| Repeats | 0 |
| Total number of images | 29 |
| Total number of aspect buckets | 1 |
| Resolution | 0.065536 megapixels |
| Cropped | True |
| Crop style | center |
| Crop aspect | square |
| Used for regularisation data | No |
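The dataset figures can be cross-checked against the validation resolution and the training settings. The sketch below assumes that a repeats value of 0 means each image is seen once per epoch, and that an epoch is one full pass over the dataset. Both are assumptions about the trainer's conventions, and the helper names are illustrative.

```python
import math


def megapixels(width, height):
    """Image area expressed in megapixels."""
    return width * height / 1_000_000


def completed_epochs(total_steps, num_images, batch_size, repeats=0):
    """Number of full passes over the dataset finished after total_steps."""
    samples_per_epoch = num_images * (repeats + 1)
    steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
    return total_steps // steps_per_epoch


# 256x256 square crops correspond to the reported resolution.
area = megapixels(256, 256)

# 2 training steps over 29 images at batch size 1 do not finish a full pass.
epochs = completed_epochs(total_steps=2, num_images=29, batch_size=1, repeats=0)
```

Under these assumptions the area is 0.065536 megapixels and the completed epoch count is 0, consistent with the tables above.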
License
This project is under the 'other' license.