simpletuner-controlnet-sdxl-lora-test
This project provides a ControlNet PEFT LoHa derived from stabilityai/stable-diffusion-xl-base-1.0. It can generate photo-realistic images, such as images of cats.
Quick Start
This is a ControlNet PEFT LoHa derived from stabilityai/stable-diffusion-xl-base-1.0.
The main validation prompt used during training was:
A photo-realistic image of a cat
⨠Features
- The text encoder was not trained. You may reuse the base model text encoder for inference.
- It can generate photo-realistic images from the given prompts.
Installation
No specific installation steps are provided. The usage example imports torch and diffusers, so both packages must be available in your environment.
Usage Examples
Basic Usage
import torch
from diffusers import DiffusionPipeline

model_id = 'stabilityai/stable-diffusion-xl-base-1.0'
adapter_id = 'bghira/simpletuner-controlnet-sdxl-lora-test'

# Load the base SDXL pipeline, then apply the adapter weights on top of it.
pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipeline.load_lora_weights(adapter_id)

prompt = "A photo-realistic image of a cat"
negative_prompt = 'blurry, cropped, ugly'

# Prefer CUDA, then Apple MPS, and fall back to CPU.
device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
pipeline.to(device)

# Generate one image using the validation settings listed in this card.
model_output = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=20,
    generator=torch.Generator(device=device).manual_seed(42),
    width=1024,
    height=1024,
    guidance_scale=4.2,
    guidance_rescale=0.0,
).images[0]
model_output.save("output.png", format="PNG")
Documentation
Validation settings
- CFG: 4.2
- CFG Rescale: 0.0
- Steps: 20
- Sampler: ddim
- Seed: 42
- Resolution: 1024x1024
Note: The validation settings are not necessarily the same as the training settings.
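The validation settings above correspond to the keyword arguments used in the Basic Usage call. The following is a minimal sketch of that mapping. The dictionary validation_settings and the helper to_pipeline_kwargs are illustrative names, not part of diffusers or SimpleTuner. The sampler and seed are deliberately left out of the returned arguments, since they are applied through the pipeline's scheduler and a torch.Generator rather than as plain keyword arguments.

```python
# Validation settings from this model card, collected in one place.
# All names here are illustrative and not part of any library.
validation_settings = {
    "cfg": 4.2,
    "cfg_rescale": 0.0,
    "steps": 20,
    "sampler": "ddim",
    "seed": 42,
    "resolution": "1024x1024",
}


def to_pipeline_kwargs(settings):
    """Translate the card's validation settings into pipeline call arguments."""
    width, height = (int(side) for side in settings["resolution"].split("x"))
    return {
        "guidance_scale": settings["cfg"],
        "guidance_rescale": settings["cfg_rescale"],
        "num_inference_steps": settings["steps"],
        "width": width,
        "height": height,
    }


kwargs = to_pipeline_kwargs(validation_settings)
print(kwargs)
```

The resulting dictionary can be unpacked into the pipeline call alongside the prompt, negative prompt, and generator, for example pipeline(prompt=prompt, negative_prompt=negative_prompt, generator=generator, **kwargs).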
Example images are provided in the model's gallery.
Training settings
| Property | Details |
|---|---|
| Training epochs | 4 |
| Training steps | 100 |
| Learning rate | 0.0001 (learning rate schedule: constant; warmup steps: 0) |
| Max grad value | 2.0 |
| Effective batch size | 1 (micro-batch size: 1; gradient accumulation steps: 1; number of GPUs: 1) |
| Gradient checkpointing | True |
| Prediction type | epsilon (extra parameters=['training_scheduler_timestep_spacing=trailing', 'inference_scheduler_timestep_spacing=trailing']) |
| Optimizer | bnb-lion8bit |
| Trainable parameter precision | Pure BF16 |
| Base model precision | no_change |
| Caption dropout probability | 0.1% |
| LoRA Rank | 128 |
| LoRA Alpha | 128.0 |
| LoRA Dropout | 0.1 |
| LoRA initialisation style | default |
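The effective batch size reported in the training settings is the product of the micro-batch size, the gradient accumulation steps, and the number of GPUs. A minimal sketch of that arithmetic follows, together with the alpha/rank ratio. That ratio is assumed here to be the conventional scaling factor for low-rank updates; the source does not state how the training code applies alpha. The function names are illustrative.

```python
def effective_batch_size(micro_batch, grad_accum_steps, num_gpus):
    """Number of samples contributing to one optimizer step."""
    return micro_batch * grad_accum_steps * num_gpus


def lora_scale(alpha, rank):
    """Conventional low-rank scaling factor (assumption: alpha / rank)."""
    return alpha / rank


# Values taken from the training settings in this card.
batch = effective_batch_size(micro_batch=1, grad_accum_steps=1, num_gpus=1)
scale = lora_scale(alpha=128.0, rank=128)
print(batch, scale)  # 1 1.0
```

With alpha equal to rank, the assumed scaling factor is 1.0, so the low-rank update would be applied without additional scaling.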
Datasets
antelope-data
| Property | Details |
|---|---|
| Repeats | 0 |
| Total number of images | 24 |
| Total number of aspect buckets | 1 |
| Resolution | 1.048576 megapixels |
| Cropped | True |
| Crop style | center |
| Crop aspect | square |
| Used for regularisation data | No |
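The dataset figures can be cross-checked against the training settings. The sketch below assumes that one epoch is a single pass over all 24 images, that a repeats value of 0 means no extra passes, and that the reported epoch count is the number of completed passes; the source does not state how epochs are counted. It also checks that 1.048576 megapixels corresponds to the 1024x1024 validation resolution.

```python
total_images = 24
repeats = 0            # assumed: extra passes per image within one epoch
effective_batch = 1
training_steps = 100

# Each image is seen (1 + repeats) times per epoch.
steps_per_epoch = total_images * (1 + repeats) // effective_batch
completed_epochs = training_steps // steps_per_epoch

# One megapixel is taken as 1,000,000 pixels.
megapixels = 1024 * 1024 / 1_000_000

print(steps_per_epoch, completed_epochs, megapixels)  # 24 4 1.048576
```

Under these assumptions, 100 steps cover four full passes over the data plus four additional steps, which agrees with the 4 training epochs reported above.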
Technical Details
The model is a ControlNet PEFT LoHa derived from stabilityai/stable-diffusion-xl-base-1.0. It was trained for 100 steps (4 epochs) on the 24-image antelope-data dataset at rank 128, using the bnb-lion8bit optimizer with pure BF16 trainable parameters. Validation used the ddim sampler at 1024x1024 with 20 steps and a CFG of 4.2. The text encoder was not trained, so the base model text encoder can be reused for inference.
License
The license is creativeml-openrail-m.