# pixart-controlnet-lora-test
This project is a ControlNet PEFT LoRA derived from a base model, aimed at generating high-quality, photo-realistic images such as cat photos.
## 🚀 Quick Start
This is a ControlNet PEFT LoRA derived from terminusresearch/pixart-900m-1024-ft-v0.6.
The main validation prompt used during training was:
```
A photo-realistic image of a cat
```
## ✨ Features
- The text encoder was not trained; you may reuse the base model text encoder for inference.
- Example images can be found in the gallery below.
## 📦 Installation
The original document provides no dedicated installation steps. The usage example below assumes that `torch`, `diffusers`, `Pillow`, and the `helpers` module from the training framework are importable in your environment.
## 💻 Usage Examples

### Basic Usage
```python
import torch
from diffusers import PixArtSigmaControlNetPipeline
from helpers.models.pixart.controlnet import PixArtSigmaControlNetAdapterModel
from PIL import Image

base_model_id = "terminusresearch/pixart-900m-1024-ft-v0.6"
controlnet_id = "bghira/pixart-controlnet-lora-test"

# Load the trained ControlNet adapter, then attach it to the base pipeline.
controlnet = PixArtSigmaControlNetAdapterModel.from_pretrained(
    f"{controlnet_id}/controlnet"
)
pipeline = PixArtSigmaControlNetPipeline.from_pretrained(
    base_model_id,
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
)
device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline.to(device)

# The conditioning image guides the structure of the generated output.
control_image = Image.open("path/to/control/image.png")
prompt = "A photo-realistic image of a cat"

image = pipeline(
    prompt=prompt,
    image=control_image,
    num_inference_steps=16,
    guidance_scale=4.0,
    generator=torch.Generator(device=device).manual_seed(42),
    controlnet_conditioning_scale=1.0,
).images[0]
image.save("output.png")
```
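If GPU memory is tight, the standard diffusers offloading hook can be enabled before inference. A minimal sketch; this is the generic diffusers API and has not been verified against this specific ControlNet pipeline, and it requires the `accelerate` package:

```python
# Optional: offload submodules to CPU between forward passes to reduce
# peak VRAM usage, at some cost in speed (standard diffusers API).
pipeline.enable_model_cpu_offload()
```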
## 📚 Documentation

### Validation settings
| Property | Details |
|------------|-----------|
| CFG | 4.0 |
| CFG Rescale | 0.0 |
| Steps | 16 |
| Sampler | ddim |
| Seed | 42 |
| Resolution | 1024x1024 |
Note: The validation settings are not necessarily the same as the training settings.
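Validation used the `ddim` sampler, and the training metadata below records `inference_scheduler_timestep_spacing=trailing`. A minimal sketch of configuring the pipeline's scheduler to match, assuming the standard diffusers `DDIMScheduler` API:

```python
from diffusers import DDIMScheduler

# Swap in a DDIM scheduler with trailing timestep spacing to mirror the
# validation sampler (16 steps, CFG 4.0, seed 42 per the table above).
pipeline.scheduler = DDIMScheduler.from_config(
    pipeline.scheduler.config,
    timestep_spacing="trailing",
)
```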
### Training settings
| Property | Details |
|------------|-----------|
| Training epochs | 24 |
| Training steps | 150 |
| Learning rate | 0.0001<br>Learning rate schedule: constant<br>Warmup steps: 500 |
| Max grad value | 0.01 |
| Effective batch size | 1<br>Micro-batch size: 1<br>Gradient accumulation steps: 1<br>Number of GPUs: 1 |
| Gradient checkpointing | False |
| Prediction type | epsilon (extra parameters=['training_scheduler_timestep_spacing=trailing', 'inference_scheduler_timestep_spacing=trailing', 'controlnet_enabled']) |
| Optimizer | adamw_bf16 |
| Trainable parameter precision | Pure BF16 |
| Base model precision | no_change |
| Caption dropout probability | 0.0% |
| LoRA Rank | 64 |
| LoRA Alpha | 64.0 |
| LoRA Dropout | 0.1 |
| LoRA initialisation style | default |
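For reference, the LoRA hyperparameters above correspond to a PEFT configuration along these lines. This is a minimal sketch; the `target_modules` list is hypothetical, since the card does not name the trained modules:

```python
from peft import LoraConfig

# Mirrors the table above: rank 64, alpha 64.0, dropout 0.1, default init.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64.0,
    lora_dropout=0.1,
    init_lora_weights=True,  # "default" initialisation style
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # hypothetical
)
```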
### Datasets: antelope-data-1024
| Property | Details |
|------------|-----------|
| Repeats | 0 |
| Total number of images | 6 |
| Total number of aspect buckets | 1 |
| Resolution | 1.048576 megapixels (1024 × 1024 = 1,048,576 pixels) |
| Cropped | True |
| Crop style | center |
| Crop aspect | square |
| Used for regularisation data | No |
## 🔧 Technical Details
The project is based on the terminusresearch/pixart-900m-1024-ft-v0.6 base model and was trained as a ControlNet PEFT LoRA. Validation and training use different settings; the specific parameters are listed in the corresponding sections above. The text encoder was not trained, so the base model's text encoder can be reused for inference.
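Since the text encoder is untouched, it can also be loaded directly from the base checkpoint if you need it separately, e.g. for computing prompt embeddings. A sketch, assuming the base repository follows the usual diffusers layout with a T5 text encoder in a `text_encoder` subfolder:

```python
import torch
from transformers import T5EncoderModel

# Load the untouched text encoder from the base model (the subfolder name
# is an assumption based on the standard diffusers repository layout).
text_encoder = T5EncoderModel.from_pretrained(
    "terminusresearch/pixart-900m-1024-ft-v0.6",
    subfolder="text_encoder",
    torch_dtype=torch.bfloat16,
)
```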
## 📄 License

This project is licensed under openrail++.