sd35m-reflow
This project provides a standard PEFT LoRA derived from stabilityai/stable-diffusion-3.5-medium. It targets text-to-image and image-to-image tasks, and the validation and training settings used for the adapter are documented below.
Quick Start
To start using this model, you can follow the inference code example provided below. It demonstrates how to load the base model, the adapter, and generate an image.
Features
- Derived from the well-known base model stabilityai/stable-diffusion-3.5-medium.
- Documents the validation and training settings used for the adapter.
- Includes a Python inference example.
Installation
No specific installation steps are provided. The inference example imports torch and diffusers, so both must be available in your environment.
Usage Examples
Basic Usage
import torch
from diffusers import DiffusionPipeline

model_id = 'stabilityai/stable-diffusion-3.5-medium'
adapter_id = 'bghira/sd35m-reflow'

# Pick the best available device once and reuse it below.
if torch.cuda.is_available():
    device = 'cuda'
elif torch.backends.mps.is_available():
    device = 'mps'
else:
    device = 'cpu'

# Load the base model, then attach the LoRA adapter.
pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipeline.load_lora_weights(adapter_id)
pipeline.to(device)

prompt = "A photo-realistic image of a cat"
negative_prompt = 'blurry, cropped, ugly'

# These arguments mirror the validation settings listed under Documentation.
model_output = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=8,
    generator=torch.Generator(device=device).manual_seed(42),
    width=1024,
    height=1024,
    guidance_scale=1.0,
    skip_guidance_layers=[],
).images[0]
model_output.save("output.png", format="PNG")
Documentation
Validation settings
| Property | Details |
|---|---|
| CFG | 1.0 |
| CFG Rescale | 0.0 |
| Steps | 8 |
| Sampler | FlowMatchEulerDiscreteScheduler |
| Seed | 42 |
| Resolution | 1024x1024 |
| Skip-layer guidance | skip_guidance_layers=[] |
Note: The validation settings are not necessarily the same as the training settings.
The text encoder was not trained. You may reuse the base model text encoder for inference.
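As a small illustration, the validation settings above can be gathered in one place and turned into keyword arguments for the pipeline call. This is a minimal sketch: the ValidationSettings class and its to_pipeline_kwargs method are hypothetical helpers introduced here, not part of diffusers. Only the keyword names are taken from the inference example in this document. CFG Rescale and Sampler have no matching argument in that example, so they are left out, and the seed is kept as a plain integer because building a generator requires torch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationSettings:
    """Hypothetical container for the validation settings listed above."""
    cfg: float = 1.0
    steps: int = 8
    seed: int = 42
    width: int = 1024
    height: int = 1024
    skip_guidance_layers: tuple = ()

    def to_pipeline_kwargs(self) -> dict:
        # Keyword names follow the inference example in this document.
        # The seed is not included: the caller would wrap it in a
        # torch.Generator and pass that as `generator`.
        return {
            "num_inference_steps": self.steps,
            "width": self.width,
            "height": self.height,
            "guidance_scale": self.cfg,
            "skip_guidance_layers": list(self.skip_guidance_layers),
        }


settings = ValidationSettings()
kwargs = settings.to_pipeline_kwargs()
print(kwargs)
```

The resulting dictionary could be unpacked into the pipeline call from the usage example, alongside the prompt and generator arguments.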
Training settings
| Property | Details |
|---|---|
| Training epochs | 1 |
| Training steps | 500 |
| Learning rate | 0.0001 |
| Learning rate schedule | constant_with_warmup |
| Warmup steps | 500 |
| Max grad value | 0.1 |
| Effective batch size | 32 |
| Micro-batch size | 4 |
| Gradient accumulation steps | 1 |
| Number of GPUs | 8 |
| Gradient checkpointing | True |
| Prediction type | flow_matching (extra parameters=['flow_schedule_auto_shift', 'shift=0.0']) |
| Optimizer | adamw_bf16 |
| Trainable parameter precision | Pure BF16 |
| Base model precision | no_change |
| Caption dropout probability | 10.0% |
| LoRA Rank | 16 |
| LoRA Alpha | None |
| LoRA Dropout | 0.1 |
| LoRA initialisation style | default |
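The batch-size and learning-rate values above can be checked with a little arithmetic. The sketch below assumes the usual definitions: effective batch size is micro-batch size times gradient accumulation steps times number of GPUs, and constant_with_warmup ramps the learning rate linearly from zero to its peak over the warmup steps and then holds it. The function names are hypothetical, and the trainer's actual schedule implementation may differ in detail.

```python
def effective_batch_size(micro_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Samples contributing to one optimizer step (assumed definition)."""
    return micro_batch * grad_accum_steps * num_gpus


def lr_constant_with_warmup(step: int, peak_lr: float, warmup_steps: int) -> float:
    """Linear warmup from 0 to peak_lr, then constant (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr


# Values taken from the training settings table.
batch = effective_batch_size(micro_batch=4, grad_accum_steps=1, num_gpus=8)
print(batch)  # 32

for step in (0, 250, 499, 500):
    print(step, lr_constant_with_warmup(step, peak_lr=1e-4, warmup_steps=500))
```

Under these assumptions, a run with 500 training steps and 500 warmup steps spends its whole duration ramping up, and the learning rate would only reach 0.0001 at the very end.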
Datasets - photo10k
| Property | Details |
|---|---|
| Repeats | 0 |
| Total number of images | ~10040 |
| Total number of aspect buckets | 2 |
| Resolution | 1.048576 megapixels |
| Cropped | False |
| Crop style | None |
| Crop aspect | None |
| Used for regularisation data | No |
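The dataset resolution of 1.048576 megapixels matches the pixel area of a 1024x1024 image, the same resolution used for validation. The short check below shows the arithmetic. The helper name is made up for illustration, and it takes one megapixel to mean 1,000,000 pixels.

```python
def megapixels(width: int, height: int) -> float:
    """Pixel area in megapixels, with 1 MP taken as 1,000,000 pixels."""
    return width * height / 1_000_000


area = megapixels(1024, 1024)
print(area)  # 1.048576
```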
Technical Details
The Documentation section above covers the validation settings, the training settings, and the dataset used in this project. The validation settings are the inference parameters used when validating the model, the training settings cover aspects such as learning rate, batch size, and optimizer, and the dataset table describes the photo10k dataset.
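The validation settings name FlowMatchEulerDiscreteScheduler with 8 steps. As a rough intuition for what an Euler sampler over a flow does, the toy sketch below integrates a one-dimensional flow from a noise value to a data value in 8 steps. It is not the diffusers implementation: the evenly spaced sigma schedule, the constant velocity function standing in for the model, and all names are assumptions chosen to keep the example tiny.

```python
def euler_flow_sample(noise, velocity_fn, num_steps=8):
    """Toy Euler integration from sigma=1 (noise) down to sigma=0 (data)."""
    # Assumed schedule: evenly spaced sigmas from 1.0 to 0.0.
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    x = noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma)          # stand-in for the model's prediction
        x = x + (sigma_next - sigma) * v   # one Euler step
    return x


data, noise = 3.0, -1.0


def straight_velocity(x, sigma):
    # For a straight path x = (1 - sigma) * data + sigma * noise,
    # the velocity dx/dsigma is constant: noise - data.
    return noise - data


sample = euler_flow_sample(noise, straight_velocity, num_steps=8)
print(sample)
```

In this toy case the path is a straight line, so the Euler steps land on the data value. A real model's velocity field is not constant, which is why the step count affects image quality.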
License
The license is specified as "other".