🚀 quilt-1m-finetuned-sd3.5
This project is a full-rank finetune derived from sd3/unknown-model. It targets text-to-image generation; the validation and training settings used are listed under Documentation.
📄 License
The license for this project is other.
✨ Features
- Multiple Modes: Supports text-to-image and image-to-image generation.
- Fine-Tuned Model: Derived from sd3/unknown-model with specific training settings.
- Gallery of Examples: Provides example images for reference.
📦 Installation
No installation steps are provided in the original document.
💻 Usage Examples
Basic Usage
    import torch
    from diffusers import DiffusionPipeline

    model_id = 'Minh-Ha/quilt-1m-finetuned-sd3.5'
    # Load the finetuned pipeline in bfloat16.
    pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    prompt = "A photo-realistic pathology image"
    negative_prompt = 'blurry, cropped, ugly'

    # Pick the device once: CUDA if available, then Apple MPS, then CPU.
    device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
    pipeline.to(device)

    # Steps, seed, resolution and guidance scale match the validation settings.
    model_output = pipeline(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=20,
        generator=torch.Generator(device=device).manual_seed(42),  # fixed seed
        width=1024,
        height=1024,
        guidance_scale=3.0,
    ).images[0]
    model_output.save("output.png", format="PNG")
📚 Documentation
Validation settings
- CFG: 3.0
- CFG Rescale: 0.0
- Steps: 20
- Sampler: FlowMatchEulerDiscreteScheduler
- Seed: 42
- Resolution: 1024x1024
- Skip-layer guidance:
Note: The validation settings are not necessarily the same as the training settings.
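The CFG value of 3.0 is the classifier-free guidance scale, passed as `guidance_scale` in the usage example. As a minimal sketch of what that number does, assuming the standard formulation in which the guided prediction is a linear extrapolation from the unconditional (negative-prompt) prediction toward the conditional one; the function name `apply_cfg` and the toy vectors are illustrative only and not part of the pipeline:

```python
def apply_cfg(uncond, cond, guidance_scale=3.0):
    """Blend two predictions element-wise with classifier-free guidance.

    Standard form: uncond + guidance_scale * (cond - uncond).
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy 3-element "predictions"; in the real pipeline these are latent tensors.
uncond_pred = [0.0, 1.0, -1.0]  # prediction for the negative prompt
cond_pred = [1.0, 1.0, 0.0]     # prediction for the prompt

print(apply_cfg(uncond_pred, cond_pred, guidance_scale=3.0))
# A scale of 1.0 returns the conditional prediction unchanged.
print(apply_cfg(uncond_pred, cond_pred, guidance_scale=1.0))
```

Larger scales push the result further from the negative-prompt prediction. A CFG Rescale of 0.0 is read here as "no rescaling applied", which is an interpretation rather than something the settings state.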
You can find some example images in the following gallery:
The text encoder was not trained. You may reuse the base model text encoder for inference.
Training settings
| Property | Details |
|---|---|
| Training epochs | 0 |
| Training steps | 10000 |
| Learning rate | 5e-06 (schedule: polynomial; warmup steps: 100) |
| Max grad value | 2.0 |
| Effective batch size | 16 (micro-batch size: 1; gradient accumulation steps: 4; number of GPUs: 4) |
| Gradient checkpointing | True |
| Prediction type | flow_matching (extra parameters=['shift=3']) |
| Optimizer | adamw_bf16 |
| Trainable parameter precision | Pure BF16 |
| Base model precision | no_change |
| Caption dropout probability | 0.1% |
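The learning rate entry (5e-06, polynomial schedule, 100 warmup steps) can be read as a curve over the 10000 training steps. The sketch below uses the common warmup-plus-polynomial-decay form. The settings do not give the final learning rate or the polynomial power, so `end_lr=0.0` and `power=1.0` are placeholder assumptions, and `lr_at_step` is a hypothetical helper, not the trainer's own code:

```python
def lr_at_step(step, base_lr=5e-6, warmup_steps=100, total_steps=10_000,
               end_lr=0.0, power=1.0):
    """Learning rate at a given optimizer step.

    end_lr and power are ASSUMED; the training settings do not state them.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / warmup_steps
    if step >= total_steps:
        return end_lr
    # Polynomial decay from base_lr to end_lr over the post-warmup steps.
    remaining = 1.0 - (step - warmup_steps) / (total_steps - warmup_steps)
    return (base_lr - end_lr) * remaining ** power + end_lr


for step in (0, 50, 100, 5_050, 10_000):
    print(step, lr_at_step(step))
```

With `power=1.0` the decay is linear; other powers bend the curve while keeping the same start and end points.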
Datasets
| Dataset | Repeats | Total number of images | Total number of aspect buckets | Resolution | Cropped | Crop style | Crop aspect | Used for regularisation data |
|---|---|---|---|---|---|---|---|---|
| images-512 | 1 | ~417748 | 1 | 0.262144 megapixels | True | random | square | No |
| images-768 | 1 | ~266740 | 1 | 0.589824 megapixels | True | random | square | No |
| images-1024 | 1 | ~246816 | 1 | 1.048576 megapixels | True | random | square | No |
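The batch-size and dataset figures can be cross-checked with a little arithmetic. The sketch below assumes that each training step consumes one effective batch, that the approximate image counts are exact, and that the square crop side of each dataset matches the number in its name; it ignores how the Repeats value is interpreted. The variable names are illustrative:

```python
# Values copied from the training settings and dataset tables.
micro_batch_size = 1
grad_accum_steps = 4
num_gpus = 4
training_steps = 10_000

# Approximate image counts (the table marks them with "~").
dataset_images = {"images-512": 417_748, "images-768": 266_740, "images-1024": 246_816}
# Square crop side in pixels, inferred from the dataset names (assumption).
dataset_side = {"images-512": 512, "images-768": 768, "images-1024": 1024}

effective_batch = micro_batch_size * grad_accum_steps * num_gpus
samples_seen = effective_batch * training_steps
total_images = sum(dataset_images.values())
epoch_fraction = samples_seen / total_images

print("effective batch size:", effective_batch)
print("samples seen:", samples_seen)
print("total images:", total_images)
print("fraction of one epoch:", round(epoch_fraction, 3))

# Megapixels for a square crop: side * side / 1e6.
for name, side in dataset_side.items():
    print(name, side * side / 1_000_000, "megapixels")
```

Under these assumptions the product 1 x 4 x 4 reproduces the reported effective batch size of 16, and the run covers well under one full pass over the combined data, which would be consistent with the reported 0 training epochs if only completed passes are counted.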