Sd35m Photo 1mp Prodigy

Developed by bghira

LyCORIS adapter based on stabilityai/stable-diffusion-3.5-medium, specializing in photorealistic image generation

Tags: Image Generation, Open Source, Photorealistic, LyCORIS Adapter, Prodigy Optimization
License: Other

Downloads 144

Release Time: 1/24/2025

Model Overview

This is a LyCORIS adapter based on the Stable Diffusion 3.5 Medium model, specifically optimized for generating high-quality photorealistic images, particularly of animals.

Model Features

Photorealistic Generation

Specially optimized for generating high-quality photorealistic images, particularly of animals

LyCORIS Adapter

Fine-tuned using LyCORIS technology to enhance domain-specific generation capabilities while maintaining base model performance

Efficient Training Configuration

The base model was quantized to int8 with quanto and the adapter was trained in pure BF16 to reduce GPU memory usage

Layer-Specific Guidance

Validation images were generated with skip-layer guidance (layers [7, 8, 9]) to improve output quality

Model Capabilities

Text-to-Image Generation

Photorealistic Image Generation

Animal Image Generation

Use Cases

Creative Content Generation

Pet Photo Generation

Generate high-quality, photorealistic pet images

Can be used for social media content, pet-related product displays, etc.

Digital Art Creation

Provides artists with photorealistic creative materials

Can generate animal images in specific styles as artistic foundations

Commercial Applications

Advertising Material Generation

Quickly generate advertising materials for pet-related products

Saves photography costs and provides diverse visual content

🚀 sd35m-photo-1mp-Prodigy

This is a LyCORIS adapter for text-to-image generation, derived from stabilityai/stable-diffusion-3.5-medium and aimed at high-quality, photo-realistic image output.

🚀 Quick Start

This is a LyCORIS adapter derived from stabilityai/stable-diffusion-3.5-medium.

The main validation prompt used during training was:

A photo-realistic image of a cat

✨ Features

Text-to-Image: Generates photo-realistic images from text prompts.
LyCORIS Adapter: Built with LyCORIS on top of the base model, which it adapts toward photo-realistic output.

📚 Documentation

Validation settings

CFG: 3.2
CFG Rescale: 0.0
Steps: 16
Sampler: FlowMatchEulerDiscreteScheduler
Seed: 42
Resolution: 1024x1024
Skip-layer guidance: skip_guidance_layers=[7, 8, 9]
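
The CFG value above is the classifier-free guidance scale. As a rough illustration of what a scale of 3.2 does, the sketch below blends an unconditional and a conditional prediction in the way classifier-free guidance is commonly written. The function name `apply_cfg` and the toy numbers are assumptions made for illustration; they do not come from the model card or from any library.

```python
def apply_cfg(uncond, cond, guidance_scale=3.2):
    """Blend unconditional and conditional predictions element-wise.

    Computes: uncond + guidance_scale * (cond - uncond)
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy values standing in for model predictions (hypothetical).
uncond_pred = [0.0, 1.0, -1.0]
cond_pred = [1.0, 1.0, 0.0]

guided = apply_cfg(uncond_pred, cond_pred)
print(guided)
```

A scale of 1.0 returns the conditional prediction unchanged; larger values push the result further away from the unconditional one.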

⚠️ Important Note

The validation settings are not necessarily the same as the training settings.

The text encoder was not trained. You may reuse the base model text encoder for inference.

Training settings

Property	Details
Training epochs	114
Training steps	230
Learning rate	5e-05
Learning rate schedule	constant
Warmup steps	500
Max grad value	2.0
Effective batch size	3
Micro-batch size	3
Gradient accumulation steps	1
Number of GPUs	1
Gradient checkpointing	True
Prediction type	flow_matching (extra parameters=['shift=3.0'])
Optimizer	optimi-lion
Trainable parameter precision	Pure BF16
Base model precision	`int8-quanto`
Caption dropout probability	0.0%
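
The `shift=3.0` parameter listed with the prediction type refers to a shift of the noise level (sigma) used with flow-matching schedules. A minimal sketch of the shift as it is commonly written, sigma' = shift * sigma / (1 + (shift - 1) * sigma), follows. Whether the trainer applied exactly this form is an assumption, and `shift_sigma` is a hypothetical helper name.

```python
def shift_sigma(sigma, shift=3.0):
    """Remap a noise level in [0, 1]; a shift above 1 pushes values toward 1."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# Show how evenly spaced noise levels move under a shift of 3.0.
for sigma in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{sigma:.2f} -> {shift_sigma(sigma):.4f}")
```

The endpoints 0 and 1 stay fixed, while intermediate values are moved toward the noisier end of the schedule.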

LyCORIS Config:

{
    "bypass_mode": true,
    "algo": "lokr",
    "multiplier": 1.0,
    "full_matrix": true,
    "linear_dim": 10000,
    "linear_alpha": 1,
    "factor": 4,
    "apply_preset": {
        "target_module": [
            "Attention",
            "FeedForward"
        ],
        "module_algo_map": {
            "FeedForward": {
                "factor": 4
            },
            "Attention": {
                "factor": 2
            }
        }
    }
}
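
To give a feel for what `"algo": "lokr"` with a `factor` means, here is a simplified sketch of how a Kronecker-product adapter splits a weight matrix into two smaller factors and how many parameters that takes. The helper `split_dim` is a simplified stand-in, not LyCORIS's actual factorization routine, and the 1536 x 1536 layer size is a hypothetical example rather than a value taken from this model card. The sketch also assumes that `"full_matrix": true` means both factors are stored as full matrices.

```python
def split_dim(dim, factor):
    """Split dim into (m, n) with m * n == dim.

    Simplified rule: m is the largest divisor of dim that is <= factor.
    """
    m = max(d for d in range(1, factor + 1) if dim % d == 0)
    return m, dim // m


def lokr_param_count(out_features, in_features, factor):
    """Parameters in two full Kronecker factors W1 (a x c) and W2 (b x d)."""
    a, b = split_dim(out_features, factor)
    c, d = split_dim(in_features, factor)
    return a * c + b * d


out_features = in_features = 1536  # hypothetical layer size
dense = out_features * in_features

# Factors per module type, as given in module_algo_map above.
for name, factor in (("Attention", 2), ("FeedForward", 4)):
    lokr = lokr_param_count(out_features, in_features, factor)
    print(f"{name}: factor={factor}, lokr params={lokr}, dense params={dense}")
```

In this sketch a larger factor yields a smaller second matrix, so fewer trainable parameters for the same layer shape.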

Datasets

cheechandchong

Property	Details
Repeats	0
Total number of images	4
Total number of aspect buckets	1
Resolution	1024 px
Cropped	True
Crop style	random
Crop aspect	square
Used for regularisation data	No
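
The batch and dataset figures above can be cross-checked with some simple arithmetic. The sketch below assumes the trainer keeps the final, partially filled batch of each epoch. That is an assumption, not something the model card states, and it gives an epoch count close to, but not exactly, the reported 114.

```python
import math

# Values taken from the training settings and dataset tables.
micro_batch_size = 3
gradient_accumulation_steps = 1
num_gpus = 1
total_images = 4
training_steps = 230

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus

# Assumption: the last, partially filled batch of each epoch is kept.
steps_per_epoch = math.ceil(total_images / effective_batch_size)
approx_epochs = training_steps / steps_per_epoch

print(effective_batch_size, steps_per_epoch, approx_epochs)
```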

💻 Usage Examples

Basic Usage

import os

import torch
from diffusers import DiffusionPipeline
from huggingface_hub import hf_hub_download
from lycoris import create_lycoris_from_weights


def download_adapter(repo_id: str, adapter_filename: str = "pytorch_lora_weights.safetensors") -> str:
    """Download the adapter weights into a local folder and return the file path."""
    cache_dir = os.environ.get('HF_PATH', os.path.expanduser('~/.cache/huggingface/hub/models'))
    cleaned_adapter_path = repo_id.replace("/", "_").replace("\\", "_").replace(":", "_")
    path_to_adapter = os.path.join(cache_dir, cleaned_adapter_path)
    path_to_adapter_file = os.path.join(path_to_adapter, adapter_filename)
    os.makedirs(path_to_adapter, exist_ok=True)
    hf_hub_download(
        repo_id=repo_id, filename=adapter_filename, local_dir=path_to_adapter
    )

    return path_to_adapter_file


model_id = 'stabilityai/stable-diffusion-3.5-medium'
adapter_repo_id = 'bghira/sd35m-photo-1mp-Prodigy'
adapter_file_path = download_adapter(repo_id=adapter_repo_id)
pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)  # loading directly in bf16

# Wrap the transformer with the LyCORIS adapter and merge its weights into the base model.
lora_scale = 1.0
wrapper, _ = create_lycoris_from_weights(lora_scale, adapter_file_path, pipeline.transformer)
wrapper.merge_to()

prompt = "A photo-realistic image of a cat"
negative_prompt = 'ugly, cropped, blurry, low-quality, mediocre average'

# Optional: quantise the model to save on VRAM.
# Note: The model was quantised during training, so it is recommended to do the same at inference time.
from optimum.quanto import quantize, freeze, qint8
quantize(pipeline.transformer, weights=qint8)
freeze(pipeline.transformer)

# Pick the best available device once and reuse it for the pipeline and the generator.
device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
pipeline.to(device)  # the pipeline is already in its target precision level
model_output = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=16,
    generator=torch.Generator(device=device).manual_seed(42),
    width=1024,
    height=1024,
    guidance_scale=3.2,
    skip_guidance_layers=[7, 8, 9],
).images[0]

model_output.save("output.png", format="PNG")

Exponential Moving Average (EMA)

SimpleTuner generates a safetensors variant of the EMA weights and a pt file.

The safetensors file is intended to be used for inference, and the pt file is for continuing finetuning.

The EMA model may provide a more well-rounded result, but it will typically feel undertrained compared to the full model, because it is a running decayed average of the model weights.
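
The "running decayed average" can be illustrated with a short sketch of an EMA update on a single toy weight. The decay of 0.99, the helper name `ema_update`, and the steadily increasing weight are hypothetical choices for illustration; the model card does not state the EMA settings used.

```python
def ema_update(ema_weights, model_weights, decay=0.99):
    """Move each EMA weight a small step toward the current model weight."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, model_weights)]


ema = [0.0]
weights = [0.0]
for step in range(100):
    # Pretend each training step moves the weight by a fixed amount.
    weights = [weights[0] + 0.01]
    ema = ema_update(ema, weights)

print(f"model weight: {weights[0]:.4f}, EMA weight: {ema[0]:.4f}")
```

Because the average lags behind the weights it tracks, the EMA copy in this toy run ends up noticeably behind the trained weight, which is the sense in which an EMA model can feel undertrained.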

📄 License

License: other
