Open-source Text-to-Image Generation Model simpletuner-finetuned-sd3 - Optimizing the Quality of Pathological Image Generation

Simpletuner Finetuned Sd3

Developed by Minh-Ha

A full-rank fine-tuned text-to-image model based on sd3/unknown-model, specifically optimized for pathological image generation quality.

Image Generation Open Source License: Other #Pathological Image Generation #High-Resolution Imaging #Flow Matching Optimization

Downloads 393

Release Time: 4/29/2025

Model Overview

This is a text-to-image model based on the SD3 foundation model with full-rank fine-tuning, focusing on generating photorealistic pathological images. The model retains the base model's text encoder and only fine-tunes the image generation component.

Model Features

Pathological Image Optimization

Specifically fine-tuned for pathological images, capable of generating photorealistic medical images

High-Resolution Support

Supports image generation up to 1024x1024 resolution

BF16 Optimization

Uses BF16 precision for training and inference, balancing performance and quality

Flow Matching Prediction

Employs FlowMatch prediction type with additional parameter shift=3
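To give a feel for what a shift of 3 does, here is a minimal sketch of the static timestep shift as it is implemented in the diffusers `FlowMatchEulerDiscreteScheduler`. It is an assumption that the training run applied exactly this formula; the function name `shift_sigma` is hypothetical and used only for illustration.

```python
def shift_sigma(sigma: float, shift: float = 3.0) -> float:
    """Remap a noise level sigma in [0, 1] using a static flow-matching shift.

    With shift > 1 the schedule spends more of its steps at high noise levels;
    shift = 1 leaves the schedule unchanged.
    """
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


if __name__ == "__main__":
    # Print how a few evenly spaced noise levels move under shift=3.
    for sigma in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"sigma={sigma:.2f} -> shifted={shift_sigma(sigma):.4f}")
```

The endpoints 0 and 1 stay fixed, while intermediate values are pushed upward, so a mid-schedule noise level of 0.5 becomes 0.75.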

Model Capabilities

Text-to-Image

Image-to-Image

High-Resolution Image Generation

Pathological Image Generation

Use Cases

Medical Imaging

Pathological Image Generation

Generate photorealistic pathological images for medical research or education

Example images are displayed in the model's gallery on the page

Creative Design

High-Resolution Art Creation

Generate high-resolution creative images using text prompts

🚀 simpletuner-finetuned-sd3

This is a full-rank fine-tuned model derived from sd3/unknown-model, specialized in text-to-image and image-to-image tasks.

🚀 Quick Start

This model is a full-rank finetune derived from sd3/unknown-model.

The main validation prompt used during training was:

A photo-realistic pathology image

✨ Features

Multiple Tasks: Supports text-to-image and image-to-image tasks.
Customizable Settings: Allows users to adjust various parameters for validation and training.

📦 Installation

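The original model card does not list installation steps. The usage example below imports `torch` and `diffusers`, so both packages need to be available in your environment.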

💻 Usage Examples

Basic Usage

import torch
from diffusers import DiffusionPipeline

model_id = 'Minh-Ha/simpletuner-finetuned-sd3'

# Pick the best available device once and reuse it for the pipeline and the generator.
if torch.cuda.is_available():
    device = 'cuda'
elif torch.backends.mps.is_available():
    device = 'mps'
else:
    device = 'cpu'

# Load the weights directly in bf16, so the pipeline is already in its target precision.
pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipeline.to(device)

prompt = "A photo-realistic pathology image"
negative_prompt = 'blurry, cropped, ugly'

# These values mirror the validation settings: 20 steps, CFG 3.0, seed 42, 1024x1024.
model_output = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=20,
    generator=torch.Generator(device=device).manual_seed(42),
    width=1024,
    height=1024,
    guidance_scale=3.0,
).images[0]

model_output.save("output.png", format="PNG")

📚 Documentation

Validation settings

Property	Details
CFG	`3.0`
CFG Rescale	`0.0`
Steps	`20`
Sampler	`FlowMatchEulerDiscreteScheduler`
Seed	`42`
Resolution	`1024x1024`
Skip-layer guidance

Note: The validation settings are not necessarily the same as the training settings.
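For readers unfamiliar with the CFG entry above, the following is a minimal sketch of how a classifier-free guidance scale of 3.0 combines the unconditional and prompt-conditioned model predictions. It works on plain Python lists rather than tensors, and the function name `apply_cfg` is hypothetical. With CFG Rescale at 0.0, no rescaling step is modeled here.

```python
from typing import List, Sequence


def apply_cfg(uncond: Sequence[float], cond: Sequence[float], scale: float = 3.0) -> List[float]:
    """Combine unconditional and conditional predictions with a guidance scale.

    Each output element is uncond + scale * (cond - uncond), so scale = 1
    returns the conditional prediction unchanged and larger values push the
    result further in the direction suggested by the prompt.
    """
    if len(uncond) != len(cond):
        raise ValueError("uncond and cond must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    # Toy two-element "predictions" standing in for model outputs.
    print(apply_cfg([0.0, 1.0], [1.0, 3.0], scale=3.0))
```

In the usage example, this scale corresponds to the `guidance_scale=3.0` argument passed to the pipeline.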

Example images are available in the gallery on the model page.

The text encoder was not trained. You may reuse the base model text encoder for inference.

Training settings

Property	Details
Training epochs	1
Training steps	5000
Learning rate	5e-06
Learning rate schedule	polynomial
Warmup steps	100
Max grad value	2.0
Effective batch size	4
Micro-batch size	1
Gradient accumulation steps	4
Number of GPUs	1
Gradient checkpointing	True
Prediction type	flow_matching (extra parameters=['shift=3'])
Optimizer	adamw_bf16
Trainable parameter precision	Pure BF16
Base model precision	`no_change`
Caption dropout probability	0.1%
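The batch-size rows in the table are related by simple arithmetic, sketched below. The sketch assumes that "Training steps" counts optimizer updates, which is an assumption rather than something the model card states; the function names are hypothetical.

```python
def effective_batch_size(micro_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Number of samples that contribute to one optimizer update."""
    return micro_batch * grad_accum_steps * num_gpus


def samples_processed(training_steps: int, batch_size: int) -> int:
    """Total samples seen, assuming each training step is one optimizer update."""
    return training_steps * batch_size


if __name__ == "__main__":
    # Values taken from the training settings table.
    batch = effective_batch_size(micro_batch=1, grad_accum_steps=4, num_gpus=1)
    print("effective batch size:", batch)
    print("samples processed:", samples_processed(5000, batch))
```

A micro-batch of 1 with 4 gradient accumulation steps on 1 GPU gives the effective batch size of 4 reported in the table.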

Datasets

images-512

Property	Details
Repeats	1
Total number of images	3061
Total number of aspect buckets	1
Resolution	0.262144 megapixels
Cropped	True
Crop style	center
Crop aspect	square
Used for regularisation data	No

images-768

Property	Details
Repeats	1
Total number of images	2242
Total number of aspect buckets	1
Resolution	0.589824 megapixels
Cropped	True
Crop style	center
Crop aspect	square
Used for regularisation data	No

images-1024

Property	Details
Repeats	1
Total number of images	1449
Total number of aspect buckets	1
Resolution	1.048576 megapixels
Cropped	True
Crop style	center
Crop aspect	square
Used for regularisation data	No
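The megapixel figures above are easier to read as pixel dimensions. Since every dataset uses a square crop, the side length is the square root of the pixel count. The short sketch below does that conversion and totals the image counts; the `DATASETS` dictionary and the helper name `square_side` are illustrative only.

```python
import math

# (number of images, resolution in megapixels), copied from the tables above.
DATASETS = {
    "images-512": (3061, 0.262144),
    "images-768": (2242, 0.589824),
    "images-1024": (1449, 1.048576),
}


def square_side(megapixels: float) -> int:
    """Side length in pixels of a square image with the given megapixel area."""
    return round(math.sqrt(megapixels * 1_000_000))


if __name__ == "__main__":
    for name, (count, megapixels) in DATASETS.items():
        side = square_side(megapixels)
        print(f"{name}: {count} images at {side}x{side}")
    print("total images:", sum(count for count, _ in DATASETS.values()))
```

The three datasets therefore correspond to 512x512, 768x768, and 1024x1024 crops, with 6752 images in total.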

📄 License

The license for this model is 'other'.
