Model Card for pseudo-flex-base (1024x1024 base resolution)
This is a fine-tuned model based on stable-diffusion-2-1. It has been adjusted to handle different aspect ratios and has evolved into a photography-oriented model. It aims to generate high-quality images with better aspect-ratio adaptability.

Quick Start
Use the code below to get started with the model.
import torch
from diffusers import DiffusionPipeline, DDPMScheduler

model_id = 'ptx0/pseudo-flex-base'
pipeline = DiffusionPipeline.from_pretrained(model_id)

# Optional speed-up: compile the UNet (requires PyTorch 2).
pipeline.unet = torch.compile(pipeline.unet)

# Load the scheduler config stored with the model. As written, it is not
# attached to the pipeline; set pipeline.scheduler = scheduler to use it.
scheduler = DDPMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler"
)

# Allow faster, slightly lower-precision float32 matrix multiplications.
torch.set_float32_matmul_precision('high')
pipeline.to('cuda')

# Example prompts, keyed by a short name.
prompts = {
    "woman": "a woman, hanging out on the beach",
    "man": "a man playing guitar in a park",
    "lion": "Explore the ++majestic beauty++ of untamed ++lion prides++ as they roam the African plains --captivating expressions-- in the wildest national geographic adventure",
    "child": "a child flying a kite on a sunny day",
    "bear": "best quality ((bear)) in the swiss alps cinematic 8k highly detailed sharp focus intricate fur",
    "alien": "an alien exploring the Mars surface",
    "robot": "a robot serving coffee in a cafe",
    "knight": "a knight protecting a castle",
    "menn": "a group of smiling and happy men",
    "bicycle": "a bicycle, on a mountainside, on a sunny day",
    "cosmic": "cosmic entity, sitting in an impossible position"
}
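The snippet above defines the prompts but stops before generating anything. What follows is a minimal sketch of the remaining step, not the author's own loop. It assumes the pipeline is callable in the usual diffusers way (keyword arguments `prompt`, `width`, `height`, returning an object with an `.images` list). The helper name `generate_all`, the `outputs` directory, and the 1536x1024 size (one of the buckets listed under Model Description) are illustrative choices.

```python
from pathlib import Path


def generate_all(pipeline, prompts, width=1536, height=1024, out_dir="outputs"):
    """Run every prompt through `pipeline` and save one PNG per prompt.

    `pipeline` is assumed to behave like a diffusers text-to-image pipeline:
    calling it returns an object whose `.images` list holds saveable images.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, prompt in prompts.items():
        result = pipeline(prompt=prompt, width=width, height=height)
        target = out_path / f"{name}.png"
        result.images[0].save(target)  # first (and by default only) image
        saved.append(target)
    return saved
```

With the objects from the snippet above, `generate_all(pipeline, prompts)` would write one file per dictionary key, for example `outputs/bear.png`.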
Features
- Fine-tuned for Aspect Ratios: The model is fine-tuned from stable-diffusion-2-1 to handle various aspect ratios, making it suitable for different image generation needs.
- Diverse Training Data: Trained on a diverse dataset including Cushman, Midjourney v5.1-filtered, National Geographic, and more, enhancing its generation capabilities.
Installation
This model can be used via the diffusers library. You need to have PyTorch 2 installed. Install the necessary libraries as follows:
- Install PyTorch 2 according to your CUDA version.
- Install the diffusers library:
  pip install diffusers
- Install other required libraries such as transformers and torch if not already installed.
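To confirm the steps above took effect, a small check like the following can help. It is only a sketch using the standard library's `importlib.metadata`; the helper names `installed_versions` and `is_pytorch_2` are made up for illustration.

```python
from importlib import metadata


def installed_versions(packages=("torch", "diffusers", "transformers")):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def is_pytorch_2(version):
    """True when a version string such as '2.1.0+cu118' has major version 2 or later."""
    return version is not None and int(version.split(".")[0]) >= 2


if __name__ == "__main__":
    found = installed_versions()
    print(found)
    print("PyTorch 2 available:", is_pytorch_2(found["torch"]))
```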
Documentation
Model Details
Model Description
This is a diffusion-based text-to-image generation model, fine-tuned from stable-diffusion-2-1 for dynamic aspect ratios.
| | width | height | aspect ratio | images |
|---|---|---|---|---|
| 0 | 1024 | 1024 | 1:1 | 90561 |
| 1 | 1536 | 1024 | 3:2 | 8716 |
| 2 | 1365 | 1024 | 4:3 | 6933 |
| 3 | 1468 | 1024 | ~3:2 | 113 |
| 4 | 1778 | 1024 | ~5:3 | 6315 |
| 5 | 1200 | 1024 | ~5:4 | 6376 |
| 6 | 1333 | 1024 | ~4:3 | 2814 |
| 7 | 1281 | 1024 | ~5:4 | 52 |
| 8 | 1504 | 1024 | ~3:2 | 139 |
| 9 | 1479 | 1024 | ~3:2 | 25 |
| 10 | 1384 | 1024 | ~4:3 | 1676 |
| 11 | 1370 | 1024 | ~4:3 | 63 |
| 12 | 1499 | 1024 | ~3:2 | 436 |
| 13 | 1376 | 1024 | ~4:3 | 68 |
Images with other aspect ratios fell into smaller buckets.
- Developed by: pseudoterminal
- Model type: Diffusion-based text-to-image generation model
- Language(s): English
- License: creativeml-openrail-m
- Parent Model: https://huggingface.co/ptx0/pseudo-real-beta
- Resources for more information: More information needed
Uses
- See https://huggingface.co/stabilityai/stable-diffusion-2-1
Training Details
Training Data
- LAION HD dataset subsets
- https://huggingface.co/datasets/laion/laion-high-resolution
Only a small portion of that dataset was used; see Preprocessing.
Preprocessing
All preprocessing is done via the scripts in bghira/SimpleTuner on GitHub.
Speeds, Sizes, Times
- Dataset size: 100k image-caption pairs, after filtering.
- Hardware: 1 A100 80G GPU
- Optimizer: 8-bit Adam
- Batch size: 150
  - actual batch size: 15
  - gradient_accumulation_steps: 10
  - effective batch size: 150
- Learning rate: Constant 4e-8, which was adjusted by reducing the batch size over time.
- Training steps: WIP (ongoing)
- Training time: approximately 4 days (so far)
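The batch-size figures above are related by gradient accumulation: 15 samples per forward/backward pass, accumulated over 10 passes before each optimizer step, give 15 × 10 = 150. The sketch below spells out that arithmetic. It also shows why averaging the means of equal-sized micro-batches matches a single mean over the full batch. The function names are illustrative, and `num_devices=1` reflects the single GPU listed under Hardware.

```python
def effective_batch_size(micro_batch_size, gradient_accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer step."""
    return micro_batch_size * gradient_accumulation_steps * num_devices


def accumulated_mean(values, micro_batch_size):
    """Average the per-micro-batch means, as accumulation does with a mean loss."""
    chunks = [values[i:i + micro_batch_size]
              for i in range(0, len(values), micro_batch_size)]
    return sum(sum(chunk) / len(chunk) for chunk in chunks) / len(chunks)


if __name__ == "__main__":
    print(effective_batch_size(15, 10))  # 15 * 10 * 1
    losses = [float(i) for i in range(150)]
    # Both numbers agree because every micro-batch has the same size.
    print(accumulated_mean(losses, 15), sum(losses) / len(losses))
```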
Technical Details
Training Process
- Initial Fine-Tuning: The pseudo-flex-base model was created by fine-tuning the base stabilityai/stable-diffusion-2-1 768 model, with its text encoder frozen, for 1000 steps on 148,000 images from LAION HD, using the TEXT field as the caption.
- Text Encoder Swap: At 1000 steps, the text encoder from ptx0/pseudo-real-beta was used with this model's UNet to resolve some residual image noise.
- Dataset Changes: Because of image degradation and overfitting, the training dataset was changed multiple times. At 1300 steps it was switched to a new subset of high-resolution Midjourney v5.1 data. Later, a new LAION subset with unique images and specific aspect ratios was used.
- Contrast Fix: Offset noise and SNR gamma were applied experimentally to checkpoint 4250 to fix a contrast issue.
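The source does not say which formulation or values were used for the contrast fix. As a rough illustration only, the sketch below shows the two techniques as they are commonly implemented. Offset noise adds one extra random shift per channel to the Gaussian noise. Min-SNR weighting clips the signal-to-noise ratio at `gamma` when weighting the loss. The function names, `offset=0.1`, and `gamma=5.0` are assumptions, and plain Python lists stand in for tensors.

```python
import random


def min_snr_weight(snr, gamma=5.0, prediction_type="v_prediction"):
    """Loss weight for one timestep under min-SNR weighting.

    epsilon prediction: min(snr, gamma) / snr
    v prediction:       min(snr, gamma) / (snr + 1)
    """
    clipped = min(snr, gamma)
    if prediction_type == "epsilon":
        return clipped / snr
    if prediction_type == "v_prediction":
        return clipped / (snr + 1.0)
    raise ValueError(f"unsupported prediction_type: {prediction_type}")


def offset_noise(channels, height, width, offset=0.1, rng=random):
    """Gaussian noise plus one extra Gaussian shift shared by each whole channel."""
    noise = []
    for _ in range(channels):
        shift = offset * rng.gauss(0.0, 1.0)  # same value for every pixel in the channel
        noise.append([[rng.gauss(0.0, 1.0) + shift for _ in range(width)]
                      for _ in range(height)])
    return noise
```

The per-channel shift lets the model learn changes in overall brightness, which is why offset noise is usually tried when outputs look flat or washed out.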
Aspect Bucketing
The training loop dataloader was modified to support aspect bucketing. Images smaller than 1024x1024 were discarded, and all remaining images were resized so that the shorter side is 1024.
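The rule described here can be sketched as follows. This is not SimpleTuner's actual code: the function names are hypothetical, rounding down with integer division is an assumption, and "under 1024x1024" is read as "either side shorter than 1024".

```python
from collections import defaultdict

BASE = 1024  # target length of the shorter side


def bucket_size(width, height, base=BASE):
    """Return the resized (width, height), or None if the image is too small.

    The shorter side becomes `base`; the longer side is scaled by the same
    factor using integer arithmetic to avoid floating-point rounding surprises.
    """
    if min(width, height) < base:
        return None
    if width >= height:
        return width * base // height, base
    return base, height * base // width


def build_buckets(sizes):
    """Group image indices by resized dimensions so each batch shares one shape."""
    buckets = defaultdict(list)
    for index, (width, height) in enumerate(sizes):
        resized = bucket_size(width, height)
        if resized is not None:
            buckets[resized].append(index)
    return dict(buckets)
```

Under these assumptions a 3000x2000 image lands in the 1536x1024 bucket and a 4000x3000 image in the 1365x1024 bucket, matching two of the bucket sizes listed in the table under Model Description.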
License
This model is licensed under the creativeml-openrail-m license.