Pseudo-flex-base Open-source Photography Model - Fine-tuned on SD2.1, Supporting Multi-aspect Ratio Image Generation

Pseudo Flex Base

Developed by bghira

A multi-ratio photography model fine-tuned based on Stable Diffusion 2.1, supporting dynamic resolution image generation

Image Generation | Open Source | License: OpenRAIL | #Multi-ratio photography #High-resolution generation #Realistic style

Downloads 70

Release Time: 6/25/2023

Model Overview

This is a multi-ratio photography model fine-tuned from stable-diffusion-2-1. It is optimized for non-standard aspect ratios, addressing the quality problems that standard square-trained checkpoints show when producing wide or tall images.

Model Features

Multi-ratio support

Improves generation quality for non-square ratios (e.g., 16:9, 4:3) through aspect-ratio bucketing during training

High-resolution generation

Base resolution of 1024x1024, with support for higher-resolution image generation

Contrast optimization

Uses offset noise and SNR gamma techniques to address image contrast problems

Diverse dataset

Incorporates high-quality data from multiple sources, including Kodak color slides, Midjourney images, and National Geographic content

Model Capabilities

Text-to-image generation

High-resolution image generation

Multi-ratio image generation

Realistic style image generation

Use Cases

Photography art

Portrait photography

Generate high-quality portrait photos in various ratios

Capable of producing natural portraits in different ratios (1:1, 4:3, 16:9, etc.)

Landscape photography

Generate wide-format natural scenery images

Ideal for creating landscape photos in wide formats like 16:9

Creative design

Advertising materials

Generate images that meet various advertising layout requirements

Supports generating advertising materials in different ratios

🚀 Model Card for pseudo-flex-base (1024x1024 base resolution)

This is a fine-tuned model based on stable-diffusion-2-1. It has been adjusted to handle different aspect ratios, evolving into a photography-oriented model. It aims to generate high-quality images with better aspect-ratio adaptability.

🚀 Quick Start

Use the code below to get started with the model.

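# Use PyTorch 2 (required for torch.compile).
import os

import torch
from diffusers import DiffusionPipeline, DDPMScheduler

# Any text-to-image model on the Hugging Face Hub can be loaded this way.
model_id = 'ptx0/pseudo-flex-base'
pipeline = DiffusionPipeline.from_pretrained(model_id)

# Optimize: compile the UNet (the first generation is slow while it compiles).
pipeline.unet = torch.compile(pipeline.unet)

# Optional: load the model's DDPM scheduler. The pipeline keeps its
# default scheduler unless you assign this one explicitly.
scheduler = DDPMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler"
)
# pipeline.scheduler = scheduler

# Allow TF32 matmuls on supported GPUs. Remove this if you get an error.
torch.set_float32_matmul_precision('high')

pipeline.to('cuda')
prompts = {
    "woman": "a woman, hanging out on the beach",
    "man": "a man playing guitar in a park",
    "lion": "Explore the ++majestic beauty++ of untamed ++lion prides++ as they roam the African plains --captivating expressions-- in the wildest national geographic adventure",
    "child": "a child flying a kite on a sunny day",
    "bear": "best quality ((bear)) in the swiss alps cinematic 8k highly detailed sharp focus intricate fur",
    "alien": "an alien exploring the Mars surface",
    "robot": "a robot serving coffee in a cafe",
    "knight": "a knight protecting a castle",
    "men": "a group of smiling and happy men",
    "bicycle": "a bicycle, on a mountainside, on a sunny day",
    "cosmic": "cosmic entity, sitting in an impossible position"
}

# Example generation loop. The resolution, step count, guidance scale and seed
# are illustrative values, not settings published for this model.
os.makedirs('output', exist_ok=True)
for shortname, prompt in prompts.items():
    image = pipeline(
        prompt=prompt,
        width=1536,  # 3:2, one of the training buckets
        height=1024,
        num_inference_steps=25,
        guidance_scale=7.5,
        generator=torch.Generator(device='cuda').manual_seed(42),
    ).images[0]
    image.save(f'output/{shortname}.png')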
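Because the model was trained on aspect buckets, you will usually want to request an explicit width and height. The helper below is a hypothetical sketch (not part of diffusers or this model's repo): it fixes the short side at 1024 and rounds the long side to a multiple of 8, since the diffusers Stable Diffusion pipelines reject heights or widths that are not divisible by 8. It assumes `short_side` is itself a multiple of 8.

```python
def dims_for_ratio(ratio_w: int, ratio_h: int, short_side: int = 1024, multiple: int = 8):
    """Return (width, height) for a target aspect ratio with the short side fixed.

    Hypothetical helper: the long side is rounded to the nearest multiple of
    `multiple` so the result is accepted by Stable Diffusion pipelines.
    """
    if ratio_w >= ratio_h:
        # Landscape or square: height is the short side.
        long_side = short_side * ratio_w / ratio_h
        return round(long_side / multiple) * multiple, short_side
    # Portrait: width is the short side.
    long_side = short_side * ratio_h / ratio_w
    return short_side, round(long_side / multiple) * multiple


for ratio in [(1, 1), (3, 2), (4, 3), (16, 9), (2, 3)]:
    print(ratio, dims_for_ratio(*ratio))
```

The resulting values can be passed as `width=` and `height=` when calling the pipeline. Ratios close to the training buckets listed in the Model Description table are presumably the safest choices.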

✨ Features

Fine-tuned for Aspect Ratios: The model is fine-tuned from stable-diffusion-2-1 to handle various aspect ratios, making it suitable for different image generation needs.
Diverse Training Data: Trained on a diverse dataset including Cushman, Midjourney v5.1 (filtered), National Geographic, and more, which broadens its generation capabilities.

📦 Installation

This model can be used via the diffusers library and requires PyTorch 2. Install the necessary libraries as follows:

Install PyTorch 2 according to your CUDA version.
Install the diffusers library:

pip install diffusers

Install the other required libraries, such as transformers, if they are not already installed.

📚 Documentation

Model Details

Model Description

This is a diffusion-based text-to-image generation model, fine-tuned from stable-diffusion-2-1 for dynamic aspect ratios.

bucket	width	height	aspect ratio	images
0	1024	1024	1:1	90561
1	1536	1024	3:2	8716
2	1365	1024	4:3	6933
3	1468	1024	~3:2	113
4	1778	1024	~5:3	6315
5	1200	1024	~5:4	6376
6	1333	1024	~4:3	2814
7	1281	1024	~5:4	52
8	1504	1024	~3:2	139
9	1479	1024	~3:2	25
10	1384	1024	~4:3	1676
11	1370	1024	~4:3	63
12	1499	1024	~3:2	436
13	1376	1024	~4:3	68

Other aspect ratios fell into smaller buckets.

Developed by: pseudoterminal
Model type: Diffusion-based text-to-image generation model
Language(s): English
License: creativeml-openrail-m
Parent Model: https://huggingface.co/ptx0/pseudo-real-beta
Resources for more information: More information needed

Uses

See https://huggingface.co/stabilityai/stable-diffusion-2-1

Training Details

Training Data

LAION HD dataset subsets
- https://huggingface.co/datasets/laion/laion-high-resolution (only a small portion of this dataset was used; see Preprocessing)

Preprocessing

All preprocessing is done via the scripts in bghira/SimpleTuner on GitHub.

Speeds, Sizes, Times

Dataset size: 100k image-caption pairs, after filtering.
Hardware: 1 × A100 80GB GPU
Optimizer: 8-bit Adam
Batch size: 150
- actual batch size: 15
- gradient_accumulation_steps: 10
- effective batch size: 150
Learning rate: constant 4e-8, adjusted over time by reducing the batch size.
Training steps: WIP (ongoing)
Training time: approximately 4 days (so far)
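The batch-size figures above combine as in the small worked example below. The micro-batch, accumulation and dataset numbers come from the list; the optimizer-steps-per-epoch figure is a derived estimate that ignores the partial batches aspect bucketing leaves at the end of each bucket.

```python
import math

micro_batch_size = 15             # "actual batch size" processed per forward/backward pass
gradient_accumulation_steps = 10  # micro-batches accumulated before each optimizer update
num_gpus = 1                      # single A100

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 150

dataset_size = 100_000  # image-caption pairs after filtering
# Rough estimate: per-bucket partial batches make the real count slightly higher.
optimizer_steps_per_epoch = math.ceil(dataset_size / effective_batch_size)
print(optimizer_steps_per_epoch)  # 667
```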

🔧 Technical Details

Training Process

Initial Fine-Tuning: The pseudo-flex-base model was created by fine-tuning the base stabilityai/stable-diffusion-2-1 768 model, with its text encoder frozen, for 1000 steps on 148,000 images from LAION HD, using each image's TEXT field as its caption.
Text Encoder Swap: At 1000 steps, the text encoder from ptx0/pseudo-real-beta was paired with this model's UNet to resolve some residual image noise.
Dataset Changes: Because of image degradation and overfitting, the training dataset was changed several times. First, at 1300 steps, it was switched to a new subset of high-resolution Midjourney v5.1 data. Later, a new LAION subset with unique images and specific aspect ratios was used.
Contrast Fix: Offset noise and SNR gamma were applied experimentally to checkpoint 4250 to fix the contrast issue.
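The two contrast-related techniques can be illustrated with a minimal plain-Python sketch (no GPU or PyTorch needed). Assumptions, none of which are stated in this card: the "scaled_linear" beta schedule (0.00085 to 0.012 over 1000 steps) used by Stable Diffusion 2.x, an offset strength of 0.1, and gamma = 5.0, which are common defaults rather than values reported for this checkpoint. SimpleTuner's actual implementation may differ. Offset noise adds one random shift per channel on top of the usual per-pixel noise; "SNR gamma" is read here as Min-SNR-gamma loss weighting, which clips each timestep's signal-to-noise ratio at gamma. Since the SD 2.1 768 base is a v-prediction model, the v-prediction branch divides by SNR + 1.

```python
import random


def scaled_linear_alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative products of (1 - beta) for a 'scaled_linear' schedule (assumed SD 2.x values)."""
    alphas_cumprod, running = [], 1.0
    for i in range(num_steps):
        # Betas are linearly spaced in sqrt space, then squared.
        sqrt_beta = beta_start ** 0.5 + (beta_end ** 0.5 - beta_start ** 0.5) * i / (num_steps - 1)
        running *= 1.0 - sqrt_beta ** 2
        alphas_cumprod.append(running)
    return alphas_cumprod


def offset_noise(channels, pixels, offset=0.1, rng=random):
    """Per-pixel Gaussian noise plus one extra Gaussian shift per channel.

    The per-channel shift lets the model learn to change overall brightness,
    which zero-mean per-pixel noise makes hard. offset=0.1 is an assumed value.
    """
    noise = []
    for _ in range(channels):
        shift = offset * rng.gauss(0.0, 1.0)
        noise.append([rng.gauss(0.0, 1.0) + shift for _ in range(pixels)])
    return noise


def min_snr_weight(t, alphas_cumprod, gamma=5.0, prediction_type="v_prediction"):
    """Min-SNR-gamma loss weight for timestep t (gamma=5.0 is an assumed value)."""
    a = alphas_cumprod[t]
    snr = a / (1.0 - a)
    clipped = min(snr, gamma)
    if prediction_type == "epsilon":
        return clipped / snr
    if prediction_type == "v_prediction":
        return clipped / (snr + 1.0)
    raise ValueError(f"unknown prediction_type: {prediction_type}")


alphas_cumprod = scaled_linear_alphas_cumprod()
for t in (0, 250, 500, 999):
    print(t, round(min_snr_weight(t, alphas_cumprod), 4))
```

In a real training loop, the offset noise would replace plain Gaussian noise when noising the latents, and the weight would multiply each sample's MSE loss, down-weighting low-noise timesteps whose SNR exceeds gamma.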

Aspect Bucketing

The training loop's dataloader was modified to support aspect bucketing. Images smaller than 1024x1024 were discarded, and all remaining images were resized so that their smaller side is 1024 pixels.
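The bucketing rule can be sketched as follows. This is an illustrative reimplementation, not SimpleTuner's code, and all function names are hypothetical. It assumes "smaller than 1024x1024" means either side is below 1024 pixels, and it keys each bucket by the exact resized resolution, which is consistent with widths such as 1365 in the bucket table.

```python
from collections import defaultdict

TARGET_SHORT_SIDE = 1024  # from the card: the smaller side becomes 1024


def bucket_size(width, height, short_side=TARGET_SHORT_SIDE):
    """Return the resized (width, height), or None if the image is discarded."""
    if width < short_side or height < short_side:
        return None  # assumption: too small on either side -> discarded
    if width <= height:
        return short_side, round(height * short_side / width)
    return round(width * short_side / height), short_side


def assign_buckets(sizes):
    """Group image indices by their resized resolution (one bucket per size)."""
    buckets = defaultdict(list)
    for idx, (w, h) in enumerate(sizes):
        key = bucket_size(w, h)
        if key is not None:
            buckets[key].append(idx)
    return dict(buckets)


def bucket_batches(buckets, batch_size):
    """Yield batches that never mix resolutions, so each can be stacked into one tensor."""
    for size, indices in buckets.items():
        for start in range(0, len(indices), batch_size):
            yield size, indices[start:start + batch_size]


sizes = [(3000, 2000), (4000, 3000), (2048, 2048), (800, 600), (1500, 1000)]
buckets = assign_buckets(sizes)
print(buckets)  # {(1536, 1024): [0], (1365, 1024): [1], (1024, 1024): [2]}
```

Keeping every batch within a single bucket is what lets images of different shapes be trained without cropping them to a square. Stable Diffusion's VAE downsamples by a factor of 8, a constraint this sketch does not enforce.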

📄 License

This model is licensed under the creativeml-openrail-m license.
