playground-v2.5-1024px-aesthetic Open-Source Text-to-Image Model - Free Generation of High-Aesthetic-Quality and High-Resolution Images

Playground V2.5 1024px Aesthetic

Developed by playgroundai

An open-source text-to-image model capable of generating aesthetic images at 1024x1024 resolution and various aspect ratios, leading in aesthetic quality within the open-source domain.

Tags: Image Generation, Open Source, License: Other, High Aesthetic Image Generation, Multi-Aspect Ratio Adaptation, Dual Text Encoders

Downloads 554.94k

Release Time: 2/16/2024

Model Overview

A text-to-image model based on diffusion principles, an upgraded version of Playground v2, utilizing latent diffusion architecture with dual text encoders, supporting image generation at multiple aspect ratios.

Model Features

Exceptional Aesthetic Quality

User studies indicate it outperforms SDXL, Playground v2, PixArt-α, DALL-E 3, and Midjourney 5.2.

Multi-Aspect Ratio Generation

Supports image generation at various aspect ratios, significantly outperforming SDXL in multi-aspect ratio image generation.

Optimized Scheduler

Uses EDMDPMSolverMultistepScheduler by default, with a recommended guidance scale of 3.0. EDMEulerScheduler is also supported, with a recommended guidance scale of 5.0.

Model Capabilities

High-Resolution Image Generation

Multi-Aspect Ratio Support

Text-to-Image Conversion

Aesthetic Image Generation

Use Cases

Creative Design

Concept Art Creation

Generate high-quality concept art based on text descriptions

Produces detailed, aesthetically pleasing artworks

Advertising Material Generation

Quickly generate high-quality visual materials for commercial advertising

Creates advertising images that meet commercial aesthetic standards

Entertainment Content

Game Asset Creation

Generate visual elements like characters and scenes for game development

Produces high-quality, stylistically consistent game assets

🚀 Playground v2.5 – 1024px Aesthetic Model

This repository houses a model that generates highly aesthetic images at a resolution of 1024x1024, as well as in portrait and landscape aspect ratios. You can use this model with Hugging Face 🧨 Diffusers.


Playground v2.5 is a diffusion-based text-to-image generative model and the successor to Playground v2. It is the state-of-the-art open-source model in terms of aesthetic quality. User studies have shown that it outperforms SDXL, Playground v2, PixArt-α, DALL-E 3, and Midjourney 5.2.

For in-depth details about the model's development and training, refer to our blog post and technical report.

✨ Features

Generates high-aesthetic 1024x1024 images in various aspect ratios.
Diffusion-based text-to-image generative model.
Outperforms multiple state-of-the-art models in aesthetic quality.

📦 Installation

Install diffusers >= 0.27.0 and the relevant dependencies:

pip install "diffusers>=0.27.0"
pip install transformers accelerate safetensors
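
To confirm that the installed version meets the minimum stated above, a quick check along these lines may help. This is a minimal sketch: the helper names `parse_version` and `meets_minimum` are made up for illustration, and the parsing is deliberately simplified to the leading numeric components of the version string (pre-release suffixes such as `dev0` are ignored).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as "dev0" or "rc1"
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum="0.27.0"):
    """True if `installed` is at least `minimum` (numeric components only)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so "0.27" compares equal to "0.27.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        found = version("diffusers")
    except PackageNotFoundError:
        print("diffusers is not installed")
    else:
        status = "OK" if meets_minimum(found) else "too old, please upgrade"
        print(f"diffusers {found}: {status}")
```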

💻 Usage Examples

Basic Usage

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Optional: explicitly set the EDM formulation of the DPM++ 2M Karras scheduler for crisper fine details
# from diffusers import EDMDPMSolverMultistepScheduler
# pipe.scheduler = EDMDPMSolverMultistepScheduler()

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]
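
Because the model supports portrait and landscape aspect ratios, the same pipeline call can be given explicit dimensions. The following is a sketch under stated assumptions, not an official recipe: `dims_for_aspect_ratio` and `generate` are hypothetical helpers, and keeping the pixel count near 1024x1024 with sides snapped to multiples of 64 is an assumption, not a documented list of supported resolutions. The `width` and `height` arguments are regular parameters of the Diffusers pipeline call.

```python
import math


def dims_for_aspect_ratio(ratio_w, ratio_h, pixel_budget=1024 * 1024, multiple=64):
    """Pick (width, height) close to `pixel_budget` pixels for a given aspect ratio.

    Assumption: sides are snapped to multiples of 64; this is a convention
    chosen for this sketch, not a requirement stated in the model card.
    """
    aspect = ratio_w / ratio_h
    width = math.sqrt(pixel_budget * aspect)
    height = width / aspect

    def snap(value):
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


def generate(prompt, ratio_w=1, ratio_h=1, out_path="output.png"):
    """Generate one image at the requested aspect ratio and save it to disk."""
    # Heavy imports are kept local so the helper above works without a GPU.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "playgroundai/playground-v2.5-1024px-aesthetic",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    width, height = dims_for_aspect_ratio(ratio_w, ratio_h)
    image = pipe(
        prompt=prompt,
        width=width,
        height=height,
        num_inference_steps=50,
        guidance_scale=3,
    ).images[0]
    image.save(out_path)  # the pipeline returns PIL images
    return out_path
```

For example, `generate("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", 16, 9)` would request a landscape image, while swapping the ratio to `9, 16` would request a portrait one.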

Advanced Usage

Using the model with Automatic1111/ComfyUI: Support is coming soon. We will update this model card with instructions when ready.

📚 Documentation

Model Description

Property	Details
Developed by	Playground
Model Type	Diffusion-based text-to-image generative model
License	Playground v2.5 Community License
Summary	This model generates images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pre-trained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). It follows the same architecture as Stable Diffusion XL.

User Studies

This model card only offers a brief summary of our user study results. For comprehensive details on how we conduct user studies, check out our technical report.

We carried out studies to measure overall aesthetic quality, as well as specific areas targeted for improvement in Playground v2.5, namely multi-aspect ratios and human preference alignment.

Comparison to State-of-the-Art


The aesthetic quality of Playground v2.5 significantly surpasses current state-of-the-art open-source models like SDXL and PixArt-α, as well as Playground v2. Due to the large performance gap between Playground v2.5 and SDXL, we also compared its aesthetic quality against world-class closed-source models such as DALL-E 3 and Midjourney 5.2, and found that Playground v2.5 outperforms them too.

Multi Aspect Ratios


Similarly, in terms of multi-aspect ratios, Playground v2.5 outperforms SDXL by a wide margin.

Human Preference Alignment on People-Related Images


We benchmarked Playground v2.5 specifically on people-related images to test human preference alignment. It was compared against two commonly used baseline models: SDXL and RealStock v2, a community fine-tune of SDXL trained on a realistic people dataset. Playground v2.5 outperforms both baselines significantly.

MJHQ-30K Benchmark


Model	Overall FID
SDXL-1-0-refiner	9.55
playground-v2-1024px-aesthetic	7.07
playground-v2.5-1024px-aesthetic	4.48

We reported metrics using our MJHQ-30K benchmark, which we open-sourced with the v2 release. Both overall FID and per-category FID are reported. All FID metrics are computed at a resolution of 1024x1024. Our results show that Playground v2.5 outperforms both Playground v2 and SDXL in overall FID and all category FIDs, especially in the people and fashion categories. This aligns with the user study results, indicating a correlation between human preferences and the FID score of the MJHQ-30K benchmark.
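
For readers unfamiliar with the metric, FID (Fréchet Inception Distance) compares Gaussian fits to the Inception features of a reference image set and a generated image set. Lower values indicate closer distributions, which is why 4.48 is the best score in the table above. The standard definition is given here for reference; the symbols are the conventional ones and are not taken from the technical report.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Here \mu_r and \Sigma_r are the mean and covariance of the features of the reference images, and \mu_g and \Sigma_g are those of the generated images.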

📄 License

This model is released under the Playground v2.5 Community License.

🔧 Technical Details

For details on the development and training of our model, please refer to our blog post and technical report.

📖 How to cite us

@misc{li2024playground,
      title={Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation}, 
      author={Daiqing Li and Aleks Kamko and Ehsan Akhgari and Ali Sabet and Linmiao Xu and Suhail Doshi},
      year={2024},
      eprint={2402.17245},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

⚠️ Important Note

The pipeline uses EDMDPMSolverMultistepScheduler by default, for crisper fine details. It is an EDM formulation of the DPM++ 2M Karras scheduler, and guidance_scale = 3.0 is a good default for it. The pipeline also supports EDMEulerScheduler, an EDM formulation of the Euler scheduler, for which guidance_scale = 5.0 is a good default.
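
One way to keep each scheduler paired with its suggested guidance scale is sketched below. The names `RECOMMENDED_GUIDANCE` and `use_scheduler` are hypothetical, and rebuilding the scheduler with `from_config(pipe.scheduler.config)` is the usual Diffusers pattern for swapping schedulers, assumed here to carry over cleanly between the two EDM schedulers.

```python
# Guidance scales suggested in the note above, keyed by scheduler class name.
RECOMMENDED_GUIDANCE = {
    "EDMDPMSolverMultistepScheduler": 3.0,
    "EDMEulerScheduler": 5.0,
}


def use_scheduler(pipe, name="EDMDPMSolverMultistepScheduler"):
    """Swap the pipeline's scheduler and return the suggested guidance scale.

    `pipe` is assumed to be an already loaded Diffusers pipeline.
    """
    if name not in RECOMMENDED_GUIDANCE:
        supported = ", ".join(sorted(RECOMMENDED_GUIDANCE))
        raise ValueError(f"Unsupported scheduler {name!r}; expected one of: {supported}")

    # Imported lazily so the lookup table can be used without diffusers installed.
    import diffusers

    scheduler_cls = getattr(diffusers, name)
    # Assumption: the existing scheduler config is compatible with the new class.
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return RECOMMENDED_GUIDANCE[name]
```

With a loaded pipeline, `guidance = use_scheduler(pipe, "EDMEulerScheduler")` followed by `pipe(prompt=prompt, num_inference_steps=50, guidance_scale=guidance)` would run the Euler variant with its suggested setting.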

💡 Usage Tip

You can optionally use the DPM++ 2M Karras scheduler for crisper fine details. Just uncomment the relevant code in the usage example.
