Playground v2.5: 1024px Aesthetic Model
This repository houses a model capable of generating high-aesthetic images with a resolution of 1024x1024, supporting both portrait and landscape aspect ratios. You can use this model with Hugging Face Diffusers.

Playground v2.5 is a diffusion-based text-to-image generative model, succeeding Playground v2. It stands as the state-of-the-art open-source model in terms of aesthetic quality. User studies have shown that it outperforms SDXL, Playground v2, PixArt-α, DALL-E 3, and Midjourney 5.2.
For in-depth details about the model's development and training, refer to our blog post and technical report.
Features
- Generates high-aesthetic 1024x1024 images in various aspect ratios.
- Diffusion-based text-to-image generative model.
- Outperforms multiple state-of-the-art models in aesthetic quality.
Installation
Install diffusers >= 0.27.0 and the relevant dependencies (the version specifier is quoted so the shell does not treat ">" as a redirect):
pip install "diffusers>=0.27.0"
pip install transformers accelerate safetensors
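After installing, it can be worth confirming that the installed diffusers version meets the minimum stated above. The following is a minimal sketch using only the Python standard library. The helper names `parse_version` and `meets_minimum` are hypothetical, and the parser is an assumption that only handles plain numeric release versions such as 0.27.2 (any suffix such as `dev0` or `rc1` is ignored).

```python
from importlib import metadata

MIN_DIFFUSERS = "0.27.0"  # minimum version stated in this model card


def parse_version(text):
    """Turn '0.27.2' into (0, 27, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break  # a suffix such as 'rc1' follows; ignore the rest
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_DIFFUSERS):
    """Return True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.27' compares equal to '0.27.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        print("diffusers is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "too old"
        print(f"diffusers {installed}: {status}")
```

For anything beyond a quick sanity check, a dedicated version-parsing library is likely a better choice than this hand-rolled comparison.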
Usage Examples
Basic Usage
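from diffusers import DiffusionPipeline
import torch

# Load the fp16 weights and move the pipeline to the GPU
pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# # Optional: use the DPM++ 2M Karras scheduler for crisper fine details
# from diffusers import EDMDPMSolverMultistepScheduler
# pipe.scheduler = EDMDPMSolverMultistepScheduler()

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]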
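The call above relies on the pipeline's default output size. For portrait or landscape images, the SDXL-style pipeline call also accepts `width` and `height` arguments. The sketch below picks dimensions for a target aspect ratio while keeping the pixel count close to 1024x1024. The helper name `dims_for_aspect`, the rounding to multiples of 64, and the fixed pixel budget are illustrative assumptions, not the resolutions the model was trained on.

```python
import math


def dims_for_aspect(aspect_w, aspect_h, pixel_budget=1024 * 1024, multiple=64):
    """Return (width, height) near the pixel budget for the given aspect ratio.

    Both sides are rounded to the nearest multiple of `multiple`
    (an assumed granularity, chosen for illustration only).
    """
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect ratio components must be positive")

    ratio = aspect_w / aspect_h
    # Solve width * height = pixel_budget with width / height = ratio.
    height = math.sqrt(pixel_budget / ratio)
    width = height * ratio

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


# Hypothetical usage with the `pipe` and `prompt` objects from the example above:
# width, height = dims_for_aspect(16, 9)
# image = pipe(prompt=prompt, width=width, height=height,
#              num_inference_steps=50, guidance_scale=3).images[0]
```

Keeping the total pixel count roughly constant is a common heuristic for SDXL-architecture models; whether a particular size works well for this model should be checked empirically.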
Advanced Usage
Using the model with Automatic1111/ComfyUI: Support is coming soon. We will update this model card with instructions when ready.
Documentation
Model Description
| Property | Details |
| --- | --- |
| Developed by | Playground |
| Model Type | Diffusion-based text-to-image generative model |
| License | Playground v2.5 Community License |
| Summary | This model generates images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pre-trained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). It follows the same architecture as Stable Diffusion XL. |
User Studies
This model card only offers a brief summary of our user study results. For comprehensive details on how we conduct user studies, check out our technical report.
We carried out studies to measure overall aesthetic quality, as well as two specific areas targeted for improvement in Playground v2.5: multi-aspect ratios and human preference alignment.
Comparison to State-of-the-Art

The aesthetic quality of Playground v2.5 significantly surpasses current state-of-the-art open-source models like SDXL and PixArt-α, as well as Playground v2. Because of the large performance gap between Playground v2.5 and SDXL, we also compared its aesthetic quality against world-class closed-source models such as DALL-E 3 and Midjourney 5.2, and found that Playground v2.5 outperforms them too.
Multi Aspect Ratios

Similarly, in terms of multi-aspect ratios, Playground v2.5 outperforms SDXL by a wide margin.
Human Preference Alignment on People-Related Images

We benchmarked Playground v2.5 specifically on people-related images to test human preference alignment. We compared it against two commonly used baseline models: SDXL and RealStock v2, a community fine-tune of SDXL trained on a realistic people dataset. Playground v2.5 outperforms both baselines significantly.
MJHQ-30K Benchmark

We report metrics using our MJHQ-30K benchmark, which we open-sourced with the v2 release. We report both overall FID and per-category FID, and all FID metrics are computed at a resolution of 1024x1024. Our results show that Playground v2.5 outperforms both Playground v2 and SDXL in overall FID and in every per-category FID, especially in the people and fashion categories. This is in line with the user study results, which suggests a correlation between human preferences and the FID score on the MJHQ-30K benchmark.
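For reference, FID (Fréchet Inception Distance) compares the Gaussian statistics of Inception features extracted from a reference image set and from generated images; lower is better. The standard definition is reproduced below for convenience. The notation is our assumption for this card and is not taken from the technical report.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here \mu_r and \Sigma_r are the mean and covariance of the features of the reference images (in this case, MJHQ-30K), and \mu_g and \Sigma_g are the same statistics for the generated images.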
License
This model is released under the Playground v2.5 Community License.
Technical Details
For details on the development and training of our model, please refer to our blog post and technical report.
How to cite us
@misc{li2024playground,
  title={Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation},
  author={Daiqing Li and Aleks Kamko and Ehsan Akhgari and Ali Sabet and Linmiao Xu and Suhail Doshi},
  year={2024},
  eprint={2402.17245},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
Important Note
The pipeline uses the EDMDPMSolverMultistepScheduler by default, for crisper fine details. It is an EDM formulation of the DPM++ 2M Karras scheduler, and guidance_scale = 3.0 is a good default for it. The pipeline also supports the EDMEulerScheduler, an EDM formulation of the Euler scheduler, for which guidance_scale = 5.0 is a good default.
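As a rough illustration of the note above, the sketch below pairs each scheduler with its suggested guidance scale and swaps the scheduler before generating. The `RECOMMENDED_GUIDANCE` table and the `generate` helper are hypothetical names introduced here. Rebuilding the scheduler with `from_config` from the pipeline's existing scheduler config is the usual Diffusers pattern, but that it carries over cleanly between these two schedulers is an assumption. Running `generate` requires a CUDA GPU and downloads the model weights.

```python
# Guidance-scale defaults stated in the note above, keyed by scheduler class name.
RECOMMENDED_GUIDANCE = {
    "EDMDPMSolverMultistepScheduler": 3.0,
    "EDMEulerScheduler": 5.0,
}


def generate(prompt, scheduler_name="EDMDPMSolverMultistepScheduler", steps=50):
    """Generate one image with the chosen scheduler and its suggested guidance scale."""
    if scheduler_name not in RECOMMENDED_GUIDANCE:
        raise ValueError(f"unsupported scheduler: {scheduler_name}")

    # Heavy imports are kept local so the lookup table is usable without a GPU.
    import torch
    import diffusers

    pipe = diffusers.DiffusionPipeline.from_pretrained(
        "playgroundai/playground-v2.5-1024px-aesthetic",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    # Assumption: the scheduler can be rebuilt from the pipeline's scheduler config.
    scheduler_cls = getattr(diffusers, scheduler_name)
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)

    return pipe(
        prompt=prompt,
        num_inference_steps=steps,
        guidance_scale=RECOMMENDED_GUIDANCE[scheduler_name],
    ).images[0]


# Hypothetical usage:
# image = generate("Astronaut in a jungle, cold color palette", "EDMEulerScheduler")
# image.save("astronaut.png")
```

Keeping the guidance values in one table makes it harder to change the scheduler and forget to adjust guidance_scale along with it.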
Usage Tip
You can optionally use the DPM++ 2M Karras scheduler for crisper fine details. Just uncomment the relevant code in the usage example.