🚀 Playground v2 – 1024px Aesthetic Model
This repository contains a model that generates highly aesthetic 1024x1024 resolution images. It can be used with Hugging Face 🧨 Diffusers.

🚀 Quick Start
Playground v2 is a diffusion-based text-to-image generative model, trained from scratch by the research team at Playground. According to Playground's user study, images generated by Playground v2 are favored 2.5 times more than those produced by Stable Diffusion XL. The team is releasing intermediate checkpoints from different training stages, together with their evaluation metrics, to encourage further research into foundation models for image generation. A new benchmark, MJHQ-30K, is also introduced for automatically evaluating a model's aesthetic quality. For more details, see the Playground blog.
✨ Features
- High Aesthetic Quality: Generates highly aesthetic 1024x1024 resolution images.
- Based on Diffusion Technology: A diffusion-based text-to-image generative model.
- Released Checkpoints and Benchmark: Intermediate checkpoints are released with their evaluation metrics, along with a new benchmark (MJHQ-30K) for evaluating aesthetic quality.
📦 Installation
Install diffusers >= 0.24.0 and the other dependencies:
```shell
pip install "diffusers>=0.24.0" transformers accelerate safetensors
```
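As a quick sanity check, the installed Diffusers version can be compared against the 0.24.0 minimum using only the standard library. This is a minimal sketch: the helper names are hypothetical, and it compares only the leading numeric components, ignoring pre-release and dev suffixes (the packaging library's Version class handles those properly).

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM_DIFFUSERS = "0.24.0"


def parse_release(v: str) -> tuple[int, ...]:
    """Turn a version string such as '0.24.0' into (0, 24, 0).

    Assumption: only the leading numeric components matter; suffixes such as
    'rc1' or '.dev0' are dropped, so pre-releases are not ordered precisely.
    """
    parts = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. '0rc1': keep 0, then stop
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MINIMUM_DIFFUSERS) -> bool:
    """Compare numerically, so '0.9.0' correctly counts as older than '0.24.0'."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '0.24' and '0.24.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = version("diffusers")
    except PackageNotFoundError:
        print("diffusers is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "too old, upgrade required"
        print(f"diffusers {installed}: {status}")
```

A plain string comparison would wrongly rank "0.9.0" above "0.24.0", which is why the components are compared as integers.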
💻 Usage Examples
Basic Usage
To use the model, run the following snippet.
Note: guidance_scale=3.0 is recommended.
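```python
from diffusers import DiffusionPipeline
import torch

# Load the fp16 weight variant in half precision; add_watermarker=False
# disables the invisible watermark the pipeline may otherwise apply.
pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2-1024px-aesthetic",
    torch_dtype=torch.float16,
    use_safetensors=True,
    add_watermarker=False,
    variant="fp16",
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# guidance_scale=3.0 is the recommended setting for this model.
image = pipe(prompt=prompt, guidance_scale=3.0).images[0]
image.save("astronaut.png")  # the output file name is arbitrary
```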
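The guidance_scale argument controls classifier-free guidance. At each denoising step, the Diffusers Stable Diffusion XL pipeline combines the unconditional and text-conditioned noise predictions as uncond + guidance_scale × (cond − uncond). The sketch below shows that combination on plain Python lists standing in for the noise tensors; the function name is hypothetical, and the real pipeline operates on torch tensors.

```python
def apply_cfg(
    noise_uncond: list[float],
    noise_text: list[float],
    guidance_scale: float = 3.0,
) -> list[float]:
    """Classifier-free guidance: move from the unconditional prediction
    toward (and, for scales above 1, past) the text-conditioned one."""
    if len(noise_uncond) != len(noise_text):
        raise ValueError("predictions must have the same shape")
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


uncond = [0.0, 1.0, -0.5]
text = [0.5, 1.0, 0.5]
print(apply_cfg(uncond, text))  # [1.5, 1.0, 2.5]
# With guidance_scale=1.0 the result equals the text-conditioned prediction.
print(apply_cfg(uncond, text, guidance_scale=1.0))  # [0.5, 1.0, 0.5]
```

Diffusers pipelines run the extra unconditional pass only when guidance_scale is greater than 1, so values at or below 1 effectively disable guidance.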
Advanced Usage
To use the model with software such as Automatic1111 or ComfyUI, use the playground-v2.fp16.safetensors file.
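A .safetensors file such as playground-v2.fp16.safetensors begins with an 8-byte little-endian unsigned integer giving the length of a JSON header, followed by the header itself, which maps each tensor name to its dtype, shape, and byte offsets (plus an optional __metadata__ entry). Before loading the file into another tool, a stdlib-only sketch like the one below can list its contents by reading just the header, not the weights. The function names are hypothetical, and the demo builds a tiny in-memory file rather than using the real checkpoint.

```python
import io
import json
import struct
from typing import BinaryIO


def read_header(stream: BinaryIO) -> dict:
    """Parse the JSON header at the start of a .safetensors stream."""
    # First 8 bytes: header length, little-endian unsigned 64-bit integer.
    (header_len,) = struct.unpack("<Q", stream.read(8))
    # Next header_len bytes: UTF-8 JSON describing every tensor.
    return json.loads(stream.read(header_len).decode("utf-8"))


def summarize(header: dict) -> dict:
    """Map tensor name -> (dtype, shape), skipping the optional __metadata__."""
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


def summarize_file(path: str) -> dict:
    """Read only the header of a file such as playground-v2.fp16.safetensors."""
    with open(path, "rb") as f:
        return summarize(read_header(f))


def build_tiny_file() -> bytes:
    """Build a one-tensor safetensors blob in memory, for demonstration."""
    header = {
        "__metadata__": {"format": "pt"},
        "weight": {"dtype": "F16", "shape": [2], "data_offsets": [0, 4]},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    # Two float16 values (4 zero bytes) follow the header.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(4)


if __name__ == "__main__":
    print(summarize(read_header(io.BytesIO(build_tiny_file()))))
    # {'weight': ('F16', [2])}
```

Calling summarize_file on the downloaded checkpoint would list its tensor names, dtypes, and shapes without allocating memory for the weights.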
📚 Documentation
Model Description
User Study

According to user studies conducted by Playground, involving over 2,600 prompts and thousands of users, the images generated by Playground v2 are favored 2.5 times more than those produced by Stable Diffusion XL. The user preference metrics are reported on PartiPrompts and an internal prompt dataset curated by the Playground team. The “Internal 1K” prompt dataset is diverse and covers various categories and tasks. During the user study, users are instructed to evaluate image pairs based on both their aesthetic preference and the image-text alignment.
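The card does not say exactly how the 2.5× figure is computed. One common way to summarize a pairwise user study as a single ratio, assumed here rather than taken from the study, is to divide the number of comparisons won by one model by the number won by the other. The sketch uses made-up placeholder counts, not Playground's data, and the function names are hypothetical.

```python
def preference_ratio(wins_a: int, wins_b: int) -> float:
    """How many times more often model A was preferred over model B.

    Assumption: ties and 'no preference' votes are excluded before counting.
    """
    if wins_b == 0:
        raise ValueError("model B has no wins; the ratio is undefined")
    return wins_a / wins_b


def win_rate(wins_a: int, wins_b: int) -> float:
    """Fraction of decisive comparisons won by model A."""
    return wins_a / (wins_a + wins_b)


# Hypothetical counts for illustration only (not results from the study).
wins_a, wins_b = 500, 200
print(preference_ratio(wins_a, wins_b))  # 2.5
print(round(win_rate(wins_a, wins_b), 3))  # 0.714
```

Under this reading, a 2.5× ratio corresponds to a win rate of about 71% among decisive comparisons (2.5 / 3.5).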
MJHQ-30K Benchmark

A new benchmark, MJHQ-30K, is introduced for automatic evaluation of a model's aesthetic quality. A high-quality dataset is curated from Midjourney, featuring 10 common categories, with each category containing 3,000 samples. Aesthetic score and CLIP score are used to ensure high image quality and high image-text alignment, and extra care is taken to make the data diverse within each category. For Playground v2, both the overall FID and per-category FID are reported at resolution 1024x1024. The benchmark results show that the model outperforms SDXL-1.0-refiner in overall FID and in every category's FID, especially in the people and fashion categories, which is consistent with the user study results. The benchmark is released publicly so the community can evaluate their own models' aesthetic quality.
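FID compares Inception-v3 feature statistics of generated and reference images, and lower is better. For reference, the standard definition models each image set's features as a Gaussian with mean μ and covariance Σ (subscript r for the reference MJHQ-30K images, g for the generated images):

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Per-category FID presumably applies the same formula to each category's 3,000 reference images and the corresponding generated images; the card does not spell out the exact protocol.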
Intermediate Base Models
Apart from playground-v2-1024px-aesthetic, intermediate checkpoints from different training stages are released to the community to promote research into foundation models for image generation. The FID score and CLIP score on the MSCOCO14 evaluation set are reported for reference.
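CLIP score measures image-text alignment via the cosine similarity between CLIP's image embedding and text embedding. The card does not state which variant is reported; a common definition (used, for example, by the CLIPScore metric in torchmetrics) is max(100 · cos(E_I, E_T), 0), averaged over image-caption pairs. The minimal sketch below assumes that definition and uses plain vectors in place of real CLIP embeddings; the function names are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def clip_score(image_emb: list[float], text_emb: list[float]) -> float:
    """Scaled cosine similarity, clipped at zero (assumed definition)."""
    return max(100.0 * cosine(image_emb, text_emb), 0.0)


def mean_clip_score(pairs: list[tuple[list[float], list[float]]]) -> float:
    """Average CLIP score over (image embedding, text embedding) pairs."""
    scores = [clip_score(img, txt) for img, txt in pairs]
    return sum(scores) / len(scores)


# Toy 2-D "embeddings": one perfectly aligned pair, one orthogonal pair.
pairs = [([1.0, 0.0], [2.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
print(mean_clip_score(pairs))  # 50.0
```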
How to cite us
```bibtex
@misc{playground-v2,
  url={https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic},
  title={Playground v2},
  author={Li, Daiqing and Kamko, Aleks and Sabet, Ali and Akhgari, Ehsan and Xu, Linmiao and Doshi, Suhail}
}
```
📄 License
This model is released under the Playground v2 Community License.