🚀 Playground v2 – 512px Base Model
This repository holds a base (pre-trained) model that generates images at 512x512 resolution.
It is intended primarily for research and does not typically produce highly aesthetic images.
You can use this model with Hugging Face 🧨 Diffusers.

Playground v2 is a diffusion-based text-to-image generative model. It was trained from scratch by the research team at Playground.
According to Playground’s user study, images generated by Playground v2 are favored 2.5 times more than those from Stable Diffusion XL.
We're excited to release intermediate checkpoints at different training stages, along with evaluation metrics, to the community. We hope this will spur further research on foundational models for image generation.
Lastly, we introduce a new benchmark, MJHQ-30K, for automatically evaluating a model’s aesthetic quality.
For more details, please visit our blog.
🚀 Quick Start
This model can be used with Hugging Face 🧨 Diffusers. First, install diffusers >= 0.24.0 and the other dependencies:
pip install "diffusers>=0.24.0" transformers accelerate safetensors
Then, use the following code to run the model:
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained(
"playgroundai/playground-v2-512px-base",
torch_dtype=torch.float16,
use_safetensors=True,
add_watermarker=False,
variant="fp16",
)
pipe.to("cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, width=512, height=512).images[0]
✨ Features
- Text-to-Image Generation: Generates images from text prompts.
- High-Quality Results: In Playground's user study, its images are favored 2.5 times more than those of Stable Diffusion XL.
- New Benchmark: Introduces the MJHQ-30K benchmark for evaluating aesthetic quality.
- Intermediate Checkpoints: Releases intermediate checkpoints at different training stages to promote research.
📦 Installation
Install diffusers >= 0.24.0 together with the remaining dependencies:
pip install "diffusers>=0.24.0" transformers accelerate safetensors
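To confirm that the installed Diffusers release satisfies the >= 0.24.0 requirement before loading the model, a check along these lines may help. It is a minimal sketch that uses only the Python standard library. The helper names (`version_tuple`, `meets_minimum`, `diffusers_is_recent_enough`) are illustrative and not part of Diffusers, and the simplified parsing ignores pre-release suffixes such as `.dev0`.

```python
import re
from importlib import metadata
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '0.25.0.dev0' -> (0, 25, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.24.0") -> bool:
    """Compare two version strings after padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def diffusers_is_recent_enough(minimum: str = "0.24.0") -> bool:
    """Return False if diffusers is missing or older than `minimum`."""
    try:
        installed = metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)


if __name__ == "__main__":
    print("diffusers OK:", diffusers_is_recent_enough())
```

If the check reports `False`, rerunning the install command above should bring the environment up to date.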
💻 Usage Examples
Basic Usage
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained(
"playgroundai/playground-v2-512px-base",
torch_dtype=torch.float16,
use_safetensors=True,
add_watermarker=False,
variant="fp16",
)
pipe.to("cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, width=512, height=512).images[0]
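When experimenting with other prompts or sizes, it can be convenient to assemble the call arguments in one place and validate them before handing them to the pipeline. The sketch below is an illustration only: `build_generation_kwargs` is a hypothetical helper, not part of Diffusers. It assumes the checkpoint loads as a Stable Diffusion XL style pipeline, whose call accepts `prompt`, `negative_prompt`, `width`, `height`, `num_inference_steps`, and `guidance_scale` and rejects sizes that are not divisible by 8. Optional settings are passed through only when given, so the pipeline's own defaults apply otherwise.

```python
from typing import Any, Dict, Optional


def build_generation_kwargs(
    prompt: str,
    size: int = 512,
    negative_prompt: Optional[str] = None,
    num_inference_steps: Optional[int] = None,
    guidance_scale: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a square text-to-image call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Assumption: the pipeline requires width and height divisible by 8.
    if size <= 0 or size % 8 != 0:
        raise ValueError(f"size must be a positive multiple of 8, got {size}")

    kwargs: Dict[str, Any] = {"prompt": prompt, "width": size, "height": size}
    # Only forward optional settings that were set explicitly.
    if negative_prompt is not None:
        kwargs["negative_prompt"] = negative_prompt
    if num_inference_steps is not None:
        kwargs["num_inference_steps"] = num_inference_steps
    if guidance_scale is not None:
        kwargs["guidance_scale"] = guidance_scale
    return kwargs


# Usage with the `pipe` object created above (not executed here):
# image = pipe(**build_generation_kwargs(prompt)).images[0]
```

Called with only a prompt, the helper reproduces the arguments used in the example above (`width=512`, `height=512`).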
📚 Documentation
Model Description
User Study

According to user studies by Playground, with over 2,600 prompts and thousands of users, images generated by Playground v2 are favored 2.5 times more than those from Stable Diffusion XL.
We report user preference metrics on PartiPrompts and an internal prompt dataset curated by the Playground team. The “Internal 1K” prompt dataset is diverse, covering various categories and tasks.
During the user study, users were instructed to evaluate image pairs based on (1) aesthetic preference and (2) image-text alignment.
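One way to read the "2.5 times" figure is as a ratio of pairwise wins, which maps directly to a win rate. The sketch below assumes that interpretation and assumes ties are excluded; the exact computation behind the reported number is not spelled out here, and the function name is illustrative.

```python
def win_rate_from_preference_ratio(ratio: float) -> float:
    """Convert a wins-to-losses ratio into the winner's share of comparisons.

    Assumption: every comparison has exactly one winner (no ties).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio / (1.0 + ratio)


share = win_rate_from_preference_ratio(2.5)
print(f"{share:.1%}")  # prints 71.4%
```

Under these assumptions, a 2.5x preference corresponds to winning roughly five out of every seven comparisons.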
MJHQ-30K Benchmark

We introduce the MJHQ-30K benchmark for automatically evaluating a model’s aesthetic quality. It computes FID on a high-quality dataset to measure aesthetic quality.
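For reference, FID as it is commonly defined fits a Gaussian to image features (typically taken from an Inception network) for the reference set and for the generated set, then measures the Fréchet distance between the two. The symbols below are notation introduced here for illustration, not taken from the benchmark's own code:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature mean and covariance of the reference and generated images. Lower values indicate that the generated distribution is closer to the reference distribution.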
We curated a high-quality dataset from Midjourney, with 10 common categories and 3,000 samples per category (30,000 images in total). We used aesthetic score and CLIP score to ensure high image quality and image-text alignment, and took care to keep the data diverse within each category.
For Playground v2, we report both overall FID and per-category FID, all computed at 1024x1024 resolution. Our benchmark results show that our model outperforms SDXL-1-0-refiner in overall FID and in every per-category FID, especially in the people and fashion categories. This aligns with the user study results, indicating a correlation between human preference and FID score on the MJHQ-30K benchmark.
We release this benchmark to the public and encourage the community to use it for benchmarking models' aesthetic quality.
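The overall and per-category reporting described above can be organised as in the toy sketch below. It is deliberately simplified and is not the benchmark's implementation: each image is represented by a single made-up scalar feature, so the Fréchet distance reduces to the one-dimensional form `(mu_r - mu_g)^2 + (sigma_r - sigma_g)^2`, whereas real FID uses high-dimensional network features. All function names and data values are hypothetical, and both sets are assumed to cover the same categories.

```python
from collections import defaultdict
from statistics import fmean, pstdev
from typing import Dict, List, Tuple

Sample = Tuple[str, float]  # (category, scalar feature); toy stand-in for image features


def frechet_distance_1d(real: List[float], generated: List[float]) -> float:
    """Fréchet distance between 1-D Gaussians fitted to the two samples."""
    mu_r, mu_g = fmean(real), fmean(generated)
    sd_r, sd_g = pstdev(real), pstdev(generated)
    return (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2


def group_by_category(samples: List[Sample]) -> Dict[str, List[float]]:
    """Collect the features of each category into its own list."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for category, feature in samples:
        grouped[category].append(feature)
    return dict(grouped)


def overall_and_per_category(
    real: List[Sample], generated: List[Sample]
) -> Tuple[float, Dict[str, float]]:
    """Return the distance over all samples and one distance per category."""
    real_by_cat = group_by_category(real)
    gen_by_cat = group_by_category(generated)
    per_category = {
        cat: frechet_distance_1d(real_by_cat[cat], gen_by_cat[cat])
        for cat in real_by_cat
    }
    overall = frechet_distance_1d(
        [feature for _, feature in real],
        [feature for _, feature in generated],
    )
    return overall, per_category
```

Note that the overall score is computed over the pooled samples rather than as an average of the per-category scores, so the two views can rank models differently.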
Intermediate Base Models
Apart from playground-v2-1024px-aesthetic, we release intermediate checkpoints at different training stages to the community to promote foundation model research in pixels. Here, we report FID and CLIP scores on the MSCOCO14 evaluation set for reference. (Note that our reported numbers may differ from SDXL's published results due to different prompt lists.)
📄 License
This model is released under the Playground v2 Community License.
How to cite us
@misc{playground-v2,
url={https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic},
title={Playground v2},
author={Li, Daiqing and Kamko, Aleks and Sabet, Ali and Akhgari, Ehsan and Xu, Lin and Doshi, Suhail}
}