Playground v2 Open-Source Text-to-Image Model: Freely Available for Image Generation Research, Not Tuned for Highly Aesthetic Images

Playground v2 512px Base

Developed by playgroundai

Playground v2 is a diffusion-based text-to-image model trained from scratch by the Playground research team. This base checkpoint is intended primarily for research and typically does not produce highly aesthetic images.

Image Generation | Open Source | License: Other | #512px Text-to-Image #Diffusion Model #Research-Grade Foundation Model

Downloads 70

Release Time: 11/30/2023

Model Overview

This model generates 512x512 resolution images from text prompts. It is a latent diffusion model using two fixed pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L), with the same architecture as Stable Diffusion XL.
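The card says the model shares Stable Diffusion XL's architecture. A quick shape calculation makes "latent diffusion with two text encoders" concrete.

The sketch below assumes Playground v2 keeps SDXL's standard sizes:
- an 8x VAE downsampling factor with 4 latent channels,
- 77-token prompts,
- hidden sizes of 768 for CLIP-ViT/L and 1280 for OpenCLIP-ViT/G.

These numbers come from SDXL, not from this card, and the constant and function names are illustrative.

```python
# Shape sketch for an SDXL-style latent diffusion model.
# ASSUMPTION: Playground v2 reuses SDXL's standard sizes; none of these
# constants are stated in the Playground v2 model card itself.
VAE_SCALE_FACTOR = 8      # VAE downsamples height and width by 8
LATENT_CHANNELS = 4       # channels of the VAE latent
MAX_TOKENS = 77           # CLIP tokenizer context length
CLIP_L_HIDDEN = 768       # per-token hidden size of CLIP-ViT/L
OPENCLIP_G_HIDDEN = 1280  # per-token hidden size of OpenCLIP-ViT/G


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Shape (C, H, W) of the latent that the UNet denoises."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


def prompt_embedding_shape() -> tuple[int, int]:
    """Per-token states of both text encoders, concatenated along features."""
    return (MAX_TOKENS, CLIP_L_HIDDEN + OPENCLIP_G_HIDDEN)


print(latent_shape(512, 512))    # (4, 64, 64)
print(prompt_embedding_shape())  # (77, 2048)
```

Under these assumptions, each 512x512 image is denoised as a 4x64x64 latent. That is 48 times fewer values than the 3x512x512 pixel tensor, which is the main efficiency argument for running diffusion in latent space.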

Model Features

High-Quality Image Generation

In Playground's user study, images generated by Playground v2 were favored 2.5 times more often than images from Stable Diffusion XL.

Research-Friendly

Intermediate checkpoints from different training stages are released together with evaluation metrics to facilitate research on image generation foundation models.

New Evaluation Benchmark

Introduces the MJHQ-30K benchmark, which automatically evaluates a model's aesthetic quality by computing FID on a curated high-quality dataset.

Model Capabilities

Text-to-Image Generation

512x512 Resolution Image Generation

Use Cases

Creative Design

Concept Art Creation

Generate creative concept art images from text descriptions

Can produce diverse creative images, such as 'an astronaut in the jungle' and other scenarios

Research Applications

Diffusion Model Research

Used as a foundation model for research on image generation technologies

Provides checkpoints from different training stages and evaluation metrics
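Research that compares training stages needs each released checkpoint paired with its native resolution. The hypothetical helper below does that pairing.

It assumes the checkpoints named in this card (256px base, 512px base, 1024px aesthetic) are published under the same `playgroundai/<name>` pattern as the `playgroundai/playground-v2-512px-base` repository. The dictionary and function names are illustrative.

```python
# Hypothetical helper: map a checkpoint name to (repo_id, native side length).
# ASSUMPTION: every checkpoint follows the "playgroundai/<name>" naming
# pattern of the 512px base repository.
CHECKPOINTS = {
    "playground-v2-256px-base": 256,
    "playground-v2-512px-base": 512,
    "playground-v2-1024px-aesthetic": 1024,
}


def checkpoint_for(name: str) -> tuple[str, int]:
    """Return (repo_id, native_side_length) for a released checkpoint name."""
    if name not in CHECKPOINTS:
        raise KeyError(f"unknown checkpoint: {name!r}")
    return f"playgroundai/{name}", CHECKPOINTS[name]


print(checkpoint_for("playground-v2-256px-base"))
# ('playgroundai/playground-v2-256px-base', 256)
```

The repository id can be passed to `DiffusionPipeline.from_pretrained`. The side length can be used as `width` and `height` when sampling, so each stage is evaluated at the resolution it was trained for.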

🚀 Playground v2 – 512px Base Model

This repository holds a base (pre-trained) model capable of generating 512x512 resolution images.

Primarily designed for research, this model doesn't typically produce highly aesthetic images.

You can utilize this model with Hugging Face 🧨 Diffusers.


Playground v2 is a diffusion-based text-to-image generative model. It was trained from scratch by the research team at Playground.

According to Playground’s user study, images generated by Playground v2 are favored 2.5 times more than those from Stable Diffusion XL.

We're excited to release intermediate checkpoints at different training stages, along with evaluation metrics, to the community. We hope this will spur further research on foundational models for image generation.

Lastly, we introduce a new benchmark, MJHQ-30K, for automatically evaluating a model’s aesthetic quality.

For more details, please visit our blog.

🚀 Quick Start

This model can be used with Hugging Face 🧨 Diffusers (version 0.24.0 or later). First, install Diffusers and the other dependencies:

pip install "diffusers>=0.24.0" transformers accelerate safetensors

Then, use the following code to run the model:

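from diffusers import DiffusionPipeline
import torch

# Load the 512px base checkpoint in half precision from its fp16 weight files.
pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2-512px-base",
    torch_dtype=torch.float16,
    use_safetensors=True,
    add_watermarker=False,  # skip the invisible watermark on outputs
    variant="fp16",
)
pipe.to("cuda")  # move the pipeline to a CUDA GPU

# This checkpoint targets 512x512 output.
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, width=512, height=512).images[0]  # .images is a list of PIL images
image.save("astronaut.png")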
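When experimenting beyond the single prompt above, it helps to build the call arguments in one place and validate them before spending GPU time.

The sketch below uses argument names accepted by the Diffusers SDXL pipeline: `prompt`, `negative_prompt`, `width`, `height`, `num_inference_steps` and `guidance_scale`. The helper name and its default values are illustrative assumptions, not settings recommended by Playground. The multiple-of-8 check mirrors the VAE's 8x downsampling.

```python
from typing import Optional


def build_call_kwargs(
    prompt: str,
    width: int = 512,
    height: int = 512,
    num_inference_steps: int = 50,  # illustrative value
    guidance_scale: float = 7.0,    # illustrative value
    negative_prompt: Optional[str] = None,
) -> dict:
    """Collect keyword arguments for an SDXL-style pipeline call."""
    # SDXL-style pipelines reject sizes not divisible by 8, because the
    # VAE downsamples images by a factor of 8.
    for name, value in (("width", width), ("height", height)):
        if value <= 0 or value % 8 != 0:
            raise ValueError(f"{name}={value} must be a positive multiple of 8")
    kwargs = {
        "prompt": prompt,
        "width": width,
        "height": height,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }
    if negative_prompt is not None:
        kwargs["negative_prompt"] = negative_prompt
    return kwargs


print(sorted(build_call_kwargs("Astronaut in a jungle")))
# ['guidance_scale', 'height', 'num_inference_steps', 'prompt', 'width']
```

The result can be unpacked into the pipeline from the snippet above, for example:

`pipe(**build_call_kwargs(prompt), generator=torch.Generator(device="cuda").manual_seed(0))`

The seeded `generator` makes repeated runs reproducible on the same hardware and software setup. The returned `image` is a PIL image, so `image.save(...)` writes it to disk.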

✨ Features

Text-to-Image Generation: Generates images from text prompts.
High-Quality Results: According to user studies, its generated images are favored 2.5 times more than those of Stable Diffusion XL.
New Benchmark: Introduces the MJHQ-30K benchmark for evaluating aesthetic quality.
Intermediate Checkpoints: Releases intermediate checkpoints from different training stages to promote research.

📦 Installation

Install diffusers >= 0.24.0 and its dependencies:

pip install "diffusers>=0.24.0" transformers accelerate safetensors
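Since the card asks for diffusers >= 0.24.0, a small check can confirm the installed version before any weights are downloaded. This sketch uses the standard-library `importlib.metadata.version`.

Its simplified parser compares only the numeric release components and ignores pre-release or dev suffixes. The third-party `packaging` library is the more robust choice for full PEP 440 comparisons.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple[int, ...]:
    """'0.24.0' -> (0, 24, 0); stops at the first non-numeric piece such as 'dev0'."""
    parts = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.24.0") -> bool:
    """Compare release numbers, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def diffusers_ok() -> bool:
    """True if diffusers is installed at version 0.24.0 or newer."""
    try:
        return meets_minimum(version("diffusers"))
    except PackageNotFoundError:
        return False
```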


📚 Documentation

Model Description

Property	Details
Developed by	Playground
Model Type	Diffusion-based text-to-image generative model
License	Playground v2 Community License
Summary	This model generates images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pre-trained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). It follows the same architecture as Stable Diffusion XL.

User Study


According to user studies by Playground, with over 2,600 prompts and thousands of users, images generated by Playground v2 are favored 2.5 times more than those from Stable Diffusion XL.

We report user preference metrics on PartiPrompts and an internal prompt dataset curated by the Playground team. The “Internal 1K” prompt dataset is diverse, covering various categories and tasks.

During the user study, users were instructed to evaluate image pairs based on (1) aesthetic preference and (2) image-text alignment.
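The "2.5 times" figure is easiest to read as a ratio of pairwise wins. The sketch below tallies hypothetical votes per evaluation criterion.

It assumes each vote picks one image of an (A, B) pair and that ties are dropped. The card does not say how ties or multiple raters were handled. A 2.5x ratio corresponds to model A winning 2.5 / 3.5, about 71%, of the decided comparisons.

```python
from collections import Counter, defaultdict


def ratios_by_criterion(votes):
    """votes: iterable of (criterion, winner) pairs, winner being "A" or "B".

    Returns {criterion: wins_A / wins_B}; criteria where B never won are skipped.
    """
    tallies = defaultdict(Counter)
    for criterion, winner in votes:
        if winner not in ("A", "B"):
            raise ValueError(f"unexpected winner {winner!r} (ties assumed dropped)")
        tallies[criterion][winner] += 1
    return {c: t["A"] / t["B"] for c, t in tallies.items() if t["B"]}


def win_share(ratio: float) -> float:
    """Fraction of decided comparisons won by A if A wins `ratio` times as often as B."""
    return ratio / (1.0 + ratio)


# Hypothetical tallies for illustration, not Playground's data.
votes = ([("aesthetics", "A")] * 250 + [("aesthetics", "B")] * 100
         + [("alignment", "A")] * 180 + [("alignment", "B")] * 120)
print(ratios_by_criterion(votes))  # {'aesthetics': 2.5, 'alignment': 1.5}
print(round(win_share(2.5), 3))    # 0.714
```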

MJHQ-30K Benchmark


Model	Overall FID
SDXL-1-0-refiner	9.55
playground-v2-1024px-aesthetic	7.07

We introduce the MJHQ-30K benchmark for automatically evaluating a model’s aesthetic quality. It computes FID on a high-quality dataset to measure aesthetic quality.

We curated a high-quality dataset from Midjourney, with 10 common categories and 3,000 samples per category. We used aesthetic score and CLIP score to ensure high image quality and image-text alignment, and made the data diverse within each category.
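The curation step can be sketched as threshold filtering followed by a per-category cap. With 10 categories and 3,000 samples each, that gives the benchmark's 30,000 images.

Several details are hypothetical, since the card does not publish them:
- the score thresholds,
- the field names,
- the rule of keeping the highest aesthetic scores.

The per-category diversity step the card mentions is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    category: str
    aesthetic: float  # hypothetical aesthetic-predictor score
    clip: float       # hypothetical CLIP image-text cosine similarity
    path: str


def curate(samples, per_category=3000, min_aesthetic=6.0, min_clip=0.28):
    """Keep samples passing both thresholds, then the top `per_category` per category.

    The threshold values are illustrative assumptions, not MJHQ-30K settings.
    """
    buckets = defaultdict(list)
    for s in samples:
        if s.aesthetic >= min_aesthetic and s.clip >= min_clip:
            buckets[s.category].append(s)
    return {
        cat: sorted(items, key=lambda s: s.aesthetic, reverse=True)[:per_category]
        for cat, items in buckets.items()
    }


TOTAL_IMAGES = 10 * 3000  # 30,000 images, hence "30K"
```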

For Playground v2, we report both overall FID and per-category FID, all computed at 1024x1024 resolution. Our benchmark results show that our model outperforms SDXL-1-0-refiner in overall FID and all category FIDs, especially in people and fashion categories. This aligns with the user study results, indicating a correlation between human preference and FID score on the MJHQ-30K benchmark.
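FID compares the mean and covariance of image features from real and generated images:

FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2))

The NumPy sketch below computes this from precomputed feature arrays and reports per-category scores plus an overall score. Standard FID uses Inception-v3 pooling features, which are not extracted here. Pooling all categories' features for the overall score is an assumption; the card does not say how overall FID is aggregated.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def fid_from_features(real: np.ndarray, fake: np.ndarray) -> float:
    """FID between two (n_samples, n_features) arrays of image features."""
    mu_r, mu_g = real.mean(axis=0), fake.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(fake, rowvar=False))
    # Tr((S_r S_g)^(1/2)) equals the sum of square roots of the eigenvalues of
    # the symmetric PSD matrix S_r^(1/2) S_g S_r^(1/2), so no general sqrtm is needed.
    root_r = _psd_sqrt(cov_r)
    middle = root_r @ cov_g @ root_r
    middle = (middle + middle.T) / 2.0  # remove asymmetric round-off
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_covmean)


def per_category_and_overall_fid(real_by_cat, fake_by_cat):
    """Return (overall_fid, {category: fid}); overall pools every category (assumption)."""
    categories = sorted(real_by_cat)
    per_category = {c: fid_from_features(real_by_cat[c], fake_by_cat[c]) for c in categories}
    overall = fid_from_features(
        np.concatenate([real_by_cat[c] for c in categories]),
        np.concatenate([fake_by_cat[c] for c in categories]),
    )
    return overall, per_category
```

The eigenvalue route relies on two facts: AB and BA share eigenvalues, and the inner matrix is symmetric positive semi-definite. Together they let the trace term be computed with NumPy alone, instead of a general matrix square root such as SciPy's `sqrtm`.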

We release this benchmark to the public and encourage the community to use it for benchmarking models' aesthetic quality.

Intermediate Base Models

Model	FID	CLIP Score
SDXL-1-0-refiner	13.04	32.62
playground-v2-256px-base	9.83	31.90
playground-v2-512px-base	9.55	32.08

Apart from playground-v2-1024px-aesthetic, we release intermediate checkpoints at different training stages to the community to promote foundation model research in pixels. Here, we report FID and CLIP scores on the MSCOCO14 evaluation set for reference. (Note that our reported numbers may differ from SDXL's published results due to different prompt lists.)
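The card does not define its CLIP score. A common convention, used for example by TorchMetrics' `CLIPScore`, is 100 x max(cosine similarity between image and text embeddings, 0), averaged over prompts.

The sketch below assumes that convention and takes precomputed CLIP embeddings as input. Under it, a score of 32.08 would mean an average image-text cosine similarity of roughly 0.32.

```python
import numpy as np


def clip_score(image_embeds: np.ndarray, text_embeds: np.ndarray) -> float:
    """Mean of 100 * max(cos(image_i, text_i), 0) over matched rows (assumed convention)."""
    img = image_embeds / np.linalg.norm(image_embeds, axis=1, keepdims=True)
    txt = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    cosine = np.sum(img * txt, axis=1)  # row-wise cosine similarity
    return float(np.mean(100.0 * np.clip(cosine, 0.0, None)))


# Toy 2-D "embeddings": one perfect match and one orthogonal pair.
images = np.array([[1.0, 0.0], [0.0, 1.0]])
texts = np.array([[1.0, 0.0], [1.0, 0.0]])
print(clip_score(images, texts))  # 50.0
```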

📄 License

This model is released under the Playground v2 Community License.

How to cite us

@misc{playground-v2,
      url={https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic},
      title={Playground v2},
      author={Li, Daiqing and Kamko, Aleks and Sabet, Ali and Akhgari, Ehsan and Xu, Lin and Doshi, Suhail}
}
