playground-v2-1024px-aesthetic Open Source Diffusion Model - Freely Generate High-Aesthetic 1024x1024 Images

Playground V2 1024px Aesthetic

Developed by playgroundai

A diffusion model that generates highly aesthetic 1024x1024 images. In Playground's user study, its images were preferred 2.5 times more often than those from Stable Diffusion XL.

Image Generation | Open Source | License: Other | #High Aesthetic Image Generation #1024px HD Resolution #Dual Text Encoder Architecture

Downloads 3,822

Release Time : 12/5/2023

Model Overview

A text-to-image generation model based on diffusion principles, specializing in producing high-aesthetic 1024x1024 resolution images

Model Features

High-Resolution Generation

Optimized for generating 1024x1024 high-resolution images

Exceptional Aesthetic Quality

In Playground's user study, images from this model were preferred 2.5 times more often than those from SDXL

Dual Text Encoder

Utilizes both OpenCLIP-ViT/G and CLIP-ViT/L text encoders

Transparent Research

Releases intermediate checkpoints at different training stages to facilitate research

Model Capabilities

Text-to-Image Generation

High-Resolution Image Generation

Aesthetic Image Generation

Use Cases

Creative Design

Concept Art Creation

Generate high-quality concept art images from text descriptions

Produces artworks with rich details and aesthetic value

Commercial Applications

Advertising Material Generation

Quickly generate advertising visuals aligned with brand identity

High-quality images ready for marketing campaigns

🚀 Playground v2 – 1024px Aesthetic Model

This repository contains a model that generates highly aesthetic 1024x1024 resolution images. It can be used with Hugging Face 🧨 Diffusers.

🚀 Quick Start

Playground v2 is a diffusion-based text-to-image generative model, trained from scratch by the research team at Playground. Images generated by Playground v2 are favored 2.5 times more than those produced by Stable Diffusion XL, as per Playground’s user study. The team is excited to release intermediate checkpoints at different training stages, including evaluation metrics, to the community, aiming to encourage further research into foundational models for image generation. Additionally, a new benchmark, MJHQ-30K, is introduced for automatic evaluation of a model’s aesthetic quality. For more details, please visit our blog.

✨ Features

High Aesthetic Quality: Generates highly aesthetic 1024x1024 resolution images.
Based on Diffusion Technology: A diffusion-based text-to-image generative model.
Multiple Evaluation Metrics: Released intermediate checkpoints with evaluation metrics, and introduced a new benchmark for aesthetic quality evaluation.

📦 Installation

Install diffusers >= 0.24.0 together with its dependencies:

pip install "diffusers>=0.24.0" transformers accelerate safetensors
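After installing, a quick check can confirm that the installed diffusers release meets the 0.24.0 minimum stated above. The following is a minimal sketch; `parse_release` and `diffusers_is_recent_enough` are hypothetical helper names, and the parser deliberately ignores pre-release suffixes such as `.dev0`.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum diffusers release named in the installation note.
MIN_DIFFUSERS = (0, 24, 0)


def parse_release(version_string):
    """Turn '0.24.0' or '0.25.0.dev0' into a 3-tuple of leading integers."""
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0' or '0rc1'
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)  # pad short versions so '0.24' compares like '0.24.0'
    return tuple(parts[:3])


def diffusers_is_recent_enough(minimum=MIN_DIFFUSERS):
    """Return True if diffusers is installed and at least `minimum`."""
    try:
        installed = version("diffusers")
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= minimum


if __name__ == "__main__":
    print("diffusers OK:", diffusers_is_recent_enough())
```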

💻 Usage Examples

Basic Usage

To use the model, run the following snippet.

Note: It is recommended to use guidance_scale=3.0.

# Load the fp16 weights and run the pipeline on a CUDA GPU.
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2-1024px-aesthetic",
    torch_dtype=torch.float16,
    use_safetensors=True,
    add_watermarker=False,
    variant="fp16"
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# guidance_scale=3.0 is the recommended setting for this model.
image = pipe(prompt=prompt, guidance_scale=3.0).images[0]
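The basic snippet can be extended to make runs repeatable and to write the result to disk. The sketch below is one possible arrangement, not part of the original model card: `generation_kwargs` and `generate` are hypothetical helpers, the seed and output path are illustrative, and the explicit `width`/`height` of 1024 simply restate the resolution the model targets.

```python
def generation_kwargs(prompt, guidance_scale=3.0, size=1024):
    """Collect the call arguments; 3.0 is the recommended guidance scale."""
    return {
        "prompt": prompt,
        "guidance_scale": guidance_scale,
        "width": size,
        "height": size,
    }


def generate(prompt, seed=0, out_path="astronaut.png"):
    """Generate one image with a fixed seed and save it (needs a CUDA GPU)."""
    # Heavy imports are deferred so generation_kwargs works without torch.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "playgroundai/playground-v2-1024px-aesthetic",
        torch_dtype=torch.float16,
        use_safetensors=True,
        add_watermarker=False,
        variant="fp16",
    ).to("cuda")

    # A seeded generator makes the sampled noise, and so the image, repeatable.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(generator=generator, **generation_kwargs(prompt)).images[0]
    image.save(out_path)  # the pipeline returns PIL images by default
    return out_path
```

Calling `generate("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k")` would download the weights on first use and write `astronaut.png` to the working directory.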

Advanced Usage

To use the model with software such as Automatic1111 or ComfyUI, use the playground-v2.fp16.safetensors file.

📚 Documentation

Model Description

Property	Details
Developed by	Playground
Model Type	Diffusion-based text-to-image generative model
License	Playground v2 Community License
Summary	This model generates images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pre-trained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). It follows the same architecture as Stable Diffusion XL.
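To illustrate what using two text encoders means in an SDXL-style architecture, the sketch below concatenates per-token embeddings from both encoders along the feature axis. The sizes are assumptions taken from SDXL (768 features for CLIP-ViT/L, 1280 for OpenCLIP-ViT/G, 77 tokens per prompt); the model card itself does not state them, and the zero-filled lists only stand in for real encoder outputs.

```python
# Assumed SDXL sizes; not stated in the model card itself.
CLIP_VIT_L_DIM = 768
OPENCLIP_VIT_G_DIM = 1280
SEQ_LEN = 77


def concat_token_embeddings(emb_l, emb_g):
    """Join the two encoders' outputs token by token along the feature axis."""
    if len(emb_l) != len(emb_g):
        raise ValueError("both encoders must produce the same number of tokens")
    return [row_l + row_g for row_l, row_g in zip(emb_l, emb_g)]


# Placeholder embeddings standing in for real encoder outputs.
emb_l = [[0.0] * CLIP_VIT_L_DIM for _ in range(SEQ_LEN)]
emb_g = [[0.0] * OPENCLIP_VIT_G_DIM for _ in range(SEQ_LEN)]

context = concat_token_embeddings(emb_l, emb_g)
print(len(context), len(context[0]))  # 77 2048
```

Under these assumptions, the denoising network would attend over a 2048-feature context per token, which is the sum of the two encoder widths.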

User Study

According to user studies conducted by Playground, involving over 2,600 prompts and thousands of users, the images generated by Playground v2 are favored 2.5 times more than those produced by Stable Diffusion XL. The user preference metrics are reported on PartiPrompts and an internal prompt dataset curated by the Playground team. The “Internal 1K” prompt dataset is diverse and covers various categories and tasks. During the user study, users are instructed to evaluate image pairs based on both their aesthetic preference and the image-text alignment.
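As a rough aid for reading the "2.5 times" figure, the arithmetic below converts a preference ratio into a share of pairwise comparisons. This is an interpretation, not a reported result: it assumes every comparison yields exactly one winner (no ties), and `win_share` is a hypothetical helper.

```python
def win_share(preference_ratio):
    """Share of comparisons won when one model is preferred `ratio` : 1."""
    if preference_ratio <= 0:
        raise ValueError("preference_ratio must be positive")
    return preference_ratio / (preference_ratio + 1.0)


share = win_share(2.5)
print(f"{share:.1%}")  # 71.4%
```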

MJHQ-30K Benchmark

Model	Overall FID
SDXL-1-0-refiner	9.55
playground-v2-1024px-aesthetic	7.07

A new benchmark, MJHQ-30K, is introduced for automatic evaluation of a model’s aesthetic quality. A high-quality dataset is curated from Midjourney, featuring 10 common categories, with each category containing 3,000 samples. Aesthetic score and CLIP score are used to ensure high image quality and high image-text alignment, and extra care is taken to make the data diverse within each category. For Playground v2, both the overall FID and per-category FID are reported at resolution 1024x1024. The benchmark results show that the model outperforms SDXL-1-0-refiner in overall FID and all category FIDs, especially in people and fashion categories, which is consistent with the user study results. The benchmark is released to the public for the community to benchmark their models’ aesthetic quality.
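For readers unfamiliar with FID, the sketch below computes the Fréchet distance between two sets of feature vectors: the squared distance between their means plus a term comparing their covariances. It is a minimal illustration, not Playground's evaluation code. In practice the features typically come from an Inception network applied to reference and generated images; here random vectors stand in for them, and NumPy is an assumed dependency.

```python
import numpy as np


def feature_stats(features):
    """Mean vector and covariance matrix of an (n_samples, n_features) array."""
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))."""
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) equals the sum of square roots of the eigenvalues
    # of S1 S2, which are real and non-negative for covariance matrices.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))   # stand-in for reference image features
shifted = real + 1.0               # same spread, every feature mean moved by 1

fid_same = frechet_distance(*feature_stats(real), *feature_stats(real))
fid_shifted = frechet_distance(*feature_stats(real), *feature_stats(shifted))
print(round(fid_same, 6), round(fid_shifted, 6))
```

Identical feature sets give a distance of about zero, while shifting each of the four feature means by 1 gives about 4, the squared length of the shift. Lower FID therefore means the generated features sit closer to the reference set.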

Intermediate Base Models

Model	FID	Clip Score
SDXL-1-0-refiner	13.04	32.62
playground-v2-256px-base	9.83	31.90
playground-v2-512px-base	9.55	32.08

Apart from playground-v2-1024px-aesthetic, intermediate checkpoints at different training stages are released to the community to promote foundation model research in pixels. The FID score and CLIP score on the MSCOCO14 evaluation set are reported for reference.
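The intermediate checkpoints could be loaded in the same way as the final model. The sketch below is an assumption-laden illustration: it presumes the checkpoints are published under the same `playgroundai` organization as the repository id used in the basic usage snippet, and `native_resolution` and `load_checkpoint` are hypothetical helpers.

```python
import re

# Checkpoint names as listed in the table above.
INTERMEDIATE_CHECKPOINTS = ("playground-v2-256px-base", "playground-v2-512px-base")


def native_resolution(model_name):
    """Read the training resolution out of a name such as '...-512px-base'."""
    match = re.search(r"-(\d+)px-", model_name)
    if match is None:
        raise ValueError(f"no resolution found in {model_name!r}")
    return int(match.group(1))


def load_checkpoint(model_name, org="playgroundai"):
    """Load a checkpoint (needs a CUDA GPU); the org prefix is an assumption."""
    # Heavy imports are deferred so native_resolution works without torch.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        f"{org}/{model_name}",
        torch_dtype=torch.float16,
        add_watermarker=False,
    )
    return pipe.to("cuda"), native_resolution(model_name)
```

A caller might then pass the returned resolution as both `width` and `height` so that each checkpoint is sampled at the size its name indicates.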

How to cite us

@misc{playground-v2,
      url={https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic},
      title={Playground v2},
      author={Li, Daiqing and Kamko, Aleks and Sabet, Ali and Akhgari, Ehsan and Xu, Linmiao and Doshi, Suhail}
}

📄 License

This model is released under the Playground v2 Community License.
