🚀 🐱 PixArt-Σ Model Card
PixArt-Σ is a powerful text-to-image generative model. It can directly generate high-resolution images such as 1024px, 2K, and 4K from text prompts in a single sampling process, offering great potential for image-generation research.

📦 Installation
⚠️ Important Note
Make sure to upgrade diffusers to >= 0.28.0:
pip install -U diffusers
In addition, make sure to install transformers, safetensors, sentencepiece, and accelerate:
pip install transformers accelerate safetensors sentencepiece
For diffusers<0.28.0, check this script for help.
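To confirm that the installed diffusers meets the 0.28.0 minimum before loading the pipeline, a small check along these lines may help. This is a minimal sketch: the helper names (parse_version, meets_minimum, check_diffusers) are hypothetical, and the parser is deliberately simple, so it treats a pre-release such as 0.28.0.dev0 as equal to 0.28.0.

```python
from importlib import metadata

MIN_DIFFUSERS = "0.28.0"  # minimum version stated in this model card


def parse_version(version):
    """Turn '0.28.0' or '0.29.0.dev0' into a tuple of ints, e.g. (0, 28, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at non-numeric suffixes such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_DIFFUSERS):
    """Return True when `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.28' compares equal to '0.28.0'.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_diffusers():
    """Return (installed_version_or_None, ok) for the local environment."""
    try:
        installed = metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        return None, False
    return installed, meets_minimum(installed)
```

Calling check_diffusers() returns the installed version string and whether it satisfies the minimum, or (None, False) if diffusers is not installed.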
💻 Usage Examples
Basic Usage
import torch
from diffusers import PixArtSigmaPipeline

# Use the first GPU when available; fall back to CPU otherwise.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# float16 is intended for GPU inference; use float32 on CPU.
weight_dtype = torch.float16 if device.type == "cuda" else torch.float32

pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe.to(device)

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipe(prompt).images[0]
image.save("./cactus.png")
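To run several prompts with the same pipeline, the call above can be wrapped in a loop. The sketch below assumes `pipe` is the pipeline loaded in Basic Usage; the helper names (slugify, generate_all) and the default output directory are hypothetical and not part of diffusers.

```python
import re
from pathlib import Path


def slugify(prompt, max_len=40):
    """Make a filesystem-friendly name from a prompt (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "image"


def generate_all(pipe, prompts, out_dir="outputs"):
    """Run `pipe` once per prompt and save the first image of each as a PNG."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, prompt in enumerate(prompts):
        image = pipe(prompt).images[0]  # same call as in Basic Usage
        target = out_path / f"{index:03d}-{slugify(prompt)}.png"
        image.save(target)
        saved.append(target)
    return saved
```

For example, generate_all(pipe, ["A small cactus with a happy face in the Sahara desert."]) writes one numbered PNG into outputs/ and returns the list of saved paths.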
Advanced Usage
When using torch >= 2.0, you can improve inference speed by 20-30% with torch.compile. Simply wrap the pipeline's transformer with torch.compile before running the pipeline:
pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)
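The actual gain from torch.compile depends on hardware and settings, so it can be worth measuring on your own machine. Below is a minimal timing sketch; time_call and speedup_percent are hypothetical helpers, and the speedup is computed here as the relative reduction in latency, which is only one way to read the 20-30% figure.

```python
import time


def time_call(fn, warmup=1, repeats=3):
    """Return the mean wall-clock seconds of `fn()` over `repeats` runs.

    `warmup` untimed calls run first, so one-off costs such as the
    compilation triggered by the first call after torch.compile are excluded.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def speedup_percent(baseline_s, compiled_s):
    """Percentage reduction in latency relative to the baseline."""
    return 100.0 * (baseline_s - compiled_s) / baseline_s
```

A typical use would be to record baseline = time_call(lambda: pipe(prompt)) before compiling, then compiled = time_call(lambda: pipe(prompt)) afterwards, and compare them with speedup_percent(baseline, compiled). Compiled runs may need more than one warmup call before timings settle.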
If you are limited by GPU VRAM, you can enable CPU offloading by calling pipe.enable_model_cpu_offload() instead of .to("cuda"):
- pipe.to("cuda")
+ pipe.enable_model_cpu_offload()
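If the same script has to run on GPUs of different sizes, the choice between the two calls can be made at runtime. The following is a sketch under stated assumptions: place_pipeline is a hypothetical helper, and required_bytes is an estimate you supply yourself, since this card does not state how much VRAM the model needs.

```python
def place_pipeline(pipe, free_vram_bytes, required_bytes, device="cuda"):
    """Move `pipe` to the GPU if it is expected to fit, else enable CPU offloading.

    `required_bytes` is a caller-supplied estimate (an assumption, not a
    figure taken from this model card).
    """
    if free_vram_bytes >= required_bytes:
        pipe.to(device)
        return "gpu"
    # Offloading relies on accelerate, which is listed in the install step.
    pipe.enable_model_cpu_offload()
    return "offload"
```

The free memory can be read with free, _total = torch.cuda.mem_get_info() and passed in as free_vram_bytes.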
For more information on how to use PixArt-Σ with diffusers, please have a look at the PixArt-Σ Docs.
📚 Documentation
Model

PixArt-Σ consists of pure transformer blocks for latent diffusion. It can directly generate 1024px, 2K, and 4K images from text prompts within a single sampling process.
Source code is available at https://github.com/PixArt-alpha/PixArt-sigma.
Model Description
Model Sources
For research purposes, we recommend our PixArt-sigma GitHub repository (https://github.com/PixArt-alpha/PixArt-sigma), which is better suited to both training and inference and to which more advanced diffusion samplers such as SA-Solver will be added over time.
Hugging Face provides free PixArt-Σ inference.
- Repository: https://github.com/PixArt-alpha/PixArt-sigma
- Demo: https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
Uses
Direct Use
The model is intended for research purposes only. Possible research areas and tasks include:
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
Excluded uses are described below.
Out-of-Scope Use
The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.
Limitations and Bias
Limitations
- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
- Fingers and similar fine details may not be generated properly.
- The autoencoding part of the model is lossy.
Bias
While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
📄 License
The model is released under the CreativeML Open RAIL++-M License.