# pixart-controlnet-lora-test
This project is a ControlNet PEFT LoRA derived from a base model, aimed at generating high-quality, photo-realistic images such as cat photos.
## 🚀 Quick Start
This is a ControlNet PEFT LoRA derived from terminusresearch/pixart-900m-1024-ft-v0.6.
The main validation prompt used during training was:
```
A photo-realistic image of a cat
```
## ✨ Features
- The text encoder was not trained; you may reuse the base model text encoder for inference.
- Example images can be found in the gallery below.
## 📦 Installation
The original document provides no dedicated installation steps. The usage example below assumes that `torch`, `diffusers`, `Pillow`, and the `helpers` module from the training framework are importable in your environment.
## 💻 Usage Examples

### Basic Usage
```python
import torch
from diffusers import PixArtSigmaControlNetPipeline
from helpers.models.pixart.controlnet import PixArtSigmaControlNetAdapterModel
from PIL import Image

base_model_id = "terminusresearch/pixart-900m-1024-ft-v0.6"
controlnet_id = "bghira/pixart-controlnet-lora-test"

# Load the trained ControlNet adapter, then attach it to the base pipeline.
controlnet = PixArtSigmaControlNetAdapterModel.from_pretrained(
    f"{controlnet_id}/controlnet"
)
pipeline = PixArtSigmaControlNetPipeline.from_pretrained(
    base_model_id,
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
)
device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline.to(device)

# The conditioning image guides the structure of the generated output.
control_image = Image.open("path/to/control/image.png")
prompt = "A photo-realistic image of a cat"

image = pipeline(
    prompt=prompt,
    image=control_image,
    num_inference_steps=16,
    guidance_scale=4.0,
    generator=torch.Generator(device=device).manual_seed(42),
    controlnet_conditioning_scale=1.0,
).images[0]
image.save("output.png")
```
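If GPU memory is tight, the standard diffusers offloading hook can be enabled before inference. A minimal sketch; this is the generic diffusers API and has not been verified against this specific ControlNet pipeline, and it requires the `accelerate` package:

```python
# Optional: offload submodules to CPU between forward passes to reduce
# peak VRAM usage, at some cost in speed (standard diffusers API).
pipeline.enable_model_cpu_offload()
```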
## 📚 Documentation

### Validation settings
| Property | Details |
|------------|-----------|
| CFG | 4.0 |
| CFG Rescale | 0.0 |
| Steps | 16 |
| Sampler | ddim |
| Seed | 42 |
| Resolution | 1024x1024 |
Note: The validation settings are not necessarily the same as the training settings.
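Validation used the `ddim` sampler, and the training metadata below records `inference_scheduler_timestep_spacing=trailing`. A minimal sketch of configuring the pipeline's scheduler to match, assuming the standard diffusers `DDIMScheduler` API:

```python
from diffusers import DDIMScheduler

# Swap in a DDIM scheduler with trailing timestep spacing to mirror the
# validation sampler (16 steps, CFG 4.0, seed 42 per the table above).
pipeline.scheduler = DDIMScheduler.from_config(
    pipeline.scheduler.config,
    timestep_spacing="trailing",
)
```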
### Training settings
| Property | Details |
|------------|-----------|
| Training epochs | 24 |
| Training steps | 150 |
| Learning rate | 0.0001<br>Learning rate schedule: constant<br>Warmup steps: 500 |
| Max grad value | 0.01 |
| Effective batch size | 1<br>Micro-batch size: 1<br>Gradient accumulation steps: 1<br>Number of GPUs: 1 |
| Gradient checkpointing | False |
| Prediction type | epsilon (extra parameters=['training_scheduler_timestep_spacing=trailing', 'inference_scheduler_timestep_spacing=trailing', 'controlnet_enabled']) |
| Optimizer | adamw_bf16 |
| Trainable parameter precision | Pure BF16 |
| Base model precision | no_change |
| Caption dropout probability | 0.0% |
| LoRA Rank | 64 |
| LoRA Alpha | 64.0 |
| LoRA Dropout | 0.1 |
| LoRA initialisation style | default |
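For reference, the LoRA hyperparameters above correspond to a PEFT configuration along these lines. This is a minimal sketch; the `target_modules` list is hypothetical, since the card does not name the trained modules:

```python
from peft import LoraConfig

# Mirrors the table above: rank 64, alpha 64.0, dropout 0.1, default init.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64.0,
    lora_dropout=0.1,
    init_lora_weights=True,  # "default" initialisation style
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # hypothetical
)
```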
### Datasets: antelope-data-1024
| Property | Details |
|------------|-----------|
| Repeats | 0 |
| Total number of images | 6 |
| Total number of aspect buckets | 1 |
| Resolution | 1.048576 megapixels (1024 × 1024 = 1,048,576 pixels) |
| Cropped | True |
| Crop style | center |
| Crop aspect | square |
| Used for regularisation data | No |
## 🔧 Technical Details
The project is based on the terminusresearch/pixart-900m-1024-ft-v0.6 base model and was trained as a ControlNet PEFT LoRA. Validation and training use different settings; the specific parameters are listed in the corresponding sections above. The text encoder was not trained, so the base model's text encoder can be reused for inference.
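Since the text encoder is untouched, it can also be loaded directly from the base checkpoint if you need it separately, e.g. for computing prompt embeddings. A sketch, assuming the base repository follows the usual diffusers layout with a T5 text encoder in a `text_encoder` subfolder:

```python
import torch
from transformers import T5EncoderModel

# Load the untouched text encoder from the base model (the subfolder name
# is an assumption based on the standard diffusers repository layout).
text_encoder = T5EncoderModel.from_pretrained(
    "terminusresearch/pixart-900m-1024-ft-v0.6",
    subfolder="text_encoder",
    torch_dtype=torch.bfloat16,
)
```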
## 📄 License

This project is licensed under openrail++.