GLIGEN-XL-1024: An Open-Source GLIGEN Adapter for SDXL Text-to-Image Generation with a Hugging Face-Style Pipeline

GLIGEN-XL-1024

Developed by jiuntian

A GLIGEN adapter for SDXL that provides a Hugging Face-style pipeline for text-to-image generation.

Task: Text-to-Image
License: Apache-2.0
Tags: SDXL Adapter, Text-to-Image Generation, Object Positioning Control

Downloads 1,265

Release Date: 1/19/2025

Model Overview

This project open-sources the pre-trained weights of the GLIGEN adapter for SDXL, along with diffusers pipelines and training code, supporting object positioning control in text-to-image generation tasks.

Model Features

SDXL Support

Provides GLIGEN adapter support for Stable Diffusion XL (SDXL), expanding the model's application scope.

Object Positioning Control

Control where objects appear in the generated image by passing bounding boxes through the gligen_boxes parameter, each paired with a phrase in gligen_phrases.
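
As a rough illustration of how a layout could be prepared for gligen_boxes, the sketch below converts a pixel-space box into fractional coordinates. It assumes the [xmin, ymin, xmax, ymax] order with values in [0, 1] used by the stock diffusers GLIGEN pipeline; to_normalized_box and the sample box are hypothetical and not part of this repo.

```python
from typing import List, Sequence


def to_normalized_box(
    pixel_box: Sequence[float], width: int = 1024, height: int = 1024
) -> List[float]:
    """Convert a pixel-space [xmin, ymin, xmax, ymax] box to fractions of the image size.

    Assumption: the pipeline expects boxes in this order, normalized to [0, 1].
    """
    xmin, ymin, xmax, ymax = pixel_box
    # Reject boxes that are empty, inverted, or outside the canvas.
    if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
        raise ValueError(f"box {list(pixel_box)} does not fit a {width}x{height} image")
    return [xmin / width, ymin / height, xmax / width, ymax / height]


# Hypothetical layout: a dog in the lower-left area of a 1024x1024 canvas.
dog_box = to_normalized_box([128, 512, 384, 768])
print(dog_box)  # [0.125, 0.5, 0.375, 0.75]
```

The resulting list could then be passed as one entry of gligen_boxes, with the matching phrase at the same index in gligen_phrases.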

Diffusers Integration

Offers HuggingFace diffusers-style pipelines for easy integration and use.

Model Capabilities

Text-to-Image Generation

Object Positioning Control

High-Resolution Image Generation (1024x1024)

Use Cases

Creative Design

Scene Generation

Generate scene images with specific objects and layouts, such as a dog on a grassland.

The model can generate images at 1024x1024 resolution.

Advertising Design

Ad Material Generation

Generate advertising material images based on product descriptions and layout requirements.

Box-based layout control determines where the product appears in the image.

🚀 IGLIGEN-XL

This project aims to support an SDXL version of the GLIGEN adapters with a Hugging Face-style pipeline, as part of the effort to create InteractDiffusion XL.

📦 Information

Property	Details
Model Type	SDXL version of GLIGEN adapters
Training Data	jiuntian/sa1b-sdxl-latents-1024, jiuntian/sa-1b_boxes_sdxl
Base Model	stabilityai/stable-diffusion-xl-base-1.0
Pipeline Tag	text-to-image
Library Name	diffusers
License	Apache-2.0

🚀 Quick Start

This project aims to support an SDXL version of the GLIGEN adapters with a Hugging Face-style pipeline. It is part of the effort to create InteractDiffusion XL. More details can be found in the GitHub repo.

✨ Features

IGLIGEN reproduces GLIGEN on the diffusers framework and simplifies the training procedure. Its authors have released code and pretrained weights for SD v1.4/v1.5 and SD v2.0/v2.1, while SDXL support has remained highly anticipated. This repo open-sources the pretrained weights of the GLIGEN adapter for SDXL, along with the diffusers pipeline and training code. We appreciate the work of the authors of GLIGEN and IGLIGEN.

💻 Usage Examples

Basic Usage

import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "jiuntian/gligen-xl-1024", trust_remote_code=True, torch_dtype=torch.float16
).to("cuda")

prompt = "An image of grassland with a dog."

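# Image generation with GLIGEN
output_images = pipeline(
    prompt,
    num_inference_steps=50,
    height=1024, width=1024,
    gligen_scheduled_sampling_beta=0.4,
    # One box per phrase, given as fractions of the image size
    # (assumed order: [xmin, ymin, xmax, ymax], as in the diffusers GLIGEN pipeline)
    gligen_boxes=[[0.1, 0.6, 0.3, 0.8]],
    gligen_phrases=["a dog"],
    num_images_per_prompt=1,
    output_type="pt"  # return torch tensors instead of PIL images
).images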
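
The gligen_scheduled_sampling_beta argument can be read as the fraction of denoising steps during which box grounding stays active. The sketch below works through that arithmetic for the values above. It assumes this pipeline follows the stock diffusers GLIGEN pipeline, which grounds the first int(beta * steps) steps and then turns grounding off; grounding_schedule is a hypothetical helper, not an API of this repo.

```python
from typing import List


def grounding_schedule(num_inference_steps: int, beta: float) -> List[bool]:
    """Return one flag per denoising step: True while box grounding is active.

    Assumption: grounding covers the first int(beta * num_inference_steps) steps.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    num_grounding_steps = int(beta * num_inference_steps)
    return [step < num_grounding_steps for step in range(num_inference_steps)]


schedule = grounding_schedule(num_inference_steps=50, beta=0.4)
print(sum(schedule))  # 20 grounded steps out of 50
```

Under this reading, a larger beta keeps the layout constraint active for longer, while a smaller beta leaves more of the later steps to the base SDXL model.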

📄 License

This project is licensed under the Apache-2.0 license.

📚 Documentation

Citation

The authors of this repo (IGLIGEN-XL) are not affiliated with the authors of GLIGEN and IGLIGEN. Since IGLIGEN-XL is based on GLIGEN and IGLIGEN, please consider citing the original GLIGEN and IGLIGEN papers if you use the IGLIGEN-XL code or adapters:

@article{li2023gligen,
  title={GLIGEN: Open-Set Grounded Text-to-Image Generation},
  author={Li, Yuheng and Liu, Haotian and Wu, Qingyang and Mu, Fangzhou and Yang, Jianwei and Gao, Jianfeng and Li, Chunyuan and Lee, Yong Jae},
  journal={CVPR},
  year={2023}
}
@article{lian2023llmgrounded,
  title={LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models},
  author={Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor},
  journal={arXiv preprint arXiv:2305.13655},
  year={2023}
}

The project is part of the effort in creating InteractDiffusion XL.

Please also consider citing InteractDiffusion if you use the IGLIGEN-XL code or trained weights.

@inproceedings{hoe2023interactdiffusion,
  title={InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models}, 
  author={Jiun Tian Hoe and Xudong Jiang and Chee Seng Chan and Yap-Peng Tan and Weipeng Hu},
  year={2024},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}
