IGLIGEN-XL
This project aims to support an SDXL version of GLIGEN adapters with a Hugging Face-style pipeline, contributing to the creation of InteractDiffusion XL.
Information
| Property | Details |
| --- | --- |
| Model Type | SDXL version of GLIGEN adapters |
| Training Data | jiuntian/sa1b-sdxl-latents-1024, jiuntian/sa-1b_boxes_sdxl |
| Base Model | stabilityai/stable-diffusion-xl-base-1.0 |
| Pipeline Tag | text-to-image |
| Library Name | diffusers |
| License | Apache-2.0 |
Quick Start
This project aims to support an SDXL version of GLIGEN adapters with a Hugging Face-style pipeline. It is part of the effort to create InteractDiffusion XL. More details can be found in the GitHub repo.
⨠Features
IGLIGEN reproduces GLIGEN on the diffusers framework and simplifies the training procedure. Its authors have released code and pretrained weights for SD v1.4/v1.5 and SD v2.0/v2.1, but SDXL support has been highly anticipated. This repo open-sources the pretrained weights of the GLIGEN adapter for SDXL, along with the diffusers pipeline and training code. We appreciate the work of the authors of GLIGEN and IGLIGEN.
Usage Examples
Basic Usage
import torch
from diffusers import DiffusionPipeline

# trust_remote_code=True lets diffusers load the custom pipeline code from the model repo.
pipeline = DiffusionPipeline.from_pretrained(
    "jiuntian/gligen-xl-1024", trust_remote_code=True, torch_dtype=torch.float16
).to("cuda")

prompt = "An image of grassland with a dog."

output_images = pipeline(
    prompt,
    num_inference_steps=50,
    height=1024, width=1024,
    gligen_scheduled_sampling_beta=0.4,
    gligen_boxes=[[0.1, 0.6, 0.3, 0.8]],  # one box per grounded phrase
    gligen_phrases=["a dog"],
    num_images_per_prompt=1,
    output_type="pt"  # return the images as torch tensors
).images
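The `gligen_boxes` values in the example appear to be coordinates normalized to the image size. If your layout is specified in pixels, a small helper can convert it. The sketch below assumes the `[xmin, ymin, xmax, ymax]` ordering with every value in `[0, 1]`, which is the convention of the GLIGEN pipeline in diffusers; it is not confirmed here for this custom pipeline. `normalize_boxes` is a hypothetical helper and not part of this repo.

```python
from typing import List, Sequence


def normalize_boxes(
    pixel_boxes: Sequence[Sequence[float]], width: int, height: int
) -> List[List[float]]:
    """Convert pixel-space (xmin, ymin, xmax, ymax) boxes to fractions of the image size.

    Assumption: the pipeline expects [xmin, ymin, xmax, ymax] with values in [0, 1].
    """
    normalized = []
    for xmin, ymin, xmax, ymax in pixel_boxes:
        # Reject boxes that are empty, inverted, or outside the image.
        if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
            raise ValueError(f"Invalid box for a {width}x{height} image: {(xmin, ymin, xmax, ymax)}")
        normalized.append([xmin / width, ymin / height, xmax / width, ymax / height])
    return normalized


# A box spanning x in [128, 384] and y in [512, 768] of a 1024x1024 image.
gligen_boxes = normalize_boxes([(128, 512, 384, 768)], width=1024, height=1024)
```

The resulting list can be passed as `gligen_boxes`, with one entry in `gligen_phrases` for each box.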
License
This project is licensed under the Apache-2.0 license.
Documentation
Citation
The authors of this repo (IGLIGEN-XL) are not affiliated with the authors of GLIGEN and IGLIGEN. Since IGLIGEN-XL is based on GLIGEN and IGLIGEN, please consider citing the original GLIGEN and IGLIGEN papers if you use the IGLIGEN-XL code or adapters:
@article{li2023gligen,
  title={GLIGEN: Open-Set Grounded Text-to-Image Generation},
  author={Li, Yuheng and Liu, Haotian and Wu, Qingyang and Mu, Fangzhou and Yang, Jianwei and Gao, Jianfeng and Li, Chunyuan and Lee, Yong Jae},
  journal={CVPR},
  year={2023}
}
@article{lian2023llmgrounded,
  title={Llm-grounded diffusion: Enhancing prompt understanding of text-to-image diffusion models with large language models},
  author={Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor},
  journal={arXiv preprint arXiv:2305.13655},
  year={2023}
}
This project is part of the effort to create InteractDiffusion XL.
Please also consider citing InteractDiffusion if you use the IGLIGEN-XL code or trained weights:
@inproceedings{hoe2023interactdiffusion,
  title={InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models},
  author={Jiun Tian Hoe and Xudong Jiang and Chee Seng Chan and Yap-Peng Tan and Weipeng Hu},
  year={2024},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}