SD3.5-Large-IP-Adapter
This repository provides an IP-Adapter for the SD3.5-Large model, released by researchers from the InstantX Team. In this model the reference image acts much like additional text, so it may be ignored or may interfere with the rest of the text prompt. Nevertheless, we sincerely hope you enjoy using this model, have fun, and share your creative works with us on Twitter.
Quick Start
This IP-Adapter is designed for the SD3.5-Large model. Although the code has not been integrated into the diffusers library yet, you can run inference with the local files provided in this repository.
Features
- Regular IP-Adapter: new layers are added to all 38 transformer blocks.
- Superior image encoder: google/siglip-so400m-patch14-384 is used for image encoding because of its strong performance.
- TimeResampler projection: a TimeResampler projects the image features.
- Fixed image token number: the number of image tokens is set to 64.
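The fixed token budget is easier to picture with a little arithmetic. This is a minimal sketch, assuming the encoder splits a 384×384 input into non-overlapping 14×14 patches and emits no class token (inferred from the model name, not stated in this repository); it only counts tokens and does not load any model.

```python
# Token bookkeeping for the image branch (sketch; encoder geometry assumed from
# the name google/siglip-so400m-patch14-384: 384x384 input, 14x14 patches).
IMAGE_SIZE = 384
PATCH_SIZE = 14
NUM_IMAGE_TOKENS = 64  # fixed by this adapter (nb_token=64 in the usage example)


def siglip_patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens a ViT-style encoder emits, assuming no class token."""
    per_side = image_size // patch_size  # 384 // 14 = 27; leftover pixels are not covered
    return per_side * per_side


patch_tokens = siglip_patch_tokens(IMAGE_SIZE, PATCH_SIZE)
print(f"{patch_tokens} encoder tokens -> {NUM_IMAGE_TOKENS} image tokens "
      f"({patch_tokens / NUM_IMAGE_TOKENS:.1f}x compression)")
# prints: 729 encoder tokens -> 64 image tokens (11.4x compression)
```

Whatever the encoder's output length, the transformer always sees the same 64 image tokens, which keeps the cost of the added layers constant.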
Installation
As the code has not been integrated into diffusers, you need to use the local files provided in this repository.
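Before running the example, it can help to confirm that the local files sit where the imports expect them. The following is a hypothetical helper, not part of this repository; its file list is inferred from the import statements and paths in Basic Usage and may not match your checkout (for instance, `models/transformer_sd3` could be a package directory rather than a single file).

```python
from pathlib import Path

# Local files that the Basic Usage example imports or loads. These paths are
# inferred from its import statements and may differ in your checkout.
REQUIRED_LOCAL_FILES = [
    "models/transformer_sd3.py",
    "pipeline_stable_diffusion_3_ipa.py",
    "ip-adapter.bin",
]


def missing_local_files(repo_root: str = ".") -> list[str]:
    """Return the required relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_LOCAL_FILES if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_local_files()
    if missing:
        print("Missing local files:", ", ".join(missing))
    else:
        print("All local files found; run the example from this directory.")
```

Run it from the repository root, the same directory you would run the inference script from, so that relative imports and `./ip-adapter.bin` resolve the same way.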
Usage Examples
Basic Usage
```python
import torch
from PIL import Image

# Local modules shipped with this repository (not yet part of diffusers).
from models.transformer_sd3 import SD3Transformer2DModel
from pipeline_stable_diffusion_3_ipa import StableDiffusion3Pipeline

model_path = 'stabilityai/stable-diffusion-3.5-large'
ip_adapter_path = './ip-adapter.bin'
image_encoder_path = "google/siglip-so400m-patch14-384"

# Load the SD3.5-Large transformer with the repository's local class so the
# IP-Adapter layers can be attached to it.
transformer = SD3Transformer2DModel.from_pretrained(
    model_path, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_path, transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# Attach the adapter weights and the SigLIP image encoder; nb_token matches the
# adapter's 64 image tokens.
pipe.init_ipadapter(
    ip_adapter_path=ip_adapter_path,
    image_encoder_path=image_encoder_path,
    nb_token=64,
)

# Reference image that conditions generation alongside the text prompt.
ref_img = Image.open('./assets/1.jpg').convert('RGB')
image = pipe(
    width=1024,
    height=1024,
    prompt='a cat',
    negative_prompt="lowres, low quality, worst quality",
    num_inference_steps=24,
    guidance_scale=5.0,
    generator=torch.Generator("cuda").manual_seed(42),
    clip_image=ref_img,    # image prompt fed to the IP-Adapter
    ipadapter_scale=0.5,   # how strongly the reference image influences the result
).images[0]
image.save('./result.jpg')
```
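The `ipadapter_scale` argument weights the image prompt against the text. The toy, pure-Python sketch below illustrates the decoupled-attention idea from the original IP-Adapter design: the output is the base attention plus `ipadapter_scale` times an attention over the image tokens. It assumes this adapter follows that idea; the function names are hypothetical, and the attention processors in this repository's local files may normalize, project, or combine the branches differently (in SD3 the base branch is joint attention over text and image latents).

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def decoupled_attention(q, text_kv, image_kv, ipadapter_scale):
    """Base (text) branch plus a scaled image branch, IP-Adapter style (sketch)."""
    text_out = attention(q, *text_kv)
    image_out = attention(q, *image_kv)
    return [t + ipadapter_scale * i for t, i in zip(text_out, image_out)]


rng = random.Random(0)


def rand_rows(n, d):
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


DIM = 8  # toy width
query = rand_rows(1, DIM)[0]
text_kv = (rand_rows(5, DIM), rand_rows(5, DIM))     # keys, values for 5 text tokens
image_kv = (rand_rows(64, DIM), rand_rows(64, DIM))  # keys, values for 64 image tokens

for scale in (0.0, 0.5, 1.0):
    out = decoupled_attention(query, text_kv, image_kv, scale)
    print(scale, [round(x, 3) for x in out[:3]])
```

At `ipadapter_scale=0.0` the image branch drops out entirely; as the scale grows the image term contributes more, which is one way to picture the note at the top that a strong image prompt can crowd out the text.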
Documentation
Model Card
This is a regular IP-Adapter, where new layers are added to all 38 transformer blocks. We use google/siglip-so400m-patch14-384 to encode images because of its strong performance, and adopt a TimeResampler for projection. The number of image tokens is set to 64.
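A TimeResampler compresses the encoder's patch features into a fixed set of tokens. As a rough sketch, assume a Perceiver-style design in which 64 learned latent queries, shifted by a sinusoidal embedding of the diffusion timestep, cross-attend to the SigLIP tokens. The class `ToyTimeResampler` and its single attention layer without projections, normalization, or feed-forward blocks are hypothetical simplifications, not the repository's module.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, feats):
    """One query attends over encoder features (keys = values = feats, no projections)."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * f for q, f in zip(query, row)) for row in feats])
    return [sum(w * row[j] for w, row in zip(weights, feats)) for j in range(len(feats[0]))]


def timestep_embedding(t, dim):
    """Sinusoidal timestep embedding; dim must be even."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.cos(t * f) for f in freqs] + [math.sin(t * f) for f in freqs]


class ToyTimeResampler:
    """Hypothetical single-layer resampler: a fixed number of learned latent
    queries, shifted by a timestep embedding, cross-attend to encoder tokens."""

    def __init__(self, num_tokens, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.latents = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]

    def __call__(self, feats, timestep):
        temb = timestep_embedding(timestep, self.dim)
        queries = [[l + e for l, e in zip(row, temb)] for row in self.latents]
        return [attend(q, feats) for q in queries]


rng = random.Random(1)
DIM = 16  # toy width; the real SigLIP features are much wider
# Stand-in for the encoder output: 729 tokens (a 27 x 27 patch grid).
siglip_tokens = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(729)]

resampler = ToyTimeResampler(num_tokens=64, dim=DIM)
image_tokens = resampler(siglip_tokens, timestep=500)
print(len(siglip_tokens), "->", len(image_tokens), "tokens of width", len(image_tokens[0]))
# prints: 729 -> 64 tokens of width 16
```

The output length is set by the number of latent queries rather than by the input, which is what lets the image token count stay fixed at 64; a full implementation would also project the result to the transformer's hidden width.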
Community ComfyUI Support
For ComfyUI support, please refer to Slickytail/ComfyUI-InstantX-IPAdapter-SD3.
License
The model is released under the stabilityai-ai-community license. All copyrights are reserved.
Acknowledgements
This project is sponsored by HuggingFace and fal.ai. Special thanks to Slickytail for supporting the ComfyUI node.
Citation
If you find this project useful in your research, please cite it as follows:
```bibtex
@misc{sd35-large-ipa,
    author = {InstantX Team},
    title = {InstantX SD3.5-Large IP-Adapter Page},
    year = {2024},
}
```
Information Table

| Property | Details |
|---|---|
| Library Name | diffusers |
| Pipeline Tag | text-to-image |
| Tags | Text-to-Image, IP-Adapter, StableDiffusion3Pipeline, image-generation, Stable Diffusion |
| Base Model | stabilityai/stable-diffusion-3.5-large |