SD3.5-Large-IP-Adapter Open-Source Image Generation Model - Create New Images with Image and Text Prompts

SD3.5 Large IP Adapter

Developed by InstantX

This is an IP adapter based on the SD3.5-Large model, capable of using images as input conditions alongside text prompts to generate new images.

Text-to-Image | English | Open Source License: Other | #Image-guided Generation #Multimodal Fusion #High-resolution Image Generation

Downloads 1,474

Release Time: 11/7/2024

Model Overview

This model is an IP-Adapter for text-to-image generation. It accepts images as input conditions, in the same way that it accepts text prompts.

Model Features

Image Condition Input

Can use reference images as input conditions to influence generation results alongside text prompts

Multi-block Adaptation

New layers are added to all 38 blocks, enhancing the model's ability to understand image conditions

High-performance Image Encoding

Uses google/siglip-so400m-patch14-384 for image encoding, combined with a TimeResampler for projection

64 Image Tokens

Sets 64 image tokens to balance generation quality and computational efficiency
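As a rough illustration of how much the projection step compresses a reference image, the arithmetic below counts the patch tokens SigLIP produces for one image. It assumes the public configuration of google/siglip-so400m-patch14-384: 384x384 input, 14x14 patches, and 1152-dimensional outputs. Those values come from that checkpoint, not from this card.

```python
# Assumed configuration of google/siglip-so400m-patch14-384 (from its public
# config, not from this model card).
IMAGE_SIZE = 384   # input resolution in pixels
PATCH_SIZE = 14    # side length of each square patch in pixels
HIDDEN_DIM = 1152  # width of each SigLIP output token

NUM_IMAGE_TOKENS = 64  # image tokens used by this IP-Adapter (from the card)

def siglip_patch_tokens(image_size: int, patch_size: int) -> int:
    """Patch tokens for a square image with non-overlapping patches (SigLIP has no CLS token)."""
    per_side = image_size // patch_size  # 384 // 14 = 27 patches per side
    return per_side * per_side

patch_tokens = siglip_patch_tokens(IMAGE_SIZE, PATCH_SIZE)
compression = patch_tokens / NUM_IMAGE_TOKENS

print(f"SigLIP feature matrix: {patch_tokens} x {HIDDEN_DIM}")  # SigLIP feature matrix: 729 x 1152
print(f"Tokens after projection: {NUM_IMAGE_TOKENS}")          # Tokens after projection: 64
print(f"Compression factor: {compression:.1f}x")               # Compression factor: 11.4x
```

Under these assumptions, each reference image is reduced from roughly 729 patch features to 64 tokens before it reaches the transformer blocks.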

Model Capabilities

Text-to-Image Generation

Image-conditioned Generation

High-resolution Image Generation

Use Cases

Creative Design

Style Transfer

Generate new images based on the style of reference images

New content generated with consistent style to the reference image

Concept Design

Generate design concepts combining text descriptions and reference images

Creative designs that fuse text and image features

Content Creation

Illustration Generation

Generate complete illustrations based on sketches or reference images

High-quality illustrations that retain features of the reference image

🚀 SD3.5-Large-IP-Adapter

This repository provides an IP-Adapter for the SD3.5-Large model, released by researchers from the InstantX Team. In this model, the image condition works much like text. As a result, it may sometimes fail to take effect or may interfere with the text prompt. Nevertheless, we sincerely hope you enjoy using this model, have fun, and share your creative work with us on Twitter.

🚀 Quick Start

This IP-Adapter is designed for the SD3.5-Large model. The code has not yet been integrated into the diffusers library, but you can run inference with the local files provided in this repository.

✨ Features

Regular IP-Adapter: New layers are added to all 38 blocks.
Superior Image Encoder: google/siglip-so400m-patch14-384 is used for image encoding due to its excellent performance.
TimeResampler Projection: A TimeResampler is adopted for projection.
Fixed Image Token Number: The image token number is set to 64.
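The card does not describe TimeResampler's internals. The NumPy sketch below shows the general Perceiver-style resampler idea that the name suggests. A fixed set of 64 learned query vectors cross-attends to the SigLIP patch features, so the output always has 64 tokens regardless of how many patches come in. Several parts are assumptions:
- The timestep conditioning, implemented here by adding a sinusoidal timestep embedding to the queries, is guessed from the name only.
- `resample` and `timestep_embedding` are hypothetical helpers.
- All weights are random placeholders.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def timestep_embedding(t: float, dim: int) -> np.ndarray:
    """Sinusoidal embedding of a scalar timestep (dim must be even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])

def resample(features, queries, w_q, w_k, w_v, t=None):
    """Hypothetical single cross-attention step: learned queries attend to image features.

    features: (num_patches, d_in)  SigLIP patch features
    queries:  (num_tokens, d)      learned latent queries
    returns:  (num_tokens, d)      fixed-length image tokens
    """
    q_in = queries
    if t is not None:
        # Assumption: timestep conditioning is added to the queries.
        q_in = queries + timestep_embedding(t, queries.shape[1])
    q = q_in @ w_q           # (num_tokens, d)
    k = features @ w_k       # (num_patches, d)
    v = features @ w_v       # (num_patches, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (num_tokens, num_patches)
    return queries + attn @ v  # residual update of the latent queries

rng = np.random.default_rng(0)
d_in, d, num_patches, num_tokens = 1152, 128, 729, 64  # d kept small for the demo
features = rng.standard_normal((num_patches, d_in))
queries = rng.standard_normal((num_tokens, d))
w_q = rng.standard_normal((d, d)) / np.sqrt(d)
w_k = rng.standard_normal((d_in, d)) / np.sqrt(d_in)
w_v = rng.standard_normal((d_in, d)) / np.sqrt(d_in)

tokens = resample(features, queries, w_q, w_k, w_v, t=500.0)
print(tokens.shape)  # (64, 128)
```

The design point is that the number of output tokens depends only on the number of learned queries. Fixing it at 64 keeps the per-block cost of the image condition constant.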

📦 Installation

As the code has not been integrated into diffusers, you need to use the local files provided in this repository.
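Before running the example, it can help to confirm that the local modules and weights it expects are present. The check below is a hypothetical helper. Its file list is inferred from the imports and paths in the usage example (`models/transformer_sd3`, `pipeline_stable_diffusion_3_ipa`, `./ip-adapter.bin`), and the actual repository layout may differ.

```python
from pathlib import Path

# Hypothetical file list, inferred from the usage example's imports and paths.
REQUIRED_FILES = (
    "models/transformer_sd3.py",           # expected to provide SD3Transformer2DModel
    "pipeline_stable_diffusion_3_ipa.py",  # expected to provide StableDiffusion3Pipeline
    "ip-adapter.bin",                      # adapter weights
)

def missing_local_files(repo_dir: str = ".", required=REQUIRED_FILES) -> list:
    """Return the required files that are not present under repo_dir."""
    root = Path(repo_dir)
    return [name for name in required if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_local_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All local files found.")
```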

💻 Usage Examples

Basic Usage

```python
import torch
from PIL import Image

# Local modules shipped with this repository (not part of diffusers)
from models.transformer_sd3 import SD3Transformer2DModel
from pipeline_stable_diffusion_3_ipa import StableDiffusion3Pipeline

model_path = 'stabilityai/stable-diffusion-3.5-large'
ip_adapter_path = './ip-adapter.bin'
image_encoder_path = "google/siglip-so400m-patch14-384"

# Load the SD3.5-Large transformer with the repository's local SD3Transformer2DModel class
transformer = SD3Transformer2DModel.from_pretrained(
    model_path, subfolder="transformer", torch_dtype=torch.bfloat16
)

# Build the pipeline around that transformer and move it to the GPU
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_path, transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# Load the adapter weights and the SigLIP image encoder; nb_token sets 64 image tokens
pipe.init_ipadapter(
    ip_adapter_path=ip_adapter_path,
    image_encoder_path=image_encoder_path,
    nb_token=64,
)

# Reference image used as the image prompt
ref_img = Image.open('./assets/1.jpg').convert('RGB')

# Note: SD3.5 Large is sensitive to high-resolution generation (e.g. 1536x1536),
# so this example generates at 1024x1024.
image = pipe(
    width=1024,
    height=1024,
    prompt='a cat',
    negative_prompt="lowres, low quality, worst quality",
    num_inference_steps=24,
    guidance_scale=5.0,
    generator=torch.Generator("cuda").manual_seed(42),
    clip_image=ref_img,    # reference image condition
    ipadapter_scale=0.5,   # weight of the image condition
).images[0]
image.save('./result.jpg')
```
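The image condition can be ignored or can interfere with the text prompt. One quick way to find a workable `ipadapter_scale` is to render the same prompt and seed at a few scales and compare the results. The helper below is a hypothetical sketch. It only passes keyword arguments that appear in the usage example, and the scale values are arbitrary starting points, not recommendations from the authors.

```python
def sweep_ipadapter_scale(pipe, ref_img, prompt, scales, make_generator, **call_kwargs):
    """Hypothetical helper: generate one image per ipadapter_scale with the same seed.

    make_generator must return a freshly seeded generator on every call, so the
    adapter scale is the only thing that changes between images.
    """
    images = {}
    for scale in scales:
        result = pipe(
            prompt=prompt,
            clip_image=ref_img,
            ipadapter_scale=scale,
            generator=make_generator(),
            **call_kwargs,
        )
        images[scale] = result.images[0]
    return images

# Usage with the pipeline and reference image from the example above (requires a GPU):
# images = sweep_ipadapter_scale(
#     pipe, ref_img, "a cat", scales=(0.3, 0.5, 0.7),
#     make_generator=lambda: torch.Generator("cuda").manual_seed(42),
#     width=1024, height=1024, num_inference_steps=24, guidance_scale=5.0,
#     negative_prompt="lowres, low quality, worst quality",
# )
# for scale, img in images.items():
#     img.save(f"./result_scale_{scale}.jpg")
```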

📚 Documentation

Model Card

This is a regular IP-Adapter, in which new layers are added to all 38 blocks. We use google/siglip-so400m-patch14-384 to encode images because of its superior performance, and we adopt a TimeResampler for projection. The number of image tokens is set to 64.
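The card says only that new layers are added to all 38 blocks. For reference, the original IP-Adapter design uses decoupled attention. Each block gets its own key/value projections for the image tokens, and the image branch's attention output is added to the text branch's output, weighted by a scale. In the usage example, that scale is presumably `ipadapter_scale`.

The NumPy sketch below illustrates that idea with random weights. SD3.5 blocks use joint attention rather than UNet-style cross-attention, so how this adapter wires the new layers into each block is an assumption here. `ip_block` and `make_weights` are hypothetical names.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def make_weights(rng, d):
    """Random stand-ins for one block's projections, including the new k_ip / v_ip."""
    return {name: rng.standard_normal((d, d)) / np.sqrt(d)
            for name in ("q", "k", "v", "k_ip", "v_ip")}

def ip_block(hidden, text_tokens, image_tokens, w, scale):
    """Hypothetical decoupled attention: text branch plus a scaled image branch."""
    q = hidden @ w["q"]
    text_out = attention(q, text_tokens @ w["k"], text_tokens @ w["v"])
    image_out = attention(q, image_tokens @ w["k_ip"], image_tokens @ w["v_ip"])
    return text_out + scale * image_out

rng = np.random.default_rng(0)
d = 32
hidden = rng.standard_normal((16, d))        # toy latent tokens
text_tokens = rng.standard_normal((8, d))    # toy text tokens
image_tokens = rng.standard_normal((64, d))  # 64 image tokens, as in this adapter

# One block: scale=0 leaves only the text branch; scale=0.5 mixes in the image branch
weights = make_weights(rng, d)
out_no_image = ip_block(hidden, text_tokens, image_tokens, weights, scale=0.0)
out_half = ip_block(hidden, text_tokens, image_tokens, weights, scale=0.5)

# Toy stack of 38 blocks, each with its own image projections, sharing the same image tokens
x = hidden
for block_weights in [make_weights(rng, d) for _ in range(38)]:
    x = x + ip_block(x, text_tokens, image_tokens, block_weights, scale=0.5)  # residual update
```

With `scale=0.0`, the image branch contributes nothing and the block reduces to plain text attention. Raising the scale increases the reference image's influence. In this formulation, that is one reason an image condition can override, or be drowned out by, the text prompt.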


Community ComfyUI Support

For ComfyUI support, please refer to Slickytail/ComfyUI-InstantX-IPAdapter-SD3.

📄 License

The model is released under the stabilityai-ai-community license. All copyrights are reserved.

🤝 Acknowledgements

This project is sponsored by HuggingFace and fal.ai. Special thanks to Slickytail for supporting the ComfyUI node.

📖 Citation

If you find this project useful in your research, please cite it as follows:

@misc{sd35-large-ipa,
    author = {InstantX Team},
    title = {InstantX SD3.5-Large IP-Adapter Page},
    year = {2024},
}

📄 Information Table

Property	Details
Library Name	diffusers
Pipeline Tag	text-to-image
Tags	Text-to-Image, IP-Adapter, StableDiffusion3Pipeline, image-generation, Stable Diffusion
Base Model	stabilityai/stable-diffusion-3.5-large
