FLUX.1-dev-IP-Adapter
This repository houses an IP-Adapter for the FLUX.1-dev model, released by researchers from the InstantX Team. In this model, image prompts function much like text prompts, which may cause the adapter to be unresponsive or to interfere with other text inputs. Nevertheless, we sincerely hope you'll enjoy using this model, have a great time, and share your creative works with us on Twitter.
Features
- Model Structure: This is a standard IP-Adapter, with new layers added to 38 single blocks and 19 double blocks.
- Image Encoding: We use google/siglip-so400m-patch14-384 for image encoding due to its superior performance (see the encoding sketch after this list).
- Projection Model: A simple MLPProjModel with two linear layers is adopted for projection.
- Training Details: The currently released model was trained on a 10M open-source dataset with a batch size of 128 for 80K training steps.
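As a quick illustration of the image-encoding step, below is a minimal sketch that loads the SigLIP checkpoint with the standard transformers API. Which embedding the adapter actually consumes (pooled vs. per-patch) is an assumption here, not a confirmed detail; see the released inference code for the exact behavior.

# Hedged sketch: encode a reference image with SigLIP.
# Whether the adapter uses pooled or per-patch features is an assumption.
import torch
from PIL import Image
from transformers import AutoProcessor, SiglipVisionModel

encoder = SiglipVisionModel.from_pretrained(
    "google/siglip-so400m-patch14-384", torch_dtype=torch.bfloat16
).to("cuda")
processor = AutoProcessor.from_pretrained("google/siglip-so400m-patch14-384")

image = Image.open("./assets/images/2.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.bfloat16)

with torch.no_grad():
    outputs = encoder(**inputs)

patch_embeds = outputs.last_hidden_state  # per-patch features, hidden size 1152
pooled_embeds = outputs.pooler_output     # one pooled vector per image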
Documentation
Model Card
This is a standard IP-Adapter, with the new layers added to 38 single blocks and 19 double blocks. We use google/siglip-so400m-patch14-384 to encode images due to its superior performance, and adopt a simple MLPProjModel with two linear layers for projection. The number of image tokens is set to 128. The currently released model was trained on a 10M open-source dataset with a batch size of 128 for 80K training steps.
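For intuition, here is a minimal sketch of a two-layer MLP projector of the kind described above, mapping one image embedding to a sequence of image tokens. The hidden width, GELU activation, LayerNorm, and the output dimension are assumptions for illustration, not the released MLPProjModel's exact definition.

# Hedged sketch of a 2-layer MLP projector. Dimensions, GELU, and LayerNorm
# are illustrative assumptions; the released model uses num_tokens=128.
import torch
import torch.nn as nn

class MLPProjModelSketch(nn.Module):
    def __init__(self, cross_attention_dim=3072, image_embed_dim=1152, num_tokens=128):
        super().__init__()
        self.cross_attention_dim = cross_attention_dim
        self.num_tokens = num_tokens
        self.proj = nn.Sequential(
            nn.Linear(image_embed_dim, image_embed_dim * 2),
            nn.GELU(),
            nn.Linear(image_embed_dim * 2, cross_attention_dim * num_tokens),
        )
        self.norm = nn.LayerNorm(cross_attention_dim)

    def forward(self, image_embeds):
        # (batch, image_embed_dim) -> (batch, num_tokens * cross_attention_dim)
        x = self.proj(image_embeds)
        # reshape into a token sequence the attention layers can attend to
        x = x.reshape(-1, self.num_tokens, self.cross_attention_dim)
        return self.norm(x)

# shape check with a small token count to keep the demo light
tokens = MLPProjModelSketch(num_tokens=4)(torch.randn(1, 1152))
print(tokens.shape)  # torch.Size([1, 4, 3072])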
Showcases

Showcases (LoRA)
We adopt Shakker-Labs/FLUX.1-dev-LoRA-collections as a character LoRA and use its default prompt. A sketch of loading such a LoRA alongside the adapter is shown below.
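If the bundled FluxPipeline inherits diffusers' standard LoRA-loading mixin (an assumption, not verified here), pairing the adapter with a character LoRA could look roughly like the following; the weight file name is a hypothetical placeholder, not a verified asset.

# Hedged sketch: assumes the repo's FluxPipeline keeps diffusers' LoRA mixin.
import torch
from pipeline_flux_ipa import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
# "example_lora.safetensors" is a hypothetical file name.
pipe.load_lora_weights(
    "Shakker-Labs/FLUX.1-dev-LoRA-collections",
    weight_name="example_lora.safetensors",
)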

Limitations
This model supports image reference but is not designed for fine-grained style transfer or character consistency, which means there is a trade-off between content leakage and style transfer. We do not observe in FLUX.1-dev (DiT-based) the properties we found in InstantStyle (UNet-based). It may take several attempts to get satisfactory results; tuning the scale parameter helps (see the sweep after the usage example below). Furthermore, the currently released model may suffer from limited diversity, and thus cannot cover some styles or concepts.

Usage Examples
Basic Usage
import os
from PIL import Image
import torch
from pipeline_flux_ipa import FluxPipeline
from transformer_flux import FluxTransformer2DModel
from infer_flux_ipa_siglip import resize_img, IPAdapter

image_encoder_path = "google/siglip-so400m-patch14-384"
ipadapter_path = "./ip-adapter.bin"

# Load the FLUX transformer and pipeline in bfloat16.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)

# Wrap the pipeline with the IP-Adapter (128 image tokens, as in training).
ip_model = IPAdapter(pipe, image_encoder_path, ipadapter_path, device="cuda", num_tokens=128)

# Load and resize the reference image.
image_path = "./assets/images/2.jpg"
image_name = image_path.split("/")[-1]
image = Image.open(image_path).convert("RGB")
image = resize_img(image)

prompt = "a young girl"

# Generate; `scale` controls how strongly the reference image is followed.
images = ip_model.generate(
    pil_image=image,
    prompt=prompt,
    scale=0.7,
    width=960, height=1280,
    seed=42
)

# Make sure the output directory exists before saving.
os.makedirs("results", exist_ok=True)
images[0].save(f"results/{image_name}")
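Because of the content-leakage vs. style-transfer trade-off noted in Limitations, it can help to sweep the scale parameter. Continuing from the snippet above, a small hedged loop (the specific scale values and the "lower follows text, higher follows image" reading are illustrative assumptions):

# Sweep the adapter strength: lower scale tends to follow the text prompt
# more, higher scale the reference image more (values are illustrative).
for scale in (0.3, 0.5, 0.7, 0.9):
    out = ip_model.generate(
        pil_image=image,
        prompt=prompt,
        scale=scale,
        width=960, height=1280,
        seed=42
    )
    out[0].save(f"results/scale_{scale}_{image_name}")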
Additional Resources
ComfyUI
Please refer to ComfyUI-IPAdapter-Flux.

Online Inference
You can also try this model online at Shakker AI.
License
The model is released under the flux-1-dev-non-commercial-license. All rights reserved.
Acknowledgements
This project is sponsored by HuggingFace, fal.ai and Shakker Labs.
Citation
If you find this project useful in your research, please cite us via:
@misc{flux-ipa,
  author = {InstantX Team},
  title  = {InstantX FLUX.1-dev IP-Adapter Page},
  year   = {2024},
}