OmniGen: Unified Image Generation
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It simplifies the image generation process, eliminating the need for additional plugins and operations, similar to how GPT works in language generation.
News | Methodology | Capabilities | Quick Start | Finetune | License | Citation
For more information, please refer to our GitHub repo: https://github.com/VectorSpaceLab/OmniGen
Quick Start
Using OmniGen
Install via GitHub (recommended):
git clone https://github.com/staoxiao/OmniGen.git
cd OmniGen
pip install -e .
or via PyPI:
pip install OmniGen
Here are some examples:
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

# Text-to-image generation
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")
# Multi-modal-to-image generation: <img><|image_1|></img> in the prompt
# refers to the first entry of input_images
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    separate_cfg_infer=False,  # set True to lower memory use at the cost of speed
    guidance_scale=3,
    img_guidance_scale=1.6,
)
images[0].save("example_ti2i.png")
For more details about the inference arguments, please refer to docs/inference.md.
For more examples of image generation, see inference.ipynb and inference_demo.ipynb.
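The indexed placeholder in the second example above is how OmniGen ties the prompt to its input images: <|image_1|> refers to the first entry of input_images, <|image_2|> to the second, and so on. As a minimal sketch, a helper like the following can build such prompts programmatically (build_prompt is a hypothetical convenience function, not part of the OmniGen API):

```python
def build_prompt(template: str, image_paths: list[str]) -> tuple[str, list[str]]:
    """Fill a template's {} slots with indexed <img><|image_i|></img> tags.

    The i-th placeholder (1-based) refers to the i-th path in image_paths,
    matching the convention OmniGenPipeline uses for its input_images argument.
    """
    placeholders = [
        f"<img><|image_{i}|></img>" for i in range(1, len(image_paths) + 1)
    ]
    return template.format(*placeholders), image_paths

prompt, input_images = build_prompt(
    "A man in a black shirt is reading a book. The man is the right man in {}.",
    ["./imgs/test_cases/two_man.jpg"],
)
# prompt and input_images can then be passed to pipe(...) as in the example above
```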
Using Diffusers
Coming soon.
Gradio Demo
We host an online demo on Hugging Face.
For the local gradio demo, you can run:
python app.py
Features
OmniGen is a unified image generation model that can perform various tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. It automatically identifies the features in input images according to the text prompt, without the need for additional plugins or operations.
Documentation
For details, please see our paper.
Technical Details
Existing image generation models often require loading several additional network modules and performing extra preprocessing steps to generate a satisfactory image. OmniGen simplifies this process, aiming for a simpler and more flexible image generation paradigm.
License
This repo is licensed under the MIT License.
Citation
If you find this repository useful, please consider giving it a star ⭐ and a citation:
@article{xiao2024omnigen,
  title={OmniGen: Unified Image Generation},
  author={Xiao, Shitao and Wang, Yueze and Zhou, Junjie and Yuan, Huaying and Xing, Xingrun and Yan, Ruiran and Wang, Shuting and Huang, Tiejun and Liu, Zheng},
  journal={arXiv preprint arXiv:2409.11340},
  year={2024}
}
What Can OmniGen do?
OmniGen can be used to perform various tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. We showcase some examples in inference.ipynb, and in inference_demo.ipynb we show an interesting pipeline to generate and modify an image.
If you are not entirely satisfied with certain functionalities or wish to add new capabilities, you can try fine-tuning OmniGen.
Finetune
We provide a training script, train.py, to fine-tune OmniGen.
Here is a toy example of LoRA fine-tuning:
accelerate launch --num_processes=1 train.py \
--model_name_or_path Shitao/OmniGen-v1 \
--batch_size_per_device 2 \
--condition_dropout_prob 0.01 \
--lr 1e-3 \
--use_lora \
--lora_rank 8 \
--json_file ./toy_data/toy_subject_data.jsonl \
--image_path ./toy_data/images \
--max_input_length_limit 18000 \
--keep_raw_resolution \
--max_image_size 1024 \
--gradient_accumulation_steps 1 \
--ckpt_every 10 \
--epochs 200 \
--log_every 1 \
--results_dir ./results/toy_finetune_lora
Please refer to docs/finetune.md for more details (e.g., full fine-tuning).
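The --json_file argument above expects a JSONL dataset with one training example per line. Here is a minimal sketch of preparing such a file; the key names instruction, input_images, and output_image are assumptions, so verify the exact schema against docs/finetune.md and the bundled toy_data/toy_subject_data.jsonl:

```python
import json

# Each line is one training example. Key names here are assumptions;
# check toy_data/toy_subject_data.jsonl for the authoritative schema.
records = [
    {
        "instruction": "A photo of a dog sitting on the grass.",
        "input_images": [],           # optional conditioning images
        "output_image": "dog_0.png",  # target image, relative to --image_path
    },
]

with open("my_subject_data.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```

The resulting file can then be passed to train.py via --json_file ./my_subject_data.jsonl.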