OmniGen: Unified Image Generation
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It simplifies the image generation process and offers flexibility for various tasks.
For more information, please refer to our repo: https://github.com/VectorSpaceLab/OmniGen
News | Methodology | Capabilities | Quick Start | Finetune | License | Citation
Features
OmniGen is a unified image generation model capable of performing various tasks, including text-to-image generation, subject-driven generation, Identity-Preserving Generation, image editing, and image-conditioned generation. It can automatically identify features in input images according to text prompts without the need for additional plugins or operations.
Quick Start
Using OmniGen
Install via GitHub:
git clone https://github.com/staoxiao/OmniGen.git
cd OmniGen
pip install -e .
You can also create a new environment to avoid conflicts:
# Create a python 3.10.12 conda env (you could also use virtualenv)
conda create -n omnigen python=3.10.12
conda activate omnigen
# Install pytorch with your CUDA version, e.g.
pip install torch==2.3.1+cu118 torchvision --extra-index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/staoxiao/OmniGen.git
cd OmniGen
pip install -e .
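As an optional sanity check, you can confirm that the package imports correctly (model weights are only downloaded later, on the first from_pretrained call):

python -c "from OmniGen import OmniGenPipeline; print('OmniGen installed')"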
Here are some examples:
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

# Text-to-image generation
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")
# Multi-modal-to-image generation: reference images are passed via input_images
# and referenced in the prompt as <img><|image_1|></img>
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
    seed=0,
)
images[0].save("example_ti2i.png")
- If you run out of memory, you can set offload_model=True. If inference takes too long when inputting multiple images, you can reduce max_input_image_size (see the sketch below). For the required resources and how to run OmniGen efficiently, please refer to docs/inference.md#requiremented-resources.
- For more examples of image generation, please refer to inference.ipynb and inference_demo.ipynb.
- For more details about the inference arguments, please refer to docs/inference.md.
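Below is a minimal sketch of the memory- and speed-saving options mentioned above, assuming offload_model and max_input_image_size are keyword arguments of the pipeline call (the value 768 is only illustrative; see docs/inference.md for the supported settings):

# Sketch: trade speed for memory on smaller GPUs (values are illustrative)
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
    seed=0,
    offload_model=True,        # offload model weights to CPU when not in use to reduce GPU memory
    max_input_image_size=768,  # downscale large input images to speed up multi-image inference
)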
Using Diffusers
Coming soon.
Gradio Demo
We host an online demo on Hugging Face.
For a local Gradio demo, install the extra dependencies and then run:
pip install gradio spaces
python app.py
Use Google Colab
To run OmniGen in Google Colab, use the following commands:
!git clone https://github.com/staoxiao/OmniGen.git
%cd OmniGen
!pip install -e .
!pip install gradio spaces
!python app.py --share
Technical Details
Please see our paper for details.
Finetune
We provide a training script train.py to fine-tune OmniGen.
Here is a toy example of LoRA fine-tuning:
accelerate launch --num_processes=1 train.py \
--model_name_or_path Shitao/OmniGen-v1 \
--batch_size_per_device 2 \
--condition_dropout_prob 0.01 \
--lr 1e-3 \
--use_lora \
--lora_rank 8 \
--json_file ./toy_data/toy_subject_data.jsonl \
--image_path ./toy_data/images \
--max_input_length_limit 18000 \
--keep_raw_resolution \
--max_image_size 1024 \
--gradient_accumulation_steps 1 \
--ckpt_every 10 \
--epochs 200 \
--log_every 1 \
--results_dir ./results/toy_finetune_lora
Please refer to docs/fine-tuning.md for more details (e.g., full fine-tuning).
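After training, you will typically want to load the LoRA weights back into the pipeline for inference. The sketch below is a rough illustration, assuming the pipeline exposes a merge_lora helper and that checkpoints are written under the --results_dir given above (both the helper name and the checkpoint path are assumptions; docs/fine-tuning.md describes the exact API):

from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
# Assumed API: merge the LoRA weights produced by train.py into the base model.
# The checkpoint path is hypothetical; point it at the directory train.py actually wrote.
pipe.merge_lora("./results/toy_finetune_lora/checkpoints/0000200")

images = pipe(
    prompt="A photo of the fine-tuned subject standing in a park.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_lora.png")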
Contributors:
We thank all our contributors for their efforts and warmly welcome new members to join us!
License
This repo is licensed under the MIT License.
Citation
If you find this repository useful, please consider giving it a star ⭐ and citing it:
@article{xiao2024omnigen,
title={Omnigen: Unified image generation},
author={Xiao, Shitao and Wang, Yueze and Zhou, Junjie and Yuan, Huaying and Xing, Xingrun and Yan, Ruiran and Wang, Shuting and Huang, Tiejun and Liu, Zheng},
journal={arXiv preprint arXiv:2409.11340},
year={2024}
}