OmniGen-V1-fp8_e4m3fn Open-Source Multi-Modal Image Generation Model - Generate Images According to Instructions without Plugin Preprocessing

Developed by silveroxides

OmniGen is a unified multimodal image generation model capable of producing various types of images based on diverse instructions, without requiring additional plugins or preprocessing steps.

Text-to-Image Open Source License:MIT #Multimodal Image Generation #Unified Generation Framework #Instruction-Driven Generation

Release Time: 11/15/2024

Model Overview

OmniGen aims to create a simple, flexible, and out-of-the-box generation tool, supporting tasks such as text-to-image, subject-driven generation, identity-preserving generation, image editing, and more.

Model Features

Unified Generation Framework

Capable of performing diverse image generation tasks without loading multiple additional modules (e.g., ControlNet, IP-Adapter, etc.)

Multimodal Instruction Understanding

Automatically identifies input image features (e.g., target objects, human poses, depth maps, etc.)

Simple and Easy to Use

Requires only arbitrary multimodal instructions to directly generate target images, with no need for extra plugins or operations

Flexible Expansion

Provides fine-tuning scripts to easily extend the model's capabilities

Model Capabilities

Text-to-Image

Subject-Driven Generation

Identity-Preserving Generation

Image Editing

Conditional Image Generation

Use Cases

Creative Design

Fashion Design

Generates character images wearing specific clothing based on text descriptions

Produces high-quality fashion design images that match the descriptions

Content Creation

Social Media Content Generation

Generates stylistically consistent series of content based on existing images

Produces diverse content while keeping the subject's identity consistent

🚀 OmniGen: Unified Image Generation

OmniGen is a unified image generation model that can generate diverse images from multi-modal prompts. It simplifies the image-generation process, offering flexibility and ease of use.

For more information, please refer to our GitHub repo: https://github.com/VectorSpaceLab/OmniGen

🚀 Quick Start

Using OmniGen

Install via GitHub (recommended):

git clone https://github.com/staoxiao/OmniGen.git
cd OmniGen
pip install -e .

or via PyPI:

pip install OmniGen

Here are some examples:

from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

# Text to Image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.", 
    height=1024, 
    width=1024, 
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image

# Multi-modal to Image
# In prompt, we use the placeholder to represent the image. The image placeholder should be in the format of <img><|image_*|></img>
# You can add multiple images in the input_images. Please ensure that each image has its placeholder. For example, for the list input_images [img1_path, img2_path], the prompt needs to have two placeholders: <img><|image_1|></img>, <img><|image_2|></img>.
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024, 
    width=1024,
    separate_cfg_infer=False,  # if OOM, you can set separate_cfg_infer=True 
    guidance_scale=3, 
    img_guidance_scale=1.6
)
images[0].save("example_ti2i.png")  # save output PIL image
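
The placeholder rule described in the comments above (one `<img><|image_N|></img>` tag per entry in `input_images`) is easy to get wrong when prompts are assembled by hand. The following is a minimal sketch of a pre-flight check; `image_tag` and `check_placeholders` are hypothetical helpers written for illustration and are not part of the OmniGen package, and the image paths are made up.

```python
import re

# Matches placeholders of the form <img><|image_N|></img> and captures N.
PLACEHOLDER = re.compile(r"<img><\|image_(\d+)\|></img>")


def image_tag(index):
    """Return the placeholder for the image at 1-based position `index`."""
    return f"<img><|image_{index}|></img>"


def check_placeholders(prompt, input_images):
    """Raise ValueError unless the prompt references every input image.

    Hypothetical helper (not an OmniGen API): image i in `input_images`
    (1-based) must appear in the prompt as <img><|image_i|></img>.
    """
    found = {int(n) for n in PLACEHOLDER.findall(prompt)}
    expected = set(range(1, len(input_images) + 1))
    if found != expected:
        missing = sorted(expected - found)
        extra = sorted(found - expected)
        raise ValueError(
            f"placeholder mismatch: missing {missing}, unexpected {extra}"
        )


# Hypothetical paths, for illustration only.
images = ["./imgs/person.jpg", "./imgs/jacket.jpg"]
prompt = f"The person in {image_tag(1)} wears the jacket from {image_tag(2)}."
check_placeholders(prompt, images)  # returns None when everything lines up
```

Running the check before calling the pipeline turns a silently ignored image into an immediate, readable error.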

For more details about the inference arguments, please refer to docs/inference.md. For more image generation examples, see inference.ipynb and inference_demo.ipynb.
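
The multi-modal example suggests setting `separate_cfg_infer=True` when you run out of memory. One way to automate that is sketched below, assuming the out-of-memory condition surfaces as a `RuntimeError` whose message contains "out of memory" (which is how PyTorch reports CUDA OOM). `generate_with_fallback` is a hypothetical wrapper, not part of OmniGen; `pipe` can be any callable that accepts the keyword arguments used in the examples.

```python
def generate_with_fallback(pipe, **kwargs):
    """Call pipe(**kwargs); on an out-of-memory error, retry once with
    separate_cfg_infer=True.

    Hypothetical wrapper for illustration, not an OmniGen API.
    """
    try:
        return pipe(**kwargs)
    except RuntimeError as err:
        # Only handle OOM errors; anything else is re-raised unchanged.
        if "out of memory" not in str(err).lower():
            raise
        # If the memory-saving mode was already on, a retry cannot help.
        if kwargs.get("separate_cfg_infer") is True:
            raise
        retry_kwargs = dict(kwargs, separate_cfg_infer=True)
        return pipe(**retry_kwargs)
```

With a loaded pipeline this would be called as `generate_with_fallback(pipe, prompt=..., height=1024, width=1024, guidance_scale=2.5)`. In a real PyTorch session you may also want to release cached GPU memory before the retry.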

Using Diffusers

Coming soon.

Gradio Demo

We provide an online demo on Hugging Face.

For the local gradio demo, you can run:

python app.py

✨ Features

Unified Generation: OmniGen can generate a wide range of images from multi-modal prompts, eliminating the need for multiple additional network modules and extra pre-processing steps.
Ease of Use: It is designed to be simple and flexible. We provide inference code for users to explore more functionalities.
Fine-Tuning: Users can easily fine-tune OmniGen for specific tasks without designing new networks.

📚 Documentation

1. Overview

OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible and easy to use. We provide inference code so that everyone can explore more functionalities of OmniGen.

Existing image generation models often require loading several additional network modules (such as ControlNet, IP-Adapter, Reference-Net, etc.) and performing extra preprocessing steps (e.g., face detection, pose estimation, cropping, etc.) to generate a satisfactory image. However, we believe that the future image generation paradigm should be simpler and more flexible: generating various images directly from arbitrary multi-modal instructions, without additional plugins or operations, similar to how GPT works in language generation.

Due to limited resources, OmniGen still has room for improvement. We will continue to optimize it and hope it inspires more universal image generation models. You can also easily fine-tune OmniGen without worrying about designing networks for specific tasks; you just need to prepare the corresponding data and then run the script. Imagination is no longer limited; everyone can construct any image generation task, and perhaps we can achieve very interesting, wonderful and creative things.

If you have any questions, ideas or interesting tasks you want OmniGen to accomplish, feel free to contact us: 2906698981@qq.com, wangyueze@tju.edu.cn, zhengliu1026@gmail.com. We welcome any feedback to help us improve the model.

2. News

2024-10-22: 🔥 We release the code for OmniGen. Inference: docs/inference.md. Train: [docs/fine-tuning.md](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/fine-tuning.md)
2024-10-22: 🔥 We release the first version of OmniGen. Model Weight: [Shitao/OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1) HF Demo: 🤗

3. Methodology

You can see details in our paper.

4. What Can OmniGen do?

OmniGen is a unified image generation model that you can use to perform various tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. OmniGen doesn't need additional plugins or operations; it can automatically identify the features (e.g., required object, human pose, depth mapping) in input images according to the text prompt. We showcase some examples in inference.ipynb. In inference_demo.ipynb, we show an interesting pipeline to generate and modify an image.

If you are not entirely satisfied with certain functionalities or wish to add new capabilities, you can try fine-tuning OmniGen.

5. Finetune

We provide a training script, train.py, to fine-tune OmniGen. Here is a toy example of LoRA fine-tuning:

accelerate launch --num_processes=1 train.py \
    --model_name_or_path Shitao/OmniGen-v1 \
    --batch_size_per_device 2 \
    --condition_dropout_prob 0.01 \
    --lr 1e-3 \
    --use_lora \
    --lora_rank 8 \
    --json_file ./toy_data/toy_subject_data.jsonl \
    --image_path ./toy_data/images \
    --max_input_length_limit 18000 \
    --keep_raw_resolution \
    --max_image_size 1024 \
    --gradient_accumulation_steps 1 \
    --ckpt_every 10 \
    --epochs 200 \
    --log_every 1 \
    --results_dir ./results/toy_finetune_lora

Please refer to [docs/finetune.md](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/fine-tune.md) for more details (e.g. full finetune).
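
To run the command above on your own data, you need a JSONL file for `--json_file`. The sketch below writes one. It assumes each line is a JSON object with `instruction`, `input_images`, and `output_image` fields, and that image paths are resolved relative to `--image_path`. Both are assumptions, so verify them against the fine-tuning docs and the toy data before training. The file names and the `write_jsonl` helper are hypothetical.

```python
import json
from pathlib import Path

# Assumed record layout (verify against the fine-tuning docs):
# one JSON object per line with the instruction, the input images it
# references via placeholders, and the target image to be generated.
records = [
    {
        "instruction": "A photo of <img><|image_1|></img> on a beach.",
        "input_images": ["subject_0.jpg"],        # hypothetical file name
        "output_image": "subject_0_beach.jpg",    # hypothetical file name
    },
]


def write_jsonl(path, rows):
    """Write one JSON object per line, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


write_jsonl("./my_data/train.jsonl", records)  # hypothetical output path
```

The resulting file would then be passed as `--json_file ./my_data/train.jsonl`, with `--image_path` pointing at the directory that holds the images.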

🔧 Technical Details

The paper Omnigen: Unified image generation provides detailed technical information about OmniGen.

📄 License

This repo is licensed under the MIT License.

📚 Citation

If you find this repository useful, please consider giving it a star ⭐ and citing the paper:

@article{xiao2024omnigen,
  title={Omnigen: Unified image generation},
  author={Xiao, Shitao and Wang, Yueze and Zhou, Junjie and Yuan, Huaying and Xing, Xingrun and Yan, Ruiran and Wang, Shuting and Huang, Tiejun and Liu, Zheng},
  journal={arXiv preprint arXiv:2409.11340},
  year={2024}
}
