OmniGen: Unified Image Generation
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It simplifies the image generation process, eliminating the need for additional plugins and operations, similar to how GPT works in language generation.
News | Methodology | Capabilities | Quick Start | Finetune | License | Citation
For more information, please refer to our GitHub repo: https://github.com/VectorSpaceLab/OmniGen
Quick Start
Using OmniGen
Install via GitHub (recommended):
git clone https://github.com/staoxiao/OmniGen.git
cd OmniGen
pip install -e .
or via PyPI:
pip install OmniGen
Here are some examples:
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

# Text-to-image generation
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")
# Multi-modal-to-image generation: <img><|image_1|></img> in the prompt
# refers to the first entry of input_images
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    separate_cfg_infer=False,  # set True to lower memory use at the cost of speed
    guidance_scale=3,
    img_guidance_scale=1.6,
)
images[0].save("example_ti2i.png")
For more details about the inference arguments, please refer to docs/inference.md.
For more examples of image generation, see inference.ipynb and inference_demo.ipynb.
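The indexed placeholder in the second example above is how OmniGen ties the prompt to its input images: <|image_1|> refers to the first entry of input_images, <|image_2|> to the second, and so on. As a minimal sketch, a helper like the following can build such prompts programmatically (build_prompt is a hypothetical convenience function, not part of the OmniGen API):

```python
def build_prompt(template: str, image_paths: list[str]) -> tuple[str, list[str]]:
    """Fill a template's {} slots with indexed <img><|image_i|></img> tags.

    The i-th placeholder (1-based) refers to the i-th path in image_paths,
    matching the convention OmniGenPipeline uses for its input_images argument.
    """
    placeholders = [
        f"<img><|image_{i}|></img>" for i in range(1, len(image_paths) + 1)
    ]
    return template.format(*placeholders), image_paths

prompt, input_images = build_prompt(
    "A man in a black shirt is reading a book. The man is the right man in {}.",
    ["./imgs/test_cases/two_man.jpg"],
)
# prompt and input_images can then be passed to pipe(...) as in the example above
```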
Using Diffusers
Coming soon.
Gradio Demo
We host an online demo on Hugging Face.
For the local gradio demo, you can run:
python app.py
Features
OmniGen is a unified image generation model that can perform various tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. It automatically identifies the features in input images according to the text prompt, without the need for additional plugins or operations.
Documentation
For details, please see our paper.
Technical Details
Existing image generation models often require loading several additional network modules and performing extra preprocessing steps to generate a satisfactory image. OmniGen simplifies this process, aiming for a simpler and more flexible image generation paradigm.
License
This repo is licensed under the MIT License.
Citation
If you find this repository useful, please consider giving it a star ⭐ and a citation:
@article{xiao2024omnigen,
  title={OmniGen: Unified Image Generation},
  author={Xiao, Shitao and Wang, Yueze and Zhou, Junjie and Yuan, Huaying and Xing, Xingrun and Yan, Ruiran and Wang, Shuting and Huang, Tiejun and Liu, Zheng},
  journal={arXiv preprint arXiv:2409.11340},
  year={2024}
}
What Can OmniGen do?
OmniGen can be used to perform various tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. We showcase some examples in inference.ipynb, and in inference_demo.ipynb we show an interesting pipeline to generate and modify an image.
If you are not entirely satisfied with certain functionalities or wish to add new capabilities, you can try fine-tuning OmniGen.
Finetune
We provide a training script, train.py, to fine-tune OmniGen.
Here is a toy example of LoRA fine-tuning:
accelerate launch --num_processes=1 train.py \
--model_name_or_path Shitao/OmniGen-v1 \
--batch_size_per_device 2 \
--condition_dropout_prob 0.01 \
--lr 1e-3 \
--use_lora \
--lora_rank 8 \
--json_file ./toy_data/toy_subject_data.jsonl \
--image_path ./toy_data/images \
--max_input_length_limit 18000 \
--keep_raw_resolution \
--max_image_size 1024 \
--gradient_accumulation_steps 1 \
--ckpt_every 10 \
--epochs 200 \
--log_every 1 \
--results_dir ./results/toy_finetune_lora
Please refer to docs/finetune.md for more details (e.g., full fine-tuning).
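The --json_file argument above expects a JSONL dataset with one training example per line. Here is a minimal sketch of preparing such a file; the key names instruction, input_images, and output_image are assumptions, so verify the exact schema against docs/finetune.md and the bundled toy_data/toy_subject_data.jsonl:

```python
import json

# Each line is one training example. Key names here are assumptions;
# check toy_data/toy_subject_data.jsonl for the authoritative schema.
records = [
    {
        "instruction": "A photo of a dog sitting on the grass.",
        "input_images": [],           # optional conditioning images
        "output_image": "dog_0.png",  # target image, relative to --image_path
    },
]

with open("my_subject_data.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```

The resulting file can then be passed to train.py via --json_file ./my_subject_data.jsonl.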