OmniGen-v1: Open-Source Image Generation Model - Free to Deploy, Supports Multiple Image Generation Tasks

Omnigen V1

Developed by Shitao

OmniGen is a unified image generation model that supports a wide range of image generation tasks.

Image Generation  License: MIT  #unified-image-generation #multimodal-generation #image-to-image

Downloads: 5,886

Released: 10/21/2024

Model Overview

OmniGen is a versatile image generation model that can perform both text-to-image and image-to-image generation.

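Model Features

Versatile image generation

Supports both text-to-image and image-to-image generation.

Unified model architecture

Handles many image generation tasks with a single, unified model architecture.

Open-source license

Released under the MIT license, which permits free use and modification.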

Model Capabilities

Text-to-image generation

Image-to-image translation

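Use Cases

Creative design

Art creation

Generate artwork from text descriptions.

Image style transfer

Convert existing images into different styles.

Content generation

Social media content

Generate eye-catching visuals for social media.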

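🚀 OmniGen: Unified Image Generation

OmniGen is a unified image generation model that can generate a wide range of images from multimodal prompts. It is designed to be simple, flexible, and easy to use, making it convenient to explore image generation capabilities.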

🚀 Quick Start

Using OmniGen

Install from GitHub:

git clone https://github.com/staoxiao/OmniGen.git
cd OmniGen
pip install -e .

You can also create a new environment to avoid dependency conflicts:

# Create a conda environment with Python 3.10.12 (virtualenv also works)
conda create -n omnigen python=3.10.12
conda activate omnigen

# Install PyTorch for your CUDA version, for example:
pip install torch==2.3.1+cu118 torchvision --extra-index-url https://download.pytorch.org/whl/cu118

git clone https://github.com/staoxiao/OmniGen.git
cd OmniGen
pip install -e .
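The PyTorch wheel tag (cu118 in the command above) has to match your local CUDA version. As a small illustration, the hypothetical helper below (`torch_install_cmd`, not part of OmniGen) builds the same kind of command for a different CUDA version. It only formats the string; it does not check whether PyTorch actually publishes a wheel for that torch/CUDA combination.

```python
# Hypothetical helper (not part of OmniGen): builds a PyTorch install command
# following the pattern of the cu118 example. It does NOT verify that a wheel
# exists for the requested torch/CUDA combination.
def torch_install_cmd(cuda_version: str, torch_version: str = "2.3.1") -> str:
    """Turn a CUDA version like "11.8" into the "cu118" wheel tag and index URL."""
    major, minor = cuda_version.split(".")[:2]
    tag = f"cu{major}{minor}"
    index_url = f"https://download.pytorch.org/whl/{tag}"
    return (
        f"pip install torch=={torch_version}+{tag} torchvision "
        f"--extra-index-url {index_url}"
    )


if __name__ == "__main__":
    print(torch_install_cmd("11.8"))
    print(torch_install_cmd("12.1"))
```

For "11.8" this reproduces the command shown above; check the PyTorch installation page for which CUDA tags exist for a given torch release.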

Here are some usage examples:

from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")  
# Note: a local model path also works, e.g. 'pipe = OmniGenPipeline.from_pretrained(your_local_model_path)', where the files in your_local_model_path must follow the structure of https://huggingface.co/Shitao/OmniGen-v1/tree/main

## Text to image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.", 
    height=1024, 
    width=1024, 
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")  # save the output PIL image

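## Multimodal to image
# In the prompt, images are represented by placeholders of the form <img><|image_*|></img>
# You can pass multiple images in input_images; make sure each image has its own placeholder. For example, for input_images = [img1_path, img2_path], the prompt needs two placeholders: <img><|image_1|></img> and <img><|image_2|></img>.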
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024, 
    width=1024,
    guidance_scale=2.5, 
    img_guidance_scale=1.6,
    seed=0
)
images[0].save("example_ti2i.png")  # save the output PIL image
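Because every entry in input_images needs a matching numbered placeholder, a mismatch is an easy mistake to make. The sketch below is a hypothetical pre-flight check (`check_placeholders` is not an OmniGen function) that compares the placeholders found in a prompt with the number of input images before the pipeline is called. It assumes placeholders are numbered consecutively from 1, as described in the comments above.

```python
import re

# Matches placeholders of the form <img><|image_N|></img> and captures N.
PLACEHOLDER = re.compile(r"<img><\|image_(\d+)\|></img>")


def check_placeholders(prompt: str, input_images: list[str]) -> None:
    """Hypothetical check: raise ValueError unless the prompt references
    exactly images 1..K, where K = len(input_images).

    Repeated references to the same image are tolerated (a set is used).
    """
    found = sorted({int(n) for n in PLACEHOLDER.findall(prompt)})
    expected = list(range(1, len(input_images) + 1))
    if found != expected:
        raise ValueError(
            f"prompt references images {found}, but {len(input_images)} "
            f"input image(s) were given (expected {expected})"
        )


if __name__ == "__main__":
    prompt = ("A man in a black shirt is reading a book. "
              "The man is the right man in <img><|image_1|></img>.")
    check_placeholders(prompt, ["./imgs/test_cases/two_man.jpg"])  # passes
    try:
        check_placeholders(prompt, ["a.jpg", "b.jpg"])  # one placeholder missing
    except ValueError as err:
        print(err)
```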

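If you run out of memory, set offload_model=True. If inference takes too long with multiple input images, reduce max_input_image_size. For the required resources and ways to run OmniGen efficiently, see docs/inference.md#requiremented-resources.
For more image generation examples, see inference.ipynb and inference_demo.ipynb.
For more details on the inference parameters, see docs/inference.md.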
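Why does reducing max_input_image_size speed things up? When each image is turned into a sequence of tokens, larger images mean longer sequences, and attention cost grows roughly quadratically with sequence length. The sketch below makes this concrete under two illustrative assumptions that are not OmniGen's documented preprocessing: one token per 16x16 pixel block (8x VAE downsampling followed by 2x2 patches), and a hypothetical resize rule that caps the longer side and rounds down to a multiple of 16.

```python
# ASSUMPTION: one token per 16x16 pixel block (8x VAE downsampling, then 2x2
# patches). Check the OmniGen paper and code for the real values.
PIXELS_PER_TOKEN_SIDE = 16


def capped_size(width: int, height: int, max_size: int) -> tuple[int, int]:
    """Hypothetical resize rule: scale so the longer side is <= max_size,
    keep the aspect ratio, and round each side down to a multiple of 16."""
    scale = min(1.0, max_size / max(width, height))
    w = int(width * scale) // PIXELS_PER_TOKEN_SIDE * PIXELS_PER_TOKEN_SIDE
    h = int(height * scale) // PIXELS_PER_TOKEN_SIDE * PIXELS_PER_TOKEN_SIDE
    return w, h


def image_tokens(width: int, height: int) -> int:
    """Approximate number of sequence tokens one image contributes."""
    return (width // PIXELS_PER_TOKEN_SIDE) * (height // PIXELS_PER_TOKEN_SIDE)


if __name__ == "__main__":
    for max_size in (1024, 512):
        w, h = capped_size(2048, 1536, max_size)
        print(f"max_input_image_size={max_size}: {w}x{h}, ~{image_tokens(w, h)} tokens")
```

Under these assumptions, capping a 2048x1536 input at 512 instead of 1024 cuts its token count from 3072 to 768, a 4x reduction per input image.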

Using Diffusers

Coming soon.

Gradio Demo

We have built an online demo on Hugging Face.

For a local Gradio demo, install gradio and spaces, then run:

pip install gradio spaces
python app.py

Using Google Colab

To run OmniGen on Google Colab, use the following commands:

!git clone https://github.com/staoxiao/OmniGen.git
%cd OmniGen
!pip install -e .
!pip install gradio spaces
!python app.py --share

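✨ Key Features

Unified image generation: generates diverse images from multimodal prompts, with no extra plugins or operations.
Simple and flexible: a simple design that is easy to use; images can be generated with just a few lines of code.
Fine-tunable: supports fine-tuning, so you can adapt the model to your own needs.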

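📚 Documentation

News

2024-10-28: We released a new version of the inference code with reduced memory usage and time cost. See docs/inference.md for details.
2024-10-22: 🔥 We released the OmniGen code. Inference: docs/inference.md; training: docs/fine-tuning.md.
2024-10-22: 🔥 We released the first version of OmniGen. Model weights: Shitao/OmniGen-v1; HF demo: 🤗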

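Method

See our paper for details.

Capabilities of OmniGen

OmniGen is a unified image generation model that can perform a variety of tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. OmniGen needs no extra plugins or operations: based on the text prompt, it automatically identifies the relevant features in the input images (e.g., the required object, human pose, or depth map). We show some examples in inference.ipynb. In inference_demo.ipynb, we show an interesting pipeline that generates an image and then modifies it.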
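When a prompt refers to several input images (for example, subject-driven generation with more than one reference image), keeping placeholder numbers and the order of input_images in sync by hand is error-prone. The hypothetical helper below (`build_prompt`, not part of OmniGen) lets you write named `{slot}` markers, numbers them in order of first appearance using the `<img><|image_N|></img>` placeholder format, and returns both the prompt and the matching image list.

```python
import re


def build_prompt(template: str, images: dict[str, str]) -> tuple[str, list[str]]:
    """Hypothetical helper: replace {name} slots with numbered OmniGen-style
    placeholders, numbering images in order of first appearance.

    Returns (prompt, input_images) with input_images ordered to match the
    placeholder numbers. Raises KeyError for a slot with no image.
    """
    order: list[str] = []

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in images:
            raise KeyError(f"no image given for slot {{{name}}}")
        if name not in order:
            order.append(name)
        return f"<img><|image_{order.index(name) + 1}|></img>"

    prompt = re.sub(r"\{(\w+)\}", replace, template)
    return prompt, [images[name] for name in order]


if __name__ == "__main__":
    prompt, input_images = build_prompt(
        "The man in {man} is talking to the woman in {woman}.",
        {"man": "man.jpg", "woman": "woman.jpg"},
    )
    print(prompt)
    print(input_images)
```

The returned pair can then be passed as prompt= and input_images= to the pipeline call shown in the usage examples. The file names here are placeholders for illustration.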

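OmniGen gives you flexible control over image generation.

If you are not fully satisfied with certain capabilities or want to add new ones, you can try fine-tuning OmniGen.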

Fine-tuning

We provide a training script, train.py, for fine-tuning OmniGen. Here is a simple example of LoRA fine-tuning:

accelerate launch --num_processes=1 train.py \
    --model_name_or_path Shitao/OmniGen-v1 \
    --batch_size_per_device 2 \
    --condition_dropout_prob 0.01 \
    --lr 1e-3 \
    --use_lora \
    --lora_rank 8 \
    --json_file ./toy_data/toy_subject_data.jsonl \
    --image_path ./toy_data/images \
    --max_input_length_limit 18000 \
    --keep_raw_resolution \
    --max_image_size 1024 \
    --gradient_accumulation_steps 1 \
    --ckpt_every 10 \
    --epochs 200 \
    --log_every 1 \
    --results_dir ./results/toy_finetune_lora
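The --condition_dropout_prob flag sets how often conditions are dropped during training. In classifier-free guidance, this kind of dropout is what lets a model also make unconditional predictions, which inference-time parameters such as guidance_scale and img_guidance_scale then combine. The sketch below shows both halves on plain lists of floats. The two-scale combination follows the InstructPix2Pix-style formula and is an assumption: this document does not state OmniGen's exact guidance formula or dropout logic.

```python
import random


# Training side (sketch): with probability p, replace the condition with an
# empty one so the model also learns an unconditional prediction. Mirrors the
# idea behind --condition_dropout_prob; OmniGen's actual logic may differ.
def drop_condition(condition: str, p: float, rng: random.Random) -> str:
    return "" if rng.random() < p else condition


# Inference side (ASSUMPTION: InstructPix2Pix-style two-scale guidance).
# uncond:    prediction with neither text nor input images
# img_cond:  prediction conditioned on the input images only
# full_cond: prediction conditioned on both text and input images
def guided_prediction(
    uncond: list[float],
    img_cond: list[float],
    full_cond: list[float],
    guidance_scale: float,
    img_guidance_scale: float,
) -> list[float]:
    return [
        u + img_guidance_scale * (i - u) + guidance_scale * (c - i)
        for u, i, c in zip(uncond, img_cond, full_cond)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    kept = sum(drop_condition("a prompt", 0.01, rng) != "" for _ in range(10_000))
    print("conditions kept:", kept, "of 10000")
    print(guided_prediction([0.0, 0.0], [1.0, 0.5], [2.0, 0.5], 2.5, 1.6))
```

With both scales set to 1, the combination reduces to the fully conditioned prediction; larger scales push the output further toward the image and text conditions.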

See docs/fine-tuning.md for more details (e.g., full fine-tuning).
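LoRA (low-rank adaptation) keeps a pretrained weight matrix W frozen and trains two small matrices B and A whose product is added to it, which is why --use_lora with --lora_rank 8 trains far fewer parameters than full fine-tuning. The sketch below counts trainable parameters for a single linear layer. The 3072 layer size is a hypothetical value for illustration, and which layers train.py actually adapts is not stated here.

```python
# Trainable parameters for one linear layer W of shape (d_out, d_in).
def full_finetune_params(d_out: int, d_in: int) -> int:
    # Full fine-tuning updates every entry of W.
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    # LoRA freezes W and trains B (d_out x rank) and A (rank x d_in).
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d = 3072  # HYPOTHETICAL layer size, for illustration only
    full = full_finetune_params(d, d)
    lora = lora_params(d, d, rank=8)  # rank 8, as in --lora_rank 8
    print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
```

For this hypothetical layer, rank-8 LoRA trains 49,152 parameters instead of 9,437,184, about 0.5% of the full count.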

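Contributors

Thanks to all contributors for their efforts; new members are warmly welcome!

🔧 Technical Details

See our paper for the full technical details.

📄 License

This repository is licensed under the MIT License.

Citation

If you find this repository useful, please consider giving it a star ⭐ and citing: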

@article{xiao2024omnigen,
  title={Omnigen: Unified image generation},
  author={Xiao, Shitao and Wang, Yueze and Zhou, Junjie and Yuan, Huaying and Xing, Xingrun and Yan, Ruiran and Wang, Shuting and Huang, Tiejun and Liu, Zheng},
  journal={arXiv preprint arXiv:2409.11340},
  year={2024}
}