OmniGen-v1 open-source image generation model - free to deploy, supports many image generation tasks


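OmniGen V1

Developed by Shitao

OmniGen is a unified image generation model that supports a wide range of image generation tasks.

Image generation · Open-source license: MIT · #UnifiedImageGeneration #MultimodalGeneration #ImageToImage

Downloads: 5,886

Released: 10/21/2024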

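Model Overview

OmniGen is a versatile image generation model that can perform both text-to-image and image-to-image generation.

Model Features

Versatile image generation

Supports text-to-image and image-to-image generation.

Unified model architecture

Uses a single, unified architecture to handle many kinds of image generation tasks.

Open-source license

Released under the MIT license, which permits free use and modification.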

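Model Capabilities

Text-to-image generation

Image-to-image translation

Use Cases

Creative design

Art creation

Generate artwork from text descriptions.

Image style transfer

Transform existing images into different styles.

Content generation

Social media content

Generate engaging visuals for social media.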

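🚀 OmniGen: Unified Image Generation

OmniGen is a unified image generation model that can generate a wide range of images from multimodal prompts. It is designed to be simple, flexible, and easy to use, so users can readily explore what image generation can do.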

🚀 Quick Start

Using OmniGen

Install from GitHub:

git clone https://github.com/staoxiao/OmniGen.git
cd OmniGen
pip install -e .

You can also create a new environment to avoid dependency conflicts:

# Create a conda environment with Python 3.10.12 (virtualenv also works)
conda create -n omnigen python=3.10.12
conda activate omnigen

# Install PyTorch for your CUDA version, for example:
pip install torch==2.3.1+cu118 torchvision --extra-index-url https://download.pytorch.org/whl/cu118

git clone https://github.com/staoxiao/OmniGen.git
cd OmniGen
pip install -e .

Here are some usage examples:

from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")  
# Note: a local model path also works, e.g. 'pipe = OmniGenPipeline.from_pretrained(your_local_model_path)', as long as the files in your_local_model_path follow the same layout as https://huggingface.co/Shitao/OmniGen-v1/tree/main

## Text to image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.", 
    height=1024, 
    width=1024, 
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")  # save the output PIL image

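## Multimodal to image
# In the prompt, each input image is represented by a placeholder of the form <img><|image_*|></img>
# You can pass several images in input_images; make sure every image has its own placeholder. For example, for input_images [img1_path, img2_path], the prompt needs two placeholders: <img><|image_1|></img> and <img><|image_2|></img>.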
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024, 
    width=1024,
    guidance_scale=2.5, 
    img_guidance_scale=1.6,
    seed=0
)
images[0].save("example_ti2i.png")  # save the output PIL image
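When prompts are built programmatically, it is easy to end up with a placeholder count that does not match input_images. The sketch below uses hypothetical helpers: image_placeholder and check_placeholders are not part of the OmniGen package. The first generates placeholders in the format described above. The second checks, before you call pipe, that indices 1..N each appear exactly once for N input images. Requiring exactly one occurrence per image is an assumption and is slightly stricter than the README's wording.

```python
import re

# Matches the placeholder format <img><|image_k|></img> and captures k.
PLACEHOLDER = re.compile(r"<img><\|image_(\d+)\|></img>")


def image_placeholder(k: int) -> str:
    """Return the placeholder for the k-th (1-based) input image (hypothetical helper)."""
    return f"<img><|image_{k}|></img>"


def check_placeholders(prompt: str, input_images: list[str]) -> None:
    """Raise ValueError unless indices 1..len(input_images) each appear exactly once.

    Hypothetical pre-flight check; OmniGen itself may validate differently.
    """
    indices = sorted(int(k) for k in PLACEHOLDER.findall(prompt))
    expected = list(range(1, len(input_images) + 1))
    if indices != expected:
        raise ValueError(
            f"prompt has placeholders {indices}, "
            f"but {len(input_images)} input image(s) need {expected}"
        )


# Example: two reference images, one placeholder each (paths are illustrative)
input_images = ["./imgs/img1.jpg", "./imgs/img2.jpg"]
prompt = (f"The man in {image_placeholder(1)} is shaking hands "
          f"with the woman in {image_placeholder(2)}.")
check_placeholders(prompt, input_images)  # passes silently
```

The README's single-image prompt ("... The man is the right man in <img><|image_1|></img>.") passes this check with a one-element input_images list.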

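If you run out of memory, set offload_model=True. If inference is slow when you pass multiple input images, reduce max_input_image_size. For the required resources and tips on running OmniGen efficiently, see docs/inference.md#requiremented-resources.
For more image generation examples, see inference.ipynb and inference_demo.ipynb.
For details on the inference parameters, see docs/inference.md.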
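Lowering max_input_image_size helps because each input image becomes a sequence of tokens, and attention cost grows roughly quadratically with total sequence length. The back-of-the-envelope sketch below estimates that token count under two stated assumptions, neither taken from docs/inference.md. First, an 8x VAE downsampling followed by 2x2 patches, so one token per 16x16 pixel block. Second, images are resized so the longer side fits the cap and each side is then rounded down to a multiple of 16. capped_size, image_tokens and total_image_tokens are illustrative helpers, not OmniGen functions, and OmniGen's actual preprocessing may differ.

```python
# ASSUMPTION: one token per 16x16 pixel block (8x VAE downsampling x 2x2 patches).
PIXELS_PER_TOKEN_SIDE = 16


def capped_size(height: int, width: int, max_size: int) -> tuple[int, int]:
    """Shrink (height, width) so the longer side is at most max_size, then round
    each side down to a multiple of 16 (hypothetical preprocessing)."""
    scale = min(1.0, max_size / max(height, width))
    h = int(height * scale) // PIXELS_PER_TOKEN_SIDE * PIXELS_PER_TOKEN_SIDE
    w = int(width * scale) // PIXELS_PER_TOKEN_SIDE * PIXELS_PER_TOKEN_SIDE
    # Keep at least one token per side for very thin images.
    return max(h, PIXELS_PER_TOKEN_SIDE), max(w, PIXELS_PER_TOKEN_SIDE)


def image_tokens(height: int, width: int) -> int:
    """Tokens contributed by one image under the 16-pixel assumption."""
    return (height // PIXELS_PER_TOKEN_SIDE) * (width // PIXELS_PER_TOKEN_SIDE)


def total_image_tokens(sizes: list[tuple[int, int]], max_size: int) -> int:
    """Total tokens for several input images after applying the size cap."""
    return sum(image_tokens(*capped_size(h, w, max_size)) for h, w in sizes)


# Two 1024x1024 input images with a 1024 cap versus a 512 cap
print(total_image_tokens([(1024, 1024), (1024, 1024)], 1024))  # 8192
print(total_image_tokens([(1024, 1024), (1024, 1024)], 512))   # 2048
```

Under these assumptions, halving the cap cuts the image tokens by 4x, so the attention work over those tokens drops by roughly 16x.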

Using Diffusers

Coming soon.

Gradio Demo

We have built an online demo on Hugging Face.

To run the Gradio demo locally, install gradio and spaces, then launch the app:

pip install gradio spaces
python app.py

Using Google Colab

To run it on Google Colab, use the following commands:

!git clone https://github.com/staoxiao/OmniGen.git
%cd OmniGen
!pip install -e .
!pip install gradio spaces
!python app.py --share

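✨ Key Features

Unified image generation: generates a wide range of images from multimodal prompts, with no extra plugins or operations.
Simple and flexible: the design is simple and easy to use, so images can be generated with just a few lines of code.
Fine-tunable: supports fine-tuning, so you can adapt the model to your own needs.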


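📚 Detailed Documentation

News

2024-10-28: We released a new version of the inference code with lower memory usage and shorter inference time. See docs/inference.md for details.
2024-10-22: 🔥 We released the OmniGen code. Inference: docs/inference.md; training: docs/fine-tuning.md
2024-10-22: 🔥 We released the first version of OmniGen. Model weights: Shitao/OmniGen-v1; HF demo: 🤗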

Method

See our paper for details.

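OmniGen's Capabilities

OmniGen is a unified image generation model that can perform a variety of tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. OmniGen needs no additional plugins or operations: guided by the text prompt, it automatically identifies the relevant features in the input images (e.g., a required object, a human pose, or a depth map). We show some examples in inference.ipynb, and inference_demo.ipynb walks through an interesting pipeline that generates an image and then modifies it.

You can flexibly control image generation with OmniGen.

If you are not fully satisfied with certain capabilities or want to add new ones, you can try fine-tuning OmniGen.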

Fine-tuning

We provide a training script, train.py, for fine-tuning OmniGen. Here is a simple LoRA fine-tuning example:

accelerate launch --num_processes=1 train.py \
    --model_name_or_path Shitao/OmniGen-v1 \
    --batch_size_per_device 2 \
    --condition_dropout_prob 0.01 \
    --lr 1e-3 \
    --use_lora \
    --lora_rank 8 \
    --json_file ./toy_data/toy_subject_data.jsonl \
    --image_path ./toy_data/images \
    --max_input_length_limit 18000 \
    --keep_raw_resolution \
    --max_image_size 1024 \
    --gradient_accumulation_steps 1 \
    --ckpt_every 10 \
    --epochs 200 \
    --log_every 1 \
    --results_dir ./results/toy_finetune_lora

See docs/fine-tuning.md for more details (e.g., full fine-tuning).
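With data-parallel training through accelerate, the number of examples behind each optimizer update is the product of the process count (--num_processes), the per-device batch size (--batch_size_per_device), and the gradient-accumulation steps (--gradient_accumulation_steps). The helpers below make that arithmetic explicit. effective_batch_size and updates_per_epoch are hypothetical and are not part of train.py. updates_per_epoch also assumes the data loader keeps the final partial batch and that a partial accumulation window still triggers an update.

```python
import math


def effective_batch_size(num_processes: int, batch_size_per_device: int,
                         gradient_accumulation_steps: int) -> int:
    """Examples contributing to each optimizer update across all processes."""
    return num_processes * batch_size_per_device * gradient_accumulation_steps


def updates_per_epoch(num_examples: int, num_processes: int,
                      batch_size_per_device: int,
                      gradient_accumulation_steps: int) -> int:
    """Optimizer updates in one pass over the data (assumes no dropped batches)."""
    per_update = effective_batch_size(num_processes, batch_size_per_device,
                                      gradient_accumulation_steps)
    return math.ceil(num_examples / per_update)


# Values from the LoRA command above: 1 process, batch 2, no accumulation
print(effective_batch_size(1, 2, 1))  # 2
```

For the LoRA command above this gives 1 × 2 × 1 = 2 examples per update. Raising --gradient_accumulation_steps to 4 would give 8 without increasing the per-device batch size.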

Contributors

Thanks to all contributors for their efforts. New contributors are warmly welcome!

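🔧 Technical Details

See our paper for the full technical details.

📄 License

This repository is licensed under the MIT License.

Citation

If you find this repository useful, please consider giving it a star ⭐ and citing: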

@article{xiao2024omnigen,
  title={Omnigen: Unified image generation},
  author={Xiao, Shitao and Wang, Yueze and Zhou, Junjie and Yuan, Huaying and Xing, Xingrun and Yan, Ruiran and Wang, Shuting and Huang, Tiejun and Liu, Zheng},
  journal={arXiv preprint arXiv:2409.11340},
  year={2024}
}