ControlNetMediaPipeFace open-source model: generate images with precisely controllable facial expressions for free


Controlnetmediapipeface

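Developed by stablediffusionapi

A ControlNet model trained on the LAION Face dataset for generating images with precise control over facial expressions

Image generation · English · License: OpenRAIL · #facial-expression-control #gaze-direction-editing #multi-face-generation

Downloads: 15

Released: 6/20/2023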

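Model Overview

Facial keypoint annotations were produced with the MediaPipe face detector and used to train a ControlNet that controls facial expression and gaze direction in generated images. It handles scenes with multiple faces and can be applied to portrait editing, advertising design, and similar tasks.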

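Model Features

Precise facial control

Pupil keypoints enable gaze-direction control, and expressions can be adjusted across the eyebrows, eyes, mouth, and other facial regions.

Multi-face support

Handles several faces in one image, each with its own independent expression control.

Dual-version compatibility

Supports both Stable Diffusion 2.1-base and Stable Diffusion 1.5.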

Model Capabilities

Facial expression control

Gaze direction adjustment

Multi-face image generation

Image-to-image transformation

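Use Cases

Advertising design

Customized character expressions for advertising

Generate characters with the specific expressions an advertisement calls for.

The sample gallery shows advertising images generated with a range of expressions, such as happy and surprised.

Portrait editing

Gaze direction correction

Adjust where the people in a photo are looking.

The model uses pupil keypoints to control pupil position and thereby change the gaze direction.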

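🚀 ControlNet LAION Face Dataset

This dataset is designed to train a ControlNet that works with human facial expressions. It includes pupil keypoints, which can be used to control gaze direction. It has been trained and tested with Stable Diffusion v2.1 base (512) and Stable Diffusion v1.5.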

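🚀 Quick Start

This dataset is a mirror of https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace. Before use, unzip the included ZIP file into the root of your ControlNet directory, so that train_laion_face.py, laion_face_dataset.py, and the other .py files sit in the same directory as tutorial_train.py and tutorial_train_sd21.py. This assumes a checkout of the ControlNet repository at commit 0acb7e5, although there is no direct dependency on that repository.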
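To confirm the layout described above before training, a small check could look like the following. It is a sketch, not part of the dataset: `missing_files` and `REQUIRED_FILES` are hypothetical names, and the file list is taken only from the paragraph above.

```python
from pathlib import Path

# Hypothetical list: the four files named in the Quick Start paragraph, all of
# which should sit directly in the ControlNet root after unzipping.
REQUIRED_FILES = (
    "train_laion_face.py",
    "laion_face_dataset.py",
    "tutorial_train.py",
    "tutorial_train_sd21.py",
)


def missing_files(controlnet_root: str) -> list[str]:
    """Return the required files that are not present directly in controlnet_root."""
    root = Path(controlnet_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_files("/path/to/ControlNet")` returns an empty list when all four files are in place.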

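✨ Key Features

Facial expression training: built specifically to train a ControlNet that handles human facial expressions.
Gaze direction control: includes pupil keypoints, which can be used to control gaze direction.
Multi-model support: trained and tested with Stable Diffusion v2.1 base (512) and Stable Diffusion v1.5.
Multi-face handling: supports images that contain several faces.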

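📦 Installation

Download

For copyright reasons, the original target images are not included in the dataset. The script tool_download_face_targets.py reads training/laion-face-processed/metadata.json and populates the target folder. The script has no extra dependencies, but it uses tqdm to display progress if tqdm is installed.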
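After the download finishes, it can be useful to confirm that every annotated source image has a target. A minimal sketch, assuming (from the `xxxxxxxxx.jpg` naming in the dataset listing) that source/ and target/ pair images by identical file names; `missing_targets` is a hypothetical helper, not part of tool_download_face_targets.py.

```python
from pathlib import Path


def missing_targets(dataset_dir: str = "training/laion-face-processed") -> list[str]:
    """List source image names with no same-named file in target/.

    Assumes source/ and target/ pair images by identical file names.
    A missing target/ directory simply yields every source name.
    """
    base = Path(dataset_dir)
    source_names = {p.name for p in (base / "source").glob("*.jpg")}
    target_names = {p.name for p in (base / "target").glob("*.jpg")}
    return sorted(source_names - target_names)
```

Any names it returns point to targets that failed to download (for example, dead LAION URLs).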

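Training

Once the target folder is populated, training can be run on a machine with at least 24 GB of VRAM. This model was trained on an A6000 for 200 hours (four epochs). The following commands prepare and train the Stable Diffusion 1.5 variant: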

python tool_add_control.py ./models/v1-5-pruned-emaonly.ckpt ./models/controlnet_sd15_laion_face.ckpt
python ./train_laion_face_sd15.py
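The 24 GB requirement can be checked before launching a long run; for scale, 200 hours over four epochs works out to about 50 hours per epoch on the A6000. A minimal pre-flight sketch, assuming PyTorch is installed (the training scripts depend on it); `enough_vram` and `local_gpu_ok` are hypothetical names. GB is treated as 10**9 bytes, so cards sold as 24 GB, which report slightly less than 24 GiB, still pass.

```python
MIN_VRAM_GB = 24  # minimum stated in the training notes


def enough_vram(total_bytes: int, min_gb: float = MIN_VRAM_GB) -> bool:
    """Compare a device's total memory with min_gb, treating 1 GB as 10**9 bytes."""
    return total_bytes >= min_gb * 10**9


def local_gpu_ok() -> bool:
    """Check CUDA device 0 with PyTorch (assumed installed, as training needs it)."""
    import torch

    if not torch.cuda.is_available():
        return False
    return enough_vram(torch.cuda.get_device_properties(0).total_memory)
```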

Inference

gradio_face2image.py is provided for inference. Update the following two lines so they point to your trained model:

model = create_model('./models/cldm_v21.yaml').cpu()  # If you fine-tuned on SD2.1 base, this line needs no change.
model.load_state_dict(load_state_dict('./models/control_sd21_openpose.pth', location='cuda'))  # Change this path to your trained checkpoint.
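The comment on the first line implies a different config file when the ControlNet was fine-tuned on Stable Diffusion 1.5. A small sketch of that choice, assuming the ControlNet repository's `./models/cldm_v15.yaml` and `./models/cldm_v21.yaml` configs; `config_for` and `CONFIGS` are hypothetical names.

```python
# Hypothetical mapping from the base model the ControlNet was fine-tuned on to
# the ControlNet repository config that create_model() should load.
CONFIGS = {
    "sd15": "./models/cldm_v15.yaml",
    "sd21": "./models/cldm_v21.yaml",
}


def config_for(base: str) -> str:
    """Return the config path for a base model key, or raise ValueError."""
    try:
        return CONFIGS[base]
    except KeyError:
        raise ValueError(f"unknown base model {base!r}; expected one of {sorted(CONFIGS)}") from None
```

In gradio_face2image.py this would replace the hard-coded path, e.g. `create_model(config_for("sd15")).cpu()`, with the second line still pointing at your own trained checkpoint.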

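The model has limitations: although it tracks gaze and mouth poses better than previous attempts, it may still ignore the control input. Adding details to the prompt, such as "looking right", can reduce these failures.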
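Following that tip, gaze or mouth wording can be appended to a prompt programmatically. A minimal sketch; `add_pose_hints` is a hypothetical helper, and which hints suit a given control image is up to you.

```python
def add_pose_hints(prompt: str, *hints: str) -> str:
    """Append each hint that the prompt does not already contain (case-insensitive)."""
    lowered = prompt.lower()
    extra = [h for h in hints if h.lower() not in lowered]
    return ", ".join([prompt, *extra]) if extra else prompt
```

For example, `add_pose_hints("a woman", "looking left", "mouth open")` gives `"a woman, looking left, mouth open"`, which can then be passed as the prompt to the pipeline.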

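🧨 Diffusers Usage

We recommend using this checkpoint with Stable Diffusion 2.1 - Base, since that is the model it was trained on. Experimentally, the checkpoint can also be used with other diffusion models, such as a Dreambooth-trained Stable Diffusion. To use it with Stable Diffusion 1.5, add subfolder="diffusion_sd15" to the from_pretrained arguments. A half-precision v1.5 variant is provided but has not been tested.

Install diffusers and related packages:

$ pip install diffusers transformers accelerate

Run the code: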

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image

# Control image: a MediaPipe face-landmark annotation, not a regular photo.
image = load_image(
    "https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace/resolve/main/samples_laion_face_dataset/family_annotation.png"
)

# Stable Diffusion 2.1-base (recommended; the checkpoint was trained on it):
controlnet = ControlNetModel.from_pretrained("CrucibleAI/ControlNetMediaPipeFace", torch_dtype=torch.float16, variant="fp16")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16
)

# OR, for Stable Diffusion 1.5, replace the two statements above with:
# controlnet = ControlNetModel.from_pretrained("CrucibleAI/ControlNetMediaPipeFace", subfolder="diffusion_sd15")
# pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None)

# Swap in the UniPC multistep scheduler, reusing the pipeline's scheduler config.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Remove the following line if xformers is not installed.
# Installation instructions: https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers
pipe.enable_xformers_memory_efficient_attention()
# Keep submodules on the CPU and move each to the GPU only while it runs (requires accelerate).
pipe.enable_model_cpu_offload()

image = pipe("a happy family at a dentist advertisement", image=image, num_inference_steps=30).images[0]
image.save('./images.png')
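When supplying your own annotation image, you may want the output size to match the 512-pixel base resolution of SD 2.1-base, and Stable Diffusion pipelines expect height and width divisible by 8. A sketch under those assumptions; `sd_size` is a hypothetical helper that uses integer arithmetic to avoid rounding surprises.

```python
def sd_size(width: int, height: int, short_side: int = 512, multiple: int = 8) -> tuple[int, int]:
    """Scale (width, height) so the shorter side becomes short_side, then round
    each dimension down to a multiple of `multiple` (minimum one multiple)."""
    shorter = min(width, height)
    w = max(multiple, (width * short_side // shorter) // multiple * multiple)
    h = max(multiple, (height * short_side // shorter) // multiple * multiple)
    return w, h
```

With a PIL control image, `w, h = sd_size(*image.size)` followed by `pipe(prompt, image=image.resize((w, h)), height=h, width=w, num_inference_steps=30)` keeps the annotation and the output the same size.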

💻 Usage Examples

Basic Usage

Both the inference code and the Diffusers code above serve as basic examples of generating images with the trained model.

📚 Documentation

Overview

Samples

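Cherry-picked samples from ControlNet + Stable Diffusion v2.1 Base:

Input | Face Detection | Output (sample images not reproduced here)

Images containing multiple faces are also supported: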

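Dataset Contents

train_laion_face.py - Entry point for ControlNet training.
laion_face_dataset.py - Dataset iteration code; image cropping and resizing happen here.
tool_download_face_targets.py - Tool that reads metadata.json and populates the target folder.
tool_generate_face_poses.py - The original script used to generate the source images. Included for reproducibility; not required for training.
training/laion-face-processed/prompt.jsonl - Read by laion_face_dataset; contains the prompts for the images.
training/laion-face-processed/metadata.json - An excerpt of the relevant LAION data, also used to download the target dataset.
training/laion-face-processed/source/xxxxxxxxx.jpg - Images with face detections drawn, generated from the target images.
training/laion-face-processed/target/xxxxxxxxx.jpg - Images selected from LAION Face.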
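To see how prompt.jsonl ties source and target images together, here is a minimal reader. It assumes, by analogy with the ControlNet tutorial's prompt.json, one JSON object per line with "source", "target", and "prompt" keys holding paths relative to the dataset directory; that schema is an assumption, and `iter_samples` and `Sample` are hypothetical names, not the code in laion_face_dataset.py.

```python
import json
from pathlib import Path
from typing import Iterator, NamedTuple


class Sample(NamedTuple):
    source: Path  # annotated face-landmark image (the control input)
    target: Path  # original LAION Face image
    prompt: str


def iter_samples(dataset_dir: str = "training/laion-face-processed") -> Iterator[Sample]:
    """Yield Sample triples from prompt.jsonl, skipping blank lines.

    Assumes each line is a JSON object with "source", "target" and "prompt" keys,
    where the paths are relative to dataset_dir.
    """
    base = Path(dataset_dir)
    with open(base / "prompt.jsonl", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            row = json.loads(line)
            yield Sample(base / row["source"], base / row["target"], row["prompt"])
```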

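Dataset Construction

The source images were generated by taking slice 00000 of LAION Face and running it through MediaPipe's face detector with special configuration parameters. The colors and line thicknesses used with MediaPipe are as follows: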

import mediapipe as mp

DrawingSpec = mp.solutions.drawing_utils.DrawingSpec

# Shared line thickness and landmark-circle radius for every facial feature.
f_thick = 2
f_rad = 1
right_iris_draw = DrawingSpec(color=(10, 200, 250), thickness=f_thick, circle_radius=f_rad)
right_eye_draw = DrawingSpec(color=(10, 200, 180), thickness=f_thick, circle_radius=f_rad)
right_eyebrow_draw = DrawingSpec(color=(10, 220, 180), thickness=f_thick, circle_radius=f_rad)
left_iris_draw = DrawingSpec(color=(250, 200, 10), thickness=f_thick, circle_radius=f_rad)
left_eye_draw = DrawingSpec(color=(180, 200, 10), thickness=f_thick, circle_radius=f_rad)
left_eyebrow_draw = DrawingSpec(color=(180, 220, 10), thickness=f_thick, circle_radius=f_rad)
mouth_draw = DrawingSpec(color=(10, 180, 10), thickness=f_thick, circle_radius=f_rad)
head_draw = DrawingSpec(color=(10, 200, 10), thickness=f_thick, circle_radius=f_rad)

# 468 and 473 are the iris-center landmarks of MediaPipe Face Mesh
# (available when iris refinement, refine_landmarks=True, is enabled).
iris_landmark_spec = {468: right_iris_draw, 473: left_iris_draw}
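Reading the values above, each left-side color is its right-side counterpart with the first and third channels swapped, while the mouth and head colors are symmetric. The snippet below only checks that observation on the listed constants; the likely purpose (keeping left and right features distinguishable in the control image) is an inference, not something the source states, and `mirror`, `RIGHT`, and `LEFT` are hypothetical names.

```python
# Colors copied from the DrawingSpec definitions above.
RIGHT = {"iris": (10, 200, 250), "eye": (10, 200, 180), "eyebrow": (10, 220, 180)}
LEFT = {"iris": (250, 200, 10), "eye": (180, 200, 10), "eyebrow": (180, 220, 10)}


def mirror(color: tuple[int, int, int]) -> tuple[int, int, int]:
    """Swap the first and third channels of a 3-channel color."""
    c0, c1, c2 = color
    return (c2, c1, c0)


# Every left-side color equals the mirrored right-side color.
assert all(LEFT[part] == mirror(RIGHT[part]) for part in RIGHT)
```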

We implemented a method named draw_pupils that modifies some MediaPipe functionality. It exists as a stopgap until some pending upstream changes are merged.
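The actual draw_pupils code is not shown here, so the following is only a rough, hypothetical illustration of what drawing pupil landmarks involves: converting MediaPipe's normalized (x, y) landmark coordinates to pixels and filling a small circle per iris center. To stay self-contained it uses a plain list-of-rows image instead of NumPy/OpenCV; `draw_pupils_sketch` and its arguments are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Color = Tuple[int, int, int]


def draw_pupils_sketch(
    image: List[List[Color]],
    landmarks: Sequence[Tuple[float, float]],
    spec: Dict[int, Tuple[Color, int]],
) -> None:
    """Fill a circle at each landmark index in spec, modifying image in place.

    landmarks holds normalized (x, y) pairs in [0, 1]; spec maps a landmark
    index (e.g. 468 or 473) to (color, radius_in_pixels).
    """
    height, width = len(image), len(image[0])
    for index, (color, radius) in spec.items():
        x_norm, y_norm = landmarks[index]
        # Normalized -> pixel coordinates, clamped to the image bounds.
        cx = min(int(x_norm * width), width - 1)
        cy = min(int(y_norm * height), height - 1)
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    image[y][x] = color
```

With `f_rad = 1` as above, each pupil becomes a five-pixel plus shape centered on the iris landmark.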

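🔧 Technical Details

Building this dataset involved extracting data from LAION Face and processing it with MediaPipe's face detector. Along the way, some MediaPipe functionality was modified, and a draw_pupils method was implemented to draw the pupils. Training was run on Stable Diffusion v2.1 base (512) and Stable Diffusion v1.5 and requires at least 24 GB of VRAM. Inference uses the gradio_face2image.py script, and a Diffusers workflow is also provided for integration with other diffusion models.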

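📄 License

Source images (/training/laion-face-processed/source/)

This work is licensed under CC0 1.0. To view a copy of this license, visit http://creativecommons.org/publicdomain/zero/1.0.

Trained models

Our trained ControlNet checkpoints are released under CreativeML Open RAIL-M.

Source code

lllyasviel/ControlNet is licensed under the Apache License 2.0. Our modifications are released under the same license.

Acknowledgments

Many thanks to Zhang et al. for ControlNet, Rombach et al. (StabilityAI) for Stable Diffusion, and Schuhmann et al. for LAION. The sample images in this document were obtained from Unsplash and are CC0.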

@misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models},
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{rombach2021highresolution,
  title={High-Resolution Image Synthesis with Latent Diffusion Models},
  author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
  year={2021},
  eprint={2112.10752},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{schuhmann2022laion5b,
  title={LAION-5B: An open large-scale dataset for training next generation image-text models},
  author={Christoph Schuhmann and Romain Beaumont and Richard Vencu and Cade Gordon and Ross Wightman and Mehdi Cherti and Theo Coombes and Aarush Katta and Clayton Mullis and Mitchell Wortsman and Patrick Schramowski and Srivatsa Kundurthy and Katherine Crowson and Ludwig Schmidt and Robert Kaczmarczyk and Jenia Jitsev},
  year={2022},
  eprint={2210.08402},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

This project was implemented by Crucible AI.