ControlNetMediaPipeFace: a free, open-source model for generating images with precisely controllable facial expressions

Controlnetmediapipeface

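Developed by stablediffusionapi

A ControlNet model trained on the LAION Face dataset for generating images with precise control over facial expressions

Task: image generation · Language: English · License: OpenRAIL · Tags: #facial-expression-control #gaze-direction-editing #multi-face-generation

Downloads: 15

Released: 6/20/2023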

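Model Overview

Facial landmark annotations are produced with the MediaPipe face detector and used to train a ControlNet that controls facial expression and gaze direction in generated images. The model supports scenes with multiple faces and can be applied to portrait editing, advertising design, and similar tasks.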

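Model Features

Precise facial control

Pupil keypoints control gaze direction, and the expression of the eyebrows, eyes, mouth, and other regions can be adjusted.

Multi-face support

Multiple faces in one image are processed together, each keeping its own independent expression control.

Two supported base versions

Works with both Stable Diffusion 2.1-base and Stable Diffusion 1.5.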

Model Capabilities

Facial expression control

Gaze direction adjustment

Multi-face image generation

Image-to-image translation

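Use Cases

Advertising design

Custom expressions for people in advertisements

Generate a person with a specific expression to match the needs of an advertisement.

The sample gallery includes advertisement images with happy, surprised, and other expressions.

Portrait editing

Gaze direction correction

Adjust the gaze direction of a person in a photo.

The model changes gaze direction by controlling the pupil position.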

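🚀 ControlNet LAION Face Dataset

This dataset is designed for training a ControlNet that handles human facial expressions. It includes pupil keypoints, which make gaze direction controllable. Training has been tested on Stable Diffusion v2.1 base (512) and Stable Diffusion v1.5.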

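🚀 Quick Start

This dataset is a replica of https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace. Before use, unzip the included ZIP file into the root of your ControlNet directory. train_laion_face.py, laion_face_dataset.py, and the other .py files should sit in the same directory as tutorial_train.py and tutorial_train_sd21.py. Revision 0acb7e5 of the ControlNet repository is assumed, but there is no direct dependency on it.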

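✨ Key Features

Facial expression training: built specifically for training a ControlNet that handles human facial expressions.
Gaze direction control: includes pupil keypoints, so gaze direction can be controlled.
Multiple base models: training has been tested on Stable Diffusion v2.1 base (512) and Stable Diffusion v1.5.
Multi-face handling: supports images that contain multiple faces.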

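📦 Installation Guide

Download

For copyright reasons, the original target files are not included in the dataset. The script tool_download_face_targets.py reads training/laion-face-processed/metadata.json and populates the target folder. The script has no additional dependencies, but it uses tqdm to display progress if tqdm is installed.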
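The optional tqdm dependency mentioned above is usually handled with an import fallback. The following is a minimal sketch of that pattern, not the code of tool_download_face_targets.py; the function name process_all and its handler argument are hypothetical and exist only for illustration.

```python
# Optional-dependency pattern: use tqdm when it is installed, otherwise
# fall back to a no-op wrapper so the script still runs.
try:
    from tqdm import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback: ignore progress-bar options and return the iterable as-is.
        return iterable


def process_all(items, handler):
    """Apply handler to every item, showing progress if tqdm is available.

    process_all and handler are hypothetical names for this sketch.
    """
    results = []
    for item in tqdm(items, desc="targets"):
        results.append(handler(item))
    return results


if __name__ == "__main__":
    print(process_all(["a.jpg", "b.jpg"], str.upper))
```

With this structure the calling code is identical whether or not tqdm is present, which is why the progress bar can stay an optional extra.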

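Training

Once the target folder is populated, training can run on a machine with at least 24 GB of VRAM. This model was trained for 200 hours (four epochs) on an A6000.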

python tool_add_control.py ./models/v1-5-pruned-emaonly.ckpt ./models/controlnet_sd15_laion_face.ckpt
python ./train_laion_face_sd15.py

Inference

gradio_face2image.py is provided for inference. Update the following two lines to point to your trained model:

model = create_model('./models/cldm_v21.yaml').cpu()  # No change needed if you fine-tuned on SD 2.1 base.
model.load_state_dict(load_state_dict('./models/control_sd21_openpose.pth', location='cuda'))

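The model has some limitations: although it tracks gaze and mouth pose better than previous attempts, it may still ignore the control signal. Adding detail to the prompt, such as "looking right", can reduce this behavior.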

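🧨 Usage with Diffusers

It is recommended to use this checkpoint with Stable Diffusion 2.1 - Base, since the checkpoint was trained on it. Experiments show that it can also be used with other diffusion models, such as a Dreambooth-trained Stable Diffusion. To use it with Stable Diffusion 1.5, add subfolder="diffusion_sd15" to the from_pretrained arguments. A half-precision v1.5 variant is provided but has not been tested.

Install diffusers and the related packages:

$ pip install diffusers transformers accelerate

Run the code: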

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image

# Face-annotation image used as the ControlNet conditioning input
control_image = load_image(
    "https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace/resolve/main/samples_laion_face_dataset/family_annotation.png"
)

# Option A: Stable Diffusion 2.1-base (half precision)
controlnet = ControlNetModel.from_pretrained("CrucibleAI/ControlNetMediaPipeFace", torch_dtype=torch.float16, variant="fp16")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16
)

# Option B: Stable Diffusion 1.5. To use it, comment out Option A and uncomment the lines below.
# controlnet = ControlNetModel.from_pretrained("CrucibleAI/ControlNetMediaPipeFace", subfolder="diffusion_sd15")
# pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Remove the following line if xformers is not installed.
# For installation instructions, see https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers
pipe.enable_xformers_memory_efficient_attention()
pipe.enable_model_cpu_offload()

result = pipe("a happy family at a dentist advertisement", image=control_image, num_inference_steps=30).images[0]
result.save('./images.png')
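The difference between the two base models described above comes down to one extra keyword argument when loading the ControlNet. A small sketch of that selection logic follows; the helper name controlnet_load_kwargs and the short keys "sd21" and "sd15" are hypothetical conveniences, and only the subfolder value comes from this document.

```python
def controlnet_load_kwargs(base: str) -> dict:
    """Return extra keyword arguments for ControlNetModel.from_pretrained.

    Hypothetical helper: "sd21" and "sd15" are labels invented for this
    sketch. The subfolder name is the one given in the text above.
    """
    if base == "sd21":
        # The repository root holds the Stable Diffusion 2.1-base checkpoint.
        return {}
    if base == "sd15":
        # The Stable Diffusion 1.5 checkpoint lives in a subfolder.
        return {"subfolder": "diffusion_sd15"}
    raise ValueError(f"unknown base model: {base!r}")


if __name__ == "__main__":
    print(controlnet_load_kwargs("sd15"))
```

The returned dictionary could then be unpacked into the loading call, for example ControlNetModel.from_pretrained("CrucibleAI/ControlNetMediaPipeFace", **controlnet_load_kwargs("sd15")).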

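💻 Usage Examples

Basic usage

Both the inference code and the Diffusers code above serve as basic usage examples that show how to generate images with the trained model.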

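📚 Detailed Documentation

Overview

Examples

Cherry-picked examples from ControlNet + Stable Diffusion v2.1 Base:

[Image table: Input | Face detection | Output]

Images with multiple faces are also supported: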

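Dataset Contents

train_laion_face.py - Entry point for ControlNet training.
laion_face_dataset.py - Code that iterates over the dataset. Image cropping and resizing happen here.
tool_download_face_targets.py - Tool that reads metadata.json and populates the target folder.
tool_generate_face_poses.py - The original file used to generate the source images. Included for reproducibility, but not required for training.
training/laion-face-processed/prompt.jsonl - Read by laion_face_dataset. Contains the prompts for the images.
training/laion-face-processed/metadata.json - Excerpts of the relevant LAION data. Also used to download the target dataset.
training/laion-face-processed/source/xxxxxxxxx.jpg - Images with detections applied. Generated from the target images.
training/laion-face-processed/target/xxxxxxxxx.jpg - Images selected from LAION Face.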
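As a rough illustration of the two data-handling steps listed above (reading prompt.jsonl and cropping images before resizing), here is a sketch under stated assumptions. The field names source, target, and prompt follow the upstream ControlNet tutorial dataset and are assumed here, not confirmed for this dataset; the sample record is made up; and a centered square crop is only one plausible choice, since the actual logic in laion_face_dataset.py is not shown in this document.

```python
import json


def parse_prompt_lines(lines):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square.

    Assumption: a centered square crop before resizing. The real dataset
    code may crop differently.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


if __name__ == "__main__":
    # Made-up record; the keys are assumed from the ControlNet tutorial format.
    sample = ['{"source": "source/000000001.jpg", '
              '"target": "target/000000001.jpg", '
              '"prompt": "a smiling woman"}']
    for record in parse_prompt_lines(sample):
        print(record["prompt"], center_crop_box(768, 512))
```

The box returned by center_crop_box uses the same (left, top, right, bottom) order that PIL's Image.crop accepts, so it could be passed to it directly before resizing to the training resolution.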

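Dataset Construction

The source images were generated by taking slice 00000 of LAION Face and passing it through MediaPipe's face detector with special configuration parameters. The colors and line thicknesses used for MediaPipe are as follows: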

f_thick = 2
f_rad = 1
right_iris_draw = DrawingSpec(color=(10, 200, 250), thickness=f_thick, circle_radius=f_rad)
right_eye_draw = DrawingSpec(color=(10, 200, 180), thickness=f_thick, circle_radius=f_rad)
right_eyebrow_draw = DrawingSpec(color=(10, 220, 180), thickness=f_thick, circle_radius=f_rad)
left_iris_draw = DrawingSpec(color=(250, 200, 10), thickness=f_thick, circle_radius=f_rad)
left_eye_draw = DrawingSpec(color=(180, 200, 10), thickness=f_thick, circle_radius=f_rad)
left_eyebrow_draw = DrawingSpec(color=(180, 220, 10), thickness=f_thick, circle_radius=f_rad)
mouth_draw = DrawingSpec(color=(10, 180, 10), thickness=f_thick, circle_radius=f_rad)
head_draw = DrawingSpec(color=(10, 200, 10), thickness=f_thick, circle_radius=f_rad)

iris_landmark_spec = {468: right_iris_draw, 473: left_iris_draw}

A method named draw_pupils was implemented that modifies some of MediaPipe's functionality. It exists as a stopgap until some pending changes are merged.
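The actual draw_pupils implementation is not reproduced in this document. To give a feel for what drawing pupil markers involves, the sketch below paints a small colored square at each iris landmark on a NumPy image. It does not use MediaPipe; the function name draw_pupils_sketch, the half_size parameter, and the dictionary input format are hypothetical. Only the landmark indices (468, 473) and the colors come from the specs listed above.

```python
import numpy as np

# Landmark indices and colors copied from iris_landmark_spec and the
# right_iris_draw / left_iris_draw specs above.
IRIS_COLORS = {468: (10, 200, 250), 473: (250, 200, 10)}


def draw_pupils_sketch(image, landmarks, half_size=2):
    """Paint a small square at each iris landmark (hypothetical helper).

    image: H x W x 3 uint8 array, modified in place.
    landmarks: dict mapping landmark index -> (x, y), normalized to [0, 1].
    """
    height, width = image.shape[:2]
    for index, color in IRIS_COLORS.items():
        if index not in landmarks:
            continue
        x_norm, y_norm = landmarks[index]
        # Convert normalized coordinates to pixel coordinates, clamped to the image.
        cx = min(int(x_norm * width), width - 1)
        cy = min(int(y_norm * height), height - 1)
        y0, y1 = max(cy - half_size, 0), min(cy + half_size + 1, height)
        x0, x1 = max(cx - half_size, 0), min(cx + half_size + 1, width)
        image[y0:y1, x0:x1] = color
    return image


canvas = np.zeros((512, 512, 3), dtype=np.uint8)
draw_pupils_sketch(canvas, {468: (0.25, 0.5), 473: (0.75, 0.5)})
```

Because each iris gets its own color, a ControlNet trained on such annotations can, in principle, tell the two eyes apart and read the pupil position relative to the eye outline.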

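🔧 Technical Details

The dataset was built by extracting data from LAION Face and processing it with MediaPipe's face detector. Some MediaPipe functionality was modified along the way, including the draw_pupils method that handles pupil drawing. Training was run on Stable Diffusion v2.1 base (512) and Stable Diffusion v1.5 and requires at least 24 GB of VRAM. Inference uses the gradio_face2image.py script, and a Diffusers workflow is provided for integration with other diffusion models.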

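📄 License

Source images (/training/laion-face-processed/source/)

This work is released under the CC0 1.0 license. To view a copy of this license, visit http://creativecommons.org/publicdomain/zero/1.0.

Trained models

The trained ControlNet checkpoints are released under the CreativeML Open RAIL-M license.

Source code

lllyasviel/ControlNet is licensed under the Apache License 2.0. Our modifications are released under the same license.

Acknowledgements

Many thanks to Zhang et al. for ControlNet, Rombach et al. (StabilityAI) for Stable Diffusion, and Schuhmann et al. for LAION. The sample images in this document come from Unsplash and are CC0.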

@misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models}, 
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{schuhmann2022laion5b,
      title={LAION-5B: An open large-scale dataset for training next generation image-text models}, 
      author={Christoph Schuhmann and Romain Beaumont and Richard Vencu and Cade Gordon and Ross Wightman and Mehdi Cherti and Theo Coombes and Aarush Katta and Clayton Mullis and Mitchell Wortsman and Patrick Schramowski and Srivatsa Kundurthy and Katherine Crowson and Ludwig Schmidt and Robert Kaczmarczyk and Jenia Jitsev},
      year={2022},
      eprint={2210.08402},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

This project was implemented by Crucible AI.