Stable Diffusion 3.5 Large Controlnet Depth

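Developed by stabilityai

A depth ControlNet model built on Stable Diffusion 3.5 Large, used to guide image generation with depth maps.

Task: image generation | Language: English | License: Other | Tags: #depth-controlled generation #high-fidelity image synthesis #multimodal preprocessing

Downloads: 803

Release date: 11/25/2024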

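Model overview

This model is the depth ControlNet variant of Stable Diffusion 3.5 Large. It lets users control the composition and spatial relationships of a generated image with a depth map.

Model features

Depth control

Controls the composition and spatial relationships of the generated image through a depth map

Commercially friendly license

Free for commercial use by organizations with annual revenue under $1M

High-quality generation

Builds on the generative capability of Stable Diffusion 3.5 Large

Flexible integration

Usable through the diffusers library or the standalone repository

Model capabilities

Image generation from text prompts

Image generation controlled by a depth map

High-resolution image generation

Creative content creation

Use cases

Creative design

Concept art

Artists can use depth maps to steer the generation of concept art

Result: concept art with controlled composition and perspective

Product design

Product visualization

Designers can generate product renders in different styles from a depth map of the product

Result: product visualizations in a range of styles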

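🚀 Stable Diffusion 3.5 Large Controlnet - Depth

This project provides a Depth ControlNet for image generation. Combined with the Stable Diffusion 3.5 Large model, it generates images that follow the supplied depth information, and is suited to creative design and research.

🚀 Quick start

This repository provides the Depth ControlNet for Stable Diffusion 3.5 Large.

Please note: this model is released under the Stability Community License. Visit Stability AI to learn more, or contact us for commercial licensing details.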

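📄 License

Key points of the license:

Free for non-commercial use: individuals and organizations can use the model free of charge for non-commercial purposes, including scientific research.
Free for commercial use up to $1M in annual revenue: startups, small and medium-sized businesses, and creators can use the model commercially at no cost as long as their total annual revenue is under $1M.
Ownership of outputs: you retain ownership of the media you generate, without restrictive licensing implications.

Organizations with annual revenue above $1M should contact Stability AI to inquire about an Enterprise License.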

📦 Installation

Using Controlnets in the SD3.5 standalone repo

Clone the repository and install the dependencies:

git clone git@github.com:Stability-AI/sd3.5.git
cd sd3.5
pip install -r requirements.txt

Then download the models and the sample image to the following paths:

input/sample_cond.png
models/clip_g.safetensors
models/clip_l.safetensors
models/t5xxl.safetensors
models/sd3.5_large.safetensors
models/depth_8b.safetensors
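
Before running inference, it can help to confirm that the expected files are in place. The following is a minimal sketch and not part of the sd3.5 repository: the helper name `missing_files` is hypothetical, and the ControlNet checkpoint is taken to be `models/depth_8b.safetensors`, the path that the inference command in this guide passes to `--controlnet_ckpt`.

```python
from pathlib import Path

# Paths relative to the root of the cloned sd3.5 repository.
# The ControlNet entry is an assumption: it mirrors the --controlnet_ckpt
# value used in the inference command of this guide.
REQUIRED_FILES = [
    "input/sample_cond.png",
    "models/clip_g.safetensors",
    "models/clip_l.safetensors",
    "models/t5xxl.safetensors",
    "models/sd3.5_large.safetensors",
    "models/depth_8b.safetensors",
]


def missing_files(repo_root="."):
    """Return the required files that are not present under repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for name in missing:
            print("  " + name)
    else:
        print("All required files are present.")
```

Run it from the repository root; an empty result means every listed path exists.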

You can then run:

python sd3_infer.py --controlnet_ckpt models/depth_8b.safetensors --controlnet_cond_image input/sample_cond.png --prompt "A girl sitting in a cafe, cozy interior, HDR photograph"

This should produce an image similar to the following:

[Image: A girl sitting in a cafe]
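
To run the same command for several prompts, it may be convenient to assemble it from Python. This is a sketch under stated assumptions: `build_infer_command` and `run_inference` are hypothetical helper names, and only flags that appear in this document (`--controlnet_ckpt`, `--controlnet_cond_image`, `--prompt`, `--text_encoder_device`) are used.

```python
import subprocess
import sys


def build_infer_command(controlnet_ckpt, cond_image, prompt, text_encoder_device=None):
    """Assemble the sd3_infer.py argument list used in this guide."""
    cmd = [
        sys.executable, "sd3_infer.py",
        "--controlnet_ckpt", controlnet_ckpt,
        "--controlnet_cond_image", cond_image,
        "--prompt", prompt,
    ]
    # Optional flag taken from the usage tips; omitted unless requested.
    if text_encoder_device is not None:
        cmd += ["--text_encoder_device", text_encoder_device]
    return cmd


def run_inference(prompts, repo_dir=".", **kwargs):
    """Run sd3_infer.py once per prompt from inside the cloned repository."""
    for prompt in prompts:
        cmd = build_infer_command(
            "models/depth_8b.safetensors", "input/sample_cond.png", prompt, **kwargs
        )
        # check=True raises CalledProcessError if a run fails.
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Passing the arguments as a list avoids shell quoting issues with prompts that contain commas or quotes. A call such as `run_inference(["A girl sitting in a cafe, cozy interior, HDR photograph"])` reproduces the command above.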

Using Controlnets in Diffusers

Make sure Diffusers is upgraded to the latest version: pip install -U diffusers. You can then run:

import torch
from diffusers import StableDiffusion3ControlNetPipeline, SD3ControlNetModel
from diffusers.utils import load_image

# Load the depth ControlNet
controlnet = SD3ControlNetModel.from_pretrained("stabilityai/stable-diffusion-3.5-large-controlnet-depth", torch_dtype=torch.float16)
# Load the Stable Diffusion 3.5 Large base model with the ControlNet attached
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Load the depth map used as the control image
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_einstein_lcm_depth.png")
# Fix the random seed for reproducible output
generator = torch.Generator(device="cpu").manual_seed(0)
# Generate the image
image = pipe(
    prompt="a photo of a man",
    control_image=control_image,
    guidance_scale=4.5,
    num_inference_steps=40,
    generator=generator,
    max_sequence_length=77,
).images[0]
# Save the result
image.save('depth-8b.jpg')

You can use image_gen_aux to extract the depth_image; it provides all the preprocessors needed to use the model in a Diffusers pipeline.

# install image_gen_aux with: pip install git+https://github.com/huggingface/image_gen_aux.git
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor

# replace "path to image" with the path or URL of your source image
image = load_image("path to image")

depth_preprocessor = DepthPreprocessor.from_pretrained("depth-anything/Depth-Anything-V2-Large-hf").to("cuda")
depth_image = depth_preprocessor(image, invert=True)[0].convert("RGB")
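
To make the idea of a depth control image concrete, the sketch below shows what normalizing a raw depth grid to 8-bit gray levels and optionally inverting it could look like. It is an illustration only and not the image_gen_aux implementation: `depth_to_gray` is a hypothetical helper, and treating `invert` as a flip of the gray scale (so that near and far swap brightness) is an assumption about what the `invert=True` argument above does.

```python
def depth_to_gray(depth_rows, invert=False):
    """Map a 2D grid of raw depth values to 8-bit gray levels (0-255)."""
    flat = [v for row in depth_rows for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    gray = []
    for row in depth_rows:
        out_row = []
        for v in row:
            # Min-max normalize, then scale to the 0-255 range.
            level = round(255 * (v - lo) / span)
            # Assumed meaning of "invert": flip the gray scale.
            out_row.append(255 - level if invert else level)
        gray.append(out_row)
    return gray


print(depth_to_gray([[1.0, 5.0]]))               # [[0, 255]]
print(depth_to_gray([[1.0, 5.0]], invert=True))  # [[255, 0]]
```

Which brightness convention the ControlNet expects is not stated here, so it is worth comparing a preprocessed image against the sample control image before generating.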

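Preprocessing

An input image can be preprocessed for control with the following snippet. SD3.5 does not implement this behavior, so we recommend doing it in an external script beforehand.

# install depthfm from https://github.com/CompVis/depth-fm
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF
from depthfm.dfm import DepthFM

# checkpoint_path is a placeholder: set it to your local DepthFM checkpoint
depthfm_model = DepthFM(ckpt_path=checkpoint_path)
depthfm_model.eval()

# assuming img is a PIL image
img = TF.to_tensor(img).unsqueeze(0)  # add a batch dimension: (1, C, H, W)
_, _, h, w = img.shape
# run depth estimation at 512x512, then resize back to the original size
img = F.interpolate(img, (512, 512), mode='bilinear', align_corners=False)
with torch.no_grad():
    img = depthfm_model(img, num_steps=2, ensemble_size=4)
img = F.interpolate(img, (h, w), mode='bilinear', align_corners=False)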

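📚 Documentation

Usage tips

We recommend starting with a ControlNet strength of 0.7 and adjusting as needed.
The Euler sampler with a slightly higher step count (50-60) gives the best results.
Pass --text_encoder_device <device_name> to load the text encoders directly into VRAM, which can speed up the full inference loop at the cost of extra VRAM usage.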
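
The first two tips can be carried over to the Diffusers pipeline roughly as follows. This is a sketch, not the authors' recipe: `generate_with_tips` and `sweep_strengths` are hypothetical helpers, and mapping "ControlNet strength" to the pipeline's `controlnet_conditioning_scale` argument is an assumption. The pipeline's scheduler is left at its default rather than set explicitly.

```python
def generate_with_tips(pipe, prompt, control_image, strength=0.7, steps=50, **extra):
    """Call an already loaded SD3 ControlNet pipeline with the tips as defaults.

    `pipe` is expected to be a StableDiffusion3ControlNetPipeline.
    """
    result = pipe(
        prompt=prompt,
        control_image=control_image,
        # Assumed equivalent of "ControlNet strength".
        controlnet_conditioning_scale=strength,
        num_inference_steps=steps,
        **extra,
    )
    return result.images[0]


def sweep_strengths(pipe, prompt, control_image, strengths=(0.5, 0.7, 0.9), **extra):
    """Generate one image per strength value so the results can be compared."""
    return {
        s: generate_with_tips(pipe, prompt, control_image, strength=s, **extra)
        for s in strengths
    }
```

For example, `generate_with_tips(pipe, "a photo of a man", control_image, guidance_scale=4.5)` starts from strength 0.7 and 50 steps. When sweeping, pass a freshly seeded `generator` for each call if the images should differ only by strength.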

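Scope of use

All uses of the model must comply with our Acceptable Use Policy.

Out-of-scope uses

The model was not trained to produce factual or true representations of people or events. Using the model to generate such content is therefore outside the scope of its abilities.

Training data and strategy

These models were trained on a wide variety of data, including synthetic data and filtered publicly available data.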

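🔧 Technical details

Safety

We believe in safe, responsible AI practices and take deliberate measures to ensure that integrity starts at the early stages of development. This means we have taken, and continue to take, reasonable steps to prevent the misuse of Stable Diffusion 3.5 by bad actors. For more information about our approach to safety, please visit our Safety page.

Integrity evaluation

Our integrity evaluation methods include structured evaluations and red-teaming for certain harms. Testing was conducted primarily in English and may not cover all possible harms.

Risks identified and mitigations:

Harmful content: we used filtered datasets when training our models and implemented safeguards that attempt to strike the right balance between usefulness and preventing harm. However, this does not guarantee that all possible harmful content has been removed. All developers and deployers should exercise caution and implement content safety guardrails based on their specific product policies and application use cases.
Misuse: technical limitations and developer and end-user education can help mitigate malicious applications of the model. All users are required to adhere to our Acceptable Use Policy, including when applying fine-tuning and prompt engineering mechanisms. For information on violative uses of our products, please refer to the Stability AI Acceptable Use Policy.
Privacy violations: developers and deployers are encouraged to adopt privacy-preserving techniques and to comply with privacy regulations.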