ddpm-anime-faces-64 open-source model - free generation of 64x64-pixel anime-style face images

Ddpm Anime Faces 64

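Developed by LittleNyima

A DDPM model trained on the huggan/anime-faces dataset for generating 64x64-pixel anime-style face images.

Image generation | Other | License: MIT | #anime-face-generation #unconditional-image-generation #diffusion-model-implementation

Downloads: 20

Release date: 6/11/2024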

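Model Overview

The model uses the denoising diffusion probabilistic model (DDPM) framework and is built specifically for unconditional generation of anime-style face images.

Model Features

DDPM implemented from scratch

Instead of the off-the-shelf DDPMScheduler, the project implements its own scheduler, which makes it well suited to learning how DDPM works.

Lightweight model

Images are 64x64 pixels, which suits fast inference and experimentation.

Anime-style generation

Trained specifically on anime-style faces.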
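
To illustrate what a self-written scheduler has to precompute, here is a minimal pure-Python sketch of a linear noise schedule. The defaults (1000 steps, betas from 0.0001 to 0.02) mirror the inference code on this page. The function name `linear_schedule` is hypothetical and is not part of the project, which uses torch tensors instead of lists.

```python
import math


def linear_schedule(num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02):
    """Return (betas, alphas, alphas_cumprod) for a linear beta schedule.

    Hypothetical helper for illustration; assumes num_train_timesteps >= 2.
    """
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    betas = [beta_start + i * step for i in range(num_train_timesteps)]
    alphas = [1.0 - b for b in betas]
    alphas_cumprod = []
    running = 1.0
    for a in alphas:
        running *= a  # alpha_bar_t is the product of all alphas up to step t
        alphas_cumprod.append(running)
    return betas, alphas, alphas_cumprod


betas, alphas, alphas_cumprod = linear_schedule()
# Signal scale sqrt(alpha_bar) and noise scale sqrt(1 - alpha_bar) at a few timesteps
for t in (0, 499, 999):
    print(t, math.sqrt(alphas_cumprod[t]), math.sqrt(1.0 - alphas_cumprod[t]))
```

The cumulative product decreases monotonically toward zero, so by the last timestep a noised sample is almost pure Gaussian noise.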

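Model Capabilities

Anime face image generation

Unconditional image synthesis

Use Cases

Creative design

Anime character design

Quickly generate varied facial prototypes for anime characters.

The inference example generates a batch of 32 anime faces at 64x64 pixels.

Education and research

Teaching diffusion models

Serves as a teaching example for understanding how DDPM works.

Provides complete training and inference code.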

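🚀 DDPM Anime Face Generation Model

This project trains a DDPM (Denoising Diffusion Probabilistic Models) model on the huggan/anime-faces dataset for unconditional generation of anime face images.

🚀 Quick Start

The inference code below generates anime face images. It assumes a CUDA-capable GPU (the model is moved to the GPU with .cuda()) and access to the trained checkpoint: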

import torch
from tqdm import tqdm
from diffusers import UNet2DModel

class DDPM:
    def __init__(
        self,
        num_train_timesteps: int = 1000,
        beta_start: float = 0.0001,
        beta_end: float = 0.02,
    ):
        self.num_train_timesteps = num_train_timesteps
        self.betas = torch.linspace(beta_start, beta_end, num_train_timesteps, dtype=torch.float32)
        self.alphas = 1.0 - self.betas
        self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)
        self.timesteps = torch.arange(num_train_timesteps - 1, -1, -1)
    
    def add_noise(
        self,
        original_samples: torch.Tensor,
        noise: torch.Tensor,
        timesteps: torch.Tensor,
    ):
        alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device, dtype=original_samples.dtype)
        noise = noise.to(original_samples.device)
        timesteps = timesteps.to(original_samples.device)

        # \sqrt{\bar\alpha_t}
        sqrt_alpha_prod = alphas_cumprod[timesteps].flatten() ** 0.5
        while len(sqrt_alpha_prod.shape) < len(original_samples.shape):
            sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)
        
        # \sqrt{1 - \bar\alpha_t}
        sqrt_one_minus_alpha_prod = (1.0 - alphas_cumprod[timesteps]).flatten() ** 0.5
        while len(sqrt_one_minus_alpha_prod.shape) < len(original_samples.shape):
            sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)
        
        return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise

    @torch.no_grad()
    def sample(
        self,
        unet: UNet2DModel,
        batch_size: int,
        in_channels: int,
        sample_size: int,
    ):
        betas = self.betas.to(unet.device)
        alphas = self.alphas.to(unet.device)
        alphas_cumprod = self.alphas_cumprod.to(unet.device)
        timesteps = self.timesteps.to(unet.device)
        images = torch.randn((batch_size, in_channels, sample_size, sample_size), device=unet.device)
        for timestep in tqdm(timesteps, desc='Sampling'):
            pred_noise: torch.Tensor = unet(images, timestep).sample

            # mean of p_\theta(x_{t-1} | x_t), computed from the predicted noise
            alpha_t = alphas[timestep]
            alpha_cumprod_t = alphas_cumprod[timestep]
            sqrt_alpha_t = alpha_t ** 0.5
            one_minus_alpha_t = 1.0 - alpha_t
            sqrt_one_minus_alpha_cumprod_t = (1 - alpha_cumprod_t) ** 0.5
            mean = (images - one_minus_alpha_t / sqrt_one_minus_alpha_cumprod_t * pred_noise) / sqrt_alpha_t

            # standard deviation of q(x_{t-1} | x_t, x_0); no noise is added at the final step (timestep 0)
            if timestep > 0:
                beta_t = betas[timestep]
                one_minus_alpha_cumprod_t_minus_one = 1.0 - alphas_cumprod[timestep - 1]
                one_divided_by_sigma_square = alpha_t / beta_t + 1.0 / one_minus_alpha_cumprod_t_minus_one
                std = (1.0 / one_divided_by_sigma_square) ** 0.5
            else:
                std = torch.zeros_like(alpha_t)

            epsilon = torch.randn_like(images)
            images = mean + std * epsilon
        images = (images / 2.0 + 0.5).clamp(0, 1).cpu().permute(0, 2, 3, 1).numpy()
        return images

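# Load the trained UNet (path as given in the source; .cuda() requires a CUDA GPU)
model = UNet2DModel.from_pretrained('ddpm-animefaces-64').cuda()
ddpm = DDPM()
# Sample 32 RGB images at 64x64; returns a NumPy array of shape (32, 64, 64, 3) with values in [0, 1]
images = ddpm.sample(model, 32, 3, 64)

# Arrange the samples in a 4x8 grid and save it as a PNG
from diffusers.utils import make_image_grid, numpy_to_pil
image_grid = make_image_grid(numpy_to_pil(images), rows=4, cols=8)
image_grid.save('ddpm-sample-results.png')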

This code can also be found at this link.
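
As a sanity check on the noise scale used in `sample`, the following pure-Python sketch compares the expression from the code, 1 / (alpha_t / beta_t + 1 / (1 - alpha_bar_{t-1})), with the closed-form DDPM posterior variance beta_t * (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t). It assumes the same linear schedule as the `DDPM` class defaults. The helper names are hypothetical and do not appear in the project.

```python
def make_schedule(num_steps=1000, beta_start=0.0001, beta_end=0.02):
    """Linear betas and their cumulative alpha products (hypothetical helper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    betas = [beta_start + i * step for i in range(num_steps)]
    alphas_cumprod, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        alphas_cumprod.append(running)
    return betas, alphas_cumprod


def variance_from_code(betas, alphas_cumprod, t):
    """Variance as written in sample(): the inverse of a sum of two precisions."""
    alpha_t = 1.0 - betas[t]
    return 1.0 / (alpha_t / betas[t] + 1.0 / (1.0 - alphas_cumprod[t - 1]))


def variance_closed_form(betas, alphas_cumprod, t):
    """Standard closed-form DDPM posterior variance."""
    return betas[t] * (1.0 - alphas_cumprod[t - 1]) / (1.0 - alphas_cumprod[t])


betas, alphas_cumprod = make_schedule()
# Compare the two expressions at every timestep that has a predecessor (t >= 1)
max_rel_diff = max(
    abs(variance_from_code(betas, alphas_cumprod, t) - variance_closed_form(betas, alphas_cumprod, t))
    / variance_closed_form(betas, alphas_cumprod, t)
    for t in range(1, len(betas))
)
print(f"max relative difference: {max_rel_diff:.2e}")
```

The two forms agree up to floating-point rounding because alpha_t * (1 - alpha_bar_{t-1}) + beta_t simplifies to 1 - alpha_bar_t.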

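✨ Key Features

Model implementation: DDPM is implemented from scratch without DDPMScheduler; only UNet2DModel is used, together with a simple self-written scheduler.
Data: trained on the huggan/anime-faces dataset to generate anime face images.

📦 Installation

The project does not provide installation commands. The inference code imports torch, tqdm, and diffusers, so these packages must be available.

💻 Usage Examples

Basic usage

The inference code above is the basic usage example; running it generates anime face images.

Advanced usage

No advanced usage examples are provided.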

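📚 Documentation

Training Parameters

Property	Details
Model type	DDPM
Training data	huggan/anime-faces dataset

The training parameters are as follows:

Parameter	Value
Image size	64
Training batch size	16
Evaluation batch size	16
Epochs	50
Gradient accumulation steps	1
Learning rate	1e-4
Learning rate warmup steps	500
Mixed precision	"fp16"

For the training code, see this link.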
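
For reference, the hyperparameters in the table can be collected into a small config object. This is a sketch only: the class and field names are hypothetical, and the linear warmup shape is an assumption, since the source states only the warmup step count and the peak learning rate, not the schedule used after warmup.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    """Hypothetical container; the values are copied from the parameter table."""
    image_size: int = 64
    train_batch_size: int = 16
    eval_batch_size: int = 16
    num_epochs: int = 50
    gradient_accumulation_steps: int = 1
    learning_rate: float = 1e-4
    lr_warmup_steps: int = 500
    mixed_precision: str = "fp16"


def warmup_lr(step: int, config: TrainingConfig) -> float:
    """Learning rate under an assumed linear warmup, held constant afterwards."""
    if step < config.lr_warmup_steps:
        return config.learning_rate * step / config.lr_warmup_steps
    return config.learning_rate


config = TrainingConfig()
for step in (0, 250, 500, 1000):
    print(step, warmup_lr(step, config))
```

The real training script may decay the learning rate after warmup; this sketch only shows how the warmup step count and peak rate interact.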

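🔧 Technical Details

The project implements DDPM from scratch. At inference time, UNet2DModel predicts the noise, and the custom DDPM class handles adding and removing noise. The add_noise method adds noise to clean samples at a given timestep. The sample method starts from Gaussian noise and iterates over all timesteps in reverse, removing the predicted noise at each step until an anime face image is produced.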
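
In the notation of the inference code (β_t for betas, α_t = 1 − β_t, ᾱ_t for alphas_cumprod, ε_θ for the UNet's noise prediction), the two methods correspond to the standard DDPM equations below. The training loss on the last line is the usual noise-prediction objective and is stated here as an assumption, since the training code is not reproduced on this page.

```latex
\begin{aligned}
\text{add\_noise:}\quad & x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) \\
\text{sample (mean):}\quad & \mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right) \\
\text{sample (variance):}\quad & \sigma_t^2 = \left(\frac{\alpha_t}{\beta_t} + \frac{1}{1-\bar\alpha_{t-1}}\right)^{-1} = \frac{\beta_t\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t} \\
\text{sample (update):}\quad & x_{t-1} = \mu_\theta(x_t, t) + \sigma_t\,z, \qquad z \sim \mathcal{N}(0, I) \\
\text{training loss (assumed):}\quad & \mathcal{L} = \mathbb{E}_{x_0,\,\epsilon,\,t}\left[\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\right]
\end{aligned}
```

The final image is mapped from the model's [-1, 1] range to [0, 1] by x / 2 + 0.5 followed by clamping, as in the last lines of `sample`.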