Evt_V4-preview: open-source anime-style model, fine-tuned on a larger dataset, with high similarity to ACertainty

Evt V4 Preview

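Developed by haor

The EVT series is an experimental project that fine-tunes anime-style models on large datasets. Evt_V4 uses a larger dataset than earlier versions, and its cosine similarity with ACertainty reaches 85%.

Task: Image generation | Language: English | License: OpenRAIL | Tags: #anime-style optimization, #high-similarity fine-tuning, #large-scale dataset training

Downloads: 137

Released: 1/9/2023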

Model Overview

Evt_V4 is a text-to-image generation model based on Stable Diffusion, optimized specifically for anime-style images.

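Model Features

Anime-style optimization

Fine-tuned on a large-scale dataset specifically for anime-style images

High similarity

Cosine similarity with the ACertainty model reaches 85%

Large-scale training

Trained for 10 epochs on about 550,000 anime-style images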

Model Capabilities

Text-to-image generation

Anime-style image generation

High-quality image rendering

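Use Cases

Anime creation

Anime character design

Generate anime character images in a variety of styles

The examples show high-quality results for prompts such as 1girl and for characters such as Kaname Madoka

Scene creation

Generate anime-style scene images

The examples show anime-style scenes with elements such as fields and fruit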

🚀 Evt_V4-preview

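The EVT series is an experimental project that fine-tunes anime-style models on large datasets. Evt_V4 uses a larger dataset than earlier versions, and its cosine similarity with ACertainty reaches 85%. It may behave differently from other models; we hope you enjoy it.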
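
The card does not say how the 85% cosine similarity was measured. As a reminder of what the metric itself computes, here is a minimal sketch that assumes both models are compared as flattened weight vectors. The function name and the toy vectors are hypothetical and are not taken from the author's tooling.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy stand-ins for flattened weights (hypothetical values, not real model weights)
base_weights = [1.0, 0.0, 1.0, 0.0]
tuned_weights = [1.0, 1.0, 1.0, 0.0]
print(round(cosine_similarity(base_weights, tuned_weights), 4))
```

A value of 1.0 would mean the two vectors point in the same direction, so a lower value suggests the fine-tuned weights have moved away from the base model.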

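🚀 Quick Start

This model is used in the same way as any other Stable Diffusion model. For more information, see the Stable Diffusion documentation.

You can also export the model to ONNX, MPS and/or FLAX/JAX.

💻 Usage Examples

Basic Usage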

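from diffusers import StableDiffusionPipeline
import torch

# Hugging Face model repository and the branch to load from
model_id = "haor/Evt_V4-preview"
branch_name = "main"

# Load the weights in half precision and move the pipeline to the GPU
pipe = StableDiffusionPipeline.from_pretrained(model_id, revision=branch_name, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Generate one image from a text prompt
prompt = "1girl"
image = pipe(prompt).images[0]

# Save the result as a PNG file
image.save("./1girl.png")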

Example Showcase

Prompt 1

1girl in black serafuku standing in a field solo, food, fruit, lemon, bubble, planet, moon, orange \(fruit\), lemon slice, leaf, fish, orange slice, by (tabi:1.25), spot color, looking at viewer, closeup cowboy shot
Negative prompt: (bad:0.81), (comic:0.81), (cropped:0.81), (error:0.81), (extra:0.81), (low:0.81), (lowres:0.81), (speech:0.81), (worst:0.81), (blush:0.9), 2koma, 3koma, 4koma, collage, lipstick
Steps: 20, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 2285895007, Size: 512x1152, Denoising strength: 0.7, Clip skip: 2

Prompt 2

{Masterpiece, Kaname_Madoka, tall and long double tails, well rooted hair, (pink hair), pink eyes, crossed bangs, ojousama, jk, thigh bandages, wrist cuffs, (pink bow: 1.2)}, plain color, sketch, masterpiece, high detail, masterpiece portrait, best quality, ray tracing, {:<, look at the edge}
Negative prompt: ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)),extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((bad proportions))), ((extra limbs)), (((deformed))), (((disfigured))), cloned face, gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), too many fingers, (((long neck))), (((low quality))), normal quality, blurry, bad feet, text font ui, ((((worst quality)))), anatomical nonsense, (((bad shadow))), unnatural body, liquid body, 3D, 3D game, 3D game scene, 3D character, bad hairs, poorly drawn hairs, fused hairs, big muscles, bad face, extra eyes, furry, pony, mosaic, disappearing calf, disappearing legs, extra digit, fewer digit, fused digit, missing digit, fused feet, poorly drawn eyes, big face, long face, bad eyes, thick lips, obesity, strong girl, beard, Excess legs
Steps: 20, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 2468255263, Size: 512x1152, Denoising strength: 0.7, Clip skip: 2
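
The settings lines above look like the "Key: value" format written by Stable Diffusion web UIs. To reuse them with the diffusers pipeline from the basic usage example, one option is to parse the line and map the fields that have direct call arguments. This is a minimal sketch: the helper names are hypothetical, and the sampler, denoising strength and clip skip are deliberately left unmapped.

```python
def parse_generation_params(line):
    """Split a 'Key: value, Key: value' settings line into a dict of strings."""
    params = {}
    for item in line.split(", "):
        key, _, value = item.partition(": ")
        params[key.strip()] = value.strip()
    return params

def to_diffusers_kwargs(params):
    """Map the settings that correspond directly to pipeline call arguments."""
    # "Size" is assumed to be written as WIDTHxHEIGHT
    width, height = (int(v) for v in params["Size"].split("x"))
    return {
        "num_inference_steps": int(params["Steps"]),
        "guidance_scale": float(params["CFG scale"]),
        "width": width,
        "height": height,
    }

line = ("Steps: 20, Sampler: DPM++ SDE Karras, CFG scale: 7, "
        "Seed: 2285895007, Size: 512x1152, Denoising strength: 0.7, Clip skip: 2")
params = parse_generation_params(line)
kwargs = to_diffusers_kwargs(params)
seed = int(params["Seed"])
print(kwargs, seed)
```

The result could then be passed along the lines of `pipe(prompt, negative_prompt=negative_prompt, generator=torch.Generator("cuda").manual_seed(seed), **kwargs)`. The `(tag:1.25)` and `((tag))` weighting syntax in the prompts appears to be web UI style and may not be interpreted by a plain `StableDiffusionPipeline` call, so outputs are unlikely to match the showcase images exactly.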

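🔧 Technical Details

Training Information

Base model: ACertainty
Training data: about 550k anime-style images (from pixiv and yandere), trained for 10 epochs.
Resolution: 512
UCG: 0.1
ARB enabled: True
Trainer: Mikubill/naifu-diffusion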
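
The UCG value of 0.1 most likely refers to unconditional-guidance training, where a fraction of captions is dropped so the model also learns an unconditional prediction. The sketch below assumes that reading: each caption is replaced with an empty string with probability 0.1. The function name and the caption batch are hypothetical, and the trainer's actual implementation may differ.

```python
import random

def apply_ucg(captions, ucg_rate, rng):
    """Replace each caption with an empty string with probability `ucg_rate`."""
    return ["" if rng.random() < ucg_rate else caption for caption in captions]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
captions = ["1girl, solo"] * 10000  # hypothetical caption batch
dropped = apply_ucg(captions, ucg_rate=0.1, rng=rng)

# Roughly one caption in ten should now be empty
fraction_empty = sum(caption == "" for caption in dropped) / len(dropped)
print(f"{fraction_empty:.3f}")
```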

ARB configuration

arb:
  enabled: true
  debug: false
  base_res: [512, 512]
  max_size: [768, 512]
  divisible: 64
  max_ar_error: 4
  min_dim: 256
  dim_limit: 1024
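
To illustrate what these ARB (aspect-ratio bucketing) settings could mean in practice, the sketch below enumerates resolution buckets and assigns an image to the closest one. It is not taken from naifu-diffusion: it assumes `max_size` acts as a pixel-area budget, that bucket sides are multiples of `divisible` between `min_dim` and `dim_limit`, and that `max_ar_error` bounds the difference in log aspect ratio. Both function names are hypothetical.

```python
import math

def make_buckets(base_res, max_size, divisible, min_dim, dim_limit):
    """Enumerate (width, height) buckets whose area fits within max_size."""
    max_area = max_size[0] * max_size[1]
    buckets = {tuple(base_res)}
    for width in range(min_dim, dim_limit + 1, divisible):
        # Largest multiple of `divisible` that keeps the area within budget
        height = (max_area // width) // divisible * divisible
        height = min(height, dim_limit)
        if height < min_dim:
            continue
        buckets.add((width, height))
        buckets.add((height, width))  # mirrored orientation
    return sorted(buckets)

def assign_bucket(width, height, buckets, max_ar_error):
    """Return the bucket with the closest log aspect ratio, or None if too far off."""
    log_ar = math.log(width / height)

    def ar_error(bucket):
        return abs(math.log(bucket[0] / bucket[1]) - log_ar)

    best = min(buckets, key=ar_error)
    return best if ar_error(best) <= max_ar_error else None

buckets = make_buckets(base_res=[512, 512], max_size=[768, 512],
                       divisible=64, min_dim=256, dim_limit=1024)
print(len(buckets), assign_bucket(1000, 1500, buckets, max_ar_error=4))
```

Under these assumptions a 2:3 portrait image lands in the 512x768 bucket, so it can be resized with little cropping instead of being squashed to a square.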

Scheduler and optimizer configuration

scheduler:
  name: diffusers.DDIMScheduler
  params:
    beta_end: 0.012
    beta_schedule: "scaled_linear"
    beta_start: 0.00085
    clip_sample: false
    num_train_timesteps: 1000
    set_alpha_to_one: false
    steps_offset: 1
    trained_betas: null

optimizer:
  name: bitsandbytes.optim.AdamW8bit
  params:
    lr: 2e-6
    weight_decay: 5e-2
    eps: 1e-7

lr_scheduler:
  name: torch.optim.lr_scheduler.CosineAnnealingWarmRestarts
  warmup: 
    enabled: true
    init_lr: 2e-8
    num_warmup: 50
    strategy: "cos"  
  params:
    T_0: 5
    T_mult: 1
    eta_min: 6e-7
    last_epoch: -1
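
Two of the settings above can be made concrete with a little arithmetic. The sketch below reimplements, in plain Python, the "scaled_linear" beta schedule (betas spaced linearly in square-root space, then squared) and the cosine-annealing-with-warm-restarts formula for `T_mult: 1`. The function names are hypothetical. The trainer's warmup phase (`init_lr`, `num_warmup`, `strategy`) is not modelled, and whether `T_0` counts epochs or optimizer steps depends on how the trainer steps the scheduler.

```python
import math

def scaled_linear_betas(beta_start, beta_end, num_train_timesteps):
    """Betas spaced linearly in sqrt space, then squared ('scaled_linear')."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]

def cosine_warm_restart_lr(t, base_lr, eta_min, t_0):
    """Learning rate at scheduler step t, restarting every t_0 steps (T_mult = 1)."""
    t_cur = t % t_0
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_0)) / 2

betas = scaled_linear_betas(0.00085, 0.012, 1000)
lrs = [cosine_warm_restart_lr(t, base_lr=2e-6, eta_min=6e-7, t_0=5) for t in range(10)]

print(betas[0], betas[-1])
print(["%.2e" % lr for lr in lrs])
```

With these values the learning rate starts at 2e-6, decays towards 6e-7 over five scheduler steps, and then jumps back to 2e-6 at each restart.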

Training Compute

Training took about 300 V100 GPU-hours.
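
As a rough sanity check, the stated dataset size, epoch count and GPU-hours imply an average throughput per GPU. The calculation below uses only the approximate figures from this card, so the result is an order-of-magnitude estimate; the GPU-hour figure may also include overhead that is not pure training time.

```python
images = 550_000   # approximate dataset size from the training information
epochs = 10
gpu_hours = 300    # approximate V100 GPU-hours

# Total image samples processed over the whole run
samples_seen = images * epochs

# Average samples processed per second on one GPU
samples_per_gpu_second = samples_seen / (gpu_hours * 3600)

print(samples_seen)
print(round(samples_per_gpu_second, 2))
```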

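📄 License

This model is open access and available to everyone. It is released under the CreativeML OpenRAIL-M license, which further specifies rights and usage.

The CreativeML OpenRAIL license specifies:

You may not use the model to deliberately produce or share illegal or harmful outputs or content.
The authors claim no rights to the outputs you generate. You are free to use them, but you are accountable for their use, which must not violate the provisions of the license.
You may redistribute the model weights and use the model commercially and/or as a service. If you do, you must include the same use restrictions as those in the license and share a copy of the CreativeML OpenRAIL-M license with all your users (please read the license entirely and carefully).

Please read the full license.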