sotediffusion-wuerstchen3 open-source model: generate high-quality anime-style images for free

Sotediffusion Wuerstchen3

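Developed by Disty0

An anime-style fine-tune of Würstchen V3, focused on generating high-quality anime-style images.

Image generation · English · License: other · #anime-style generation #high-resolution images #text-to-image

Downloads: 467

Released: 6/10/2024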

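Model overview

This is an anime-style text-to-image model built on the Würstchen V3 architecture. It was fine-tuned on 6 million images and generates high-quality anime-style images.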

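Model features

High-quality anime style

Focused on generating high-quality anime-style images

Large-scale training

Trained on 6 million images using 8 A100 80G GPUs

API support

Available through the Fal.AI API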

Model capabilities

Text-to-image generation

Anime-style image generation

High-resolution image generation

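Use cases

Creative art

Anime character design

Generate anime character concept art from text descriptions

High-quality anime-style character images

Anime scene generation

Generate anime-style scenes from text descriptions

Scene images at 1024x1536 or higher resolution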

🚀 SoteDiffusion Wuerstchen3

SoteDiffusion Wuerstchen3 is an anime fine-tune of Würstchen V3 that turns text prompts into anime-style images.

New version

A newer release is available: https://huggingface.co/Disty0/sotediffusion-v2

🚀 Quick start

This model is available through the Fal.AI API. For details, see: https://fal.ai/models/fal-ai/stable-cascade/sote-diffusion

✨ Key features

This release is sponsored by fal.ai/grants.
Trained for 3 epochs on 6 million images using 8 A100 80G GPUs.

📦 Installation

SD.Next

Go to: https://github.com/vladmandic/automatic/
Open Models -> Huggingface, enter Disty0/sotediffusion-wuerstchen3-decoder as the model name, and click download.
Once the download finishes, load Disty0/sotediffusion-wuerstchen3-decoder.

ComfyUI

See CivitAI: https://civitai.com/models/353284

💻 Usage example

Basic usage

import torch
from diffusers import StableCascadeCombinedPipeline

device = "cuda"
dtype = torch.bfloat16 # or torch.float16
model = "Disty0/sotediffusion-wuerstchen3-decoder"

pipe = StableCascadeCombinedPipeline.from_pretrained(model, torch_dtype=dtype)

# send everything to the gpu:
pipe = pipe.to(device, dtype=dtype)
pipe.prior_pipe = pipe.prior_pipe.to(device, dtype=dtype)

# or enable model offload to save vram:
# pipe.enable_model_cpu_offload()

prompt = "newest, extremely aesthetic, best quality, 1girl, solo, cat ears, pink hair, orange eyes, long hair, bare shoulders, looking at viewer, smile, indoors, casual, living room, playing guitar,"
negative_prompt = "very displeasing, worst quality, monochrome, realistic, oldest, loli,"
output = pipe(
    width=1024,
    height=1536,
    prompt=prompt,
    negative_prompt=negative_prompt,
    decoder_guidance_scale=2.0,
    prior_guidance_scale=7.0,
    prior_num_inference_steps=30,
    output_type="pil",
    num_inference_steps=10
).images[0]

# do something with the output image, for example save it to disk:
output.save("output.png")

📚 Detailed documentation

Model parameters

Base training parameters

Parameter	Value
amp	bf16
weights	fp32
save weights	fp16
resolution	1024x1024
effective batch size	128
unet learning rate	1e-5
te learning rate	4e-6
optimizer	Adafactor
images	6M
epochs	3

Final training parameters

Parameter	Value
amp	bf16
weights	fp32
save weights	fp16
resolution	1024x1024
effective batch size	128
unet learning rate	4e-6
te learning rate	none
optimizer	Adafactor
images	120K
epochs	16
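
As a rough sanity check on the two tables above, the number of optimizer steps in each stage can be estimated from the image count, the epoch count, and the effective batch size. This is a back-of-the-envelope sketch, not a figure taken from the training logs: it assumes the "images" row is the per-epoch image count, treats the rounded values 6M and 120K as exact, and ignores any partial final batch. The function name `optimizer_steps` is made up for illustration.

```python
def optimizer_steps(images: int, epochs: int, effective_batch_size: int) -> int:
    """Estimate optimizer steps as (images * epochs) // effective batch size."""
    return (images * epochs) // effective_batch_size


# Values copied from the base and final training parameter tables.
base_steps = optimizer_steps(images=6_000_000, epochs=3, effective_batch_size=128)
final_steps = optimizer_steps(images=120_000, epochs=16, effective_batch_size=128)

print(f"base stage:  ~{base_steps:,} steps")
print(f"final stage: ~{final_steps:,} steps")
```

Under these assumptions the final stage is short compared with the base stage, which fits its role as a pass over a small, curated subset with the text encoder frozen (te learning rate: none).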

Dataset information

Dataset size

Dataset name	Total images
newest	1,848,331
recent	1,380,630
mid	993,227
early	566,152
oldest	160,397
pixiv	343,614
visual novel cg	231,358
anime wallpaper	104,790
Total	5,628,499

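Dataset notes

Minimum size is 1280x600 (768,000 pixels).
Deduplicated by image similarity using czkawka-cli.
About 120K high-quality images were intentionally duplicated 5 times, bringing the total image count to 6.2 million.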
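
The 6.2 million figure can be reproduced from the dataset size table. The sketch below assumes that "duplicated 5 times" means five extra copies of each of the roughly 120K images, and it treats 120K as exactly 120,000. Both are assumptions, since the source only gives rounded numbers.

```python
# Per-dataset image counts copied from the dataset size table.
dataset_sizes = {
    "newest": 1_848_331,
    "recent": 1_380_630,
    "mid": 993_227,
    "early": 566_152,
    "oldest": 160_397,
    "pixiv": 343_614,
    "visual novel cg": 231_358,
    "anime wallpaper": 104_790,
}

unique_images = sum(dataset_sizes.values())

# Assumption: five extra copies of ~120,000 high-quality images.
extra_copies = 5 * 120_000
total_images = unique_images + extra_copies

print(f"unique images: {unique_images:,}")
print(f"with duplicates: {total_images:,} (~{total_images / 1e6:.1f}M)")
```

With four extra copies instead of five the total would round to 6.1M, so the five-extra-copies reading is the one that matches the stated figure.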

Tag information

Tag order

The model was trained with randomized tag order, but tags in the dataset are stored in this order:

aesthetic tags, quality tags, date tags, custom tags, rating tags, character, series, rest of the tags
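
Because the tag order was shuffled during training, a prompt does not have to follow this order, but assembling prompts from the same groups keeps them close to the captions the model saw. The following is a minimal sketch, assuming prompts are comma-separated tags with a trailing comma, as in the usage example. `build_prompt`, `TAG_GROUP_ORDER`, and the group keys are hypothetical names chosen for illustration, not part of any library.

```python
from typing import Dict, Sequence

# Group order as stored in the dataset captions (see the list above).
TAG_GROUP_ORDER = (
    "aesthetic", "quality", "date", "custom",
    "rating", "character", "series", "rest",
)


def build_prompt(groups: Dict[str, Sequence[str]]) -> str:
    """Join tag groups in dataset order into one comma-separated prompt."""
    unknown = set(groups) - set(TAG_GROUP_ORDER)
    if unknown:
        raise ValueError(f"unknown tag groups: {sorted(unknown)}")
    tags = [tag for name in TAG_GROUP_ORDER for tag in groups.get(name, ())]
    # Trailing comma mirrors the prompt style of the usage example.
    return ", ".join(tags) + "," if tags else ""


prompt = build_prompt({
    "rest": ["1girl", "solo", "cat ears"],
    "date": ["newest"],
    "quality": ["best quality"],
    "aesthetic": ["extremely aesthetic"],
})
print(prompt)
```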

Date tags

Tag	Date
newest	2022 to 2024
recent	2019 to 2021
mid	2015 to 2018
early	2011 to 2014
oldest	2005 to 2010
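
The date tags form five contiguous year ranges, so mapping a year to its tag is a simple lookup. A minimal sketch, assuming the ranges are inclusive at both ends; the source does not say how years outside 2005 to 2024 were handled, so this hypothetical `date_tag` helper returns `None` for them.

```python
from typing import Optional

# (tag, first year, last year), copied from the date tag table.
DATE_TAGS = (
    ("newest", 2022, 2024),
    ("recent", 2019, 2021),
    ("mid", 2015, 2018),
    ("early", 2011, 2014),
    ("oldest", 2005, 2010),
)


def date_tag(year: int) -> Optional[str]:
    """Return the date tag whose inclusive year range contains `year`."""
    for tag, first, last in DATE_TAGS:
        if first <= year <= last:
            return tag
    return None  # outside the documented ranges


for year in (2008, 2016, 2023):
    print(year, date_tag(year))
```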

Aesthetic tags

Score greater than	Tag	Count
0.90	extremely aesthetic	125,451
0.80	very aesthetic	887,382
0.70	aesthetic	1,049,857
0.50	slightly aesthetic	1,643,091
0.40	not displeasing	569,543
0.30	not aesthetic	445,188
0.20	slightly displeasing	341,424
0.10	displeasing	237,660
rest of them	very displeasing	328,712
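
The table reads as a descending threshold ladder: an image gets the first tag whose threshold its score exceeds, and anything at or below 0.10 falls through to "very displeasing". The sketch below illustrates that reading. It assumes a strict "greater than" comparison, as the column header suggests, and a score in the range 0 to 1; the scoring model itself is not named in this section, and `aesthetic_tag` is a hypothetical helper.

```python
# (threshold, tag) pairs in descending order, copied from the table above.
AESTHETIC_THRESHOLDS = (
    (0.90, "extremely aesthetic"),
    (0.80, "very aesthetic"),
    (0.70, "aesthetic"),
    (0.50, "slightly aesthetic"),
    (0.40, "not displeasing"),
    (0.30, "not aesthetic"),
    (0.20, "slightly displeasing"),
    (0.10, "displeasing"),
)


def aesthetic_tag(score: float) -> str:
    """Return the first tag whose threshold the score strictly exceeds."""
    for threshold, tag in AESTHETIC_THRESHOLDS:
        if score > threshold:
            return tag
    return "very displeasing"  # the "rest of them" row


for score in (0.95, 0.65, 0.05):
    print(score, "->", aesthetic_tag(score))
```

The quality tags in the next table follow the same pattern with a different set of thresholds and "worst quality" as the fallback.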

Quality tags

Score greater than	Tag	Count
0.980	best quality	1,270,447
0.900	high quality	498,244
0.750	great quality	351,006
0.500	medium quality	366,448
0.250	normal quality	368,380
0.125	bad quality	279,050
0.025	low quality	538,958
rest of them	worst quality	1,955,966

Rating tags

Tag	Count
general	1,416,451
sensitive	3,447,664
nsfw	427,459
explicit nsfw	336,925
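
The four rating counts add up to 5,628,499, the same total as the dataset size table, which suggests that every image carries exactly one rating tag. The short sketch below checks that sum and prints each rating's share of the dataset; it only uses numbers from the table above.

```python
# Counts copied from the rating tag table.
rating_counts = {
    "general": 1_416_451,
    "sensitive": 3_447_664,
    "nsfw": 427_459,
    "explicit nsfw": 336_925,
}

total = sum(rating_counts.values())
shares = {tag: count / total for tag, count in rating_counts.items()}

print(f"total rated images: {total:,}")
for tag, share in shares.items():
    print(f"{tag}: {share:.1%}")
```

Since "sensitive" is by far the largest group, stating the intended rating explicitly in the prompt or negative prompt is likely to matter when a specific rating is wanted.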

Custom tags

Dataset name	Custom tags
image boards	date,
text	The text says "text",
characters	character, series
pixiv	art by Display_Name,
visual novel cg	Full_VN_Name (short_3_letter_name), visual novel cg,
anime wallpaper	date, anime wallpaper,
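
The custom tags are fixed templates with one or two slots each. As an illustration, the hypothetical helpers below fill in the pixiv, visual novel, and text templates exactly as written in the table. The artist name, visual novel title, short name, and text used in the example are made-up placeholders, not entries from the dataset.

```python
def pixiv_tags(display_name: str) -> str:
    """Template from the table: art by Display_Name,"""
    return f"art by {display_name},"


def visual_novel_tags(full_name: str, short_name: str) -> str:
    """Template from the table: Full_VN_Name (short_3_letter_name), visual novel cg,"""
    return f"{full_name} ({short_name}), visual novel cg,"


def text_tag(text: str) -> str:
    """Template from the table: The text says "text","""
    return f'The text says "{text}",'


# Placeholder values for illustration only.
print(pixiv_tags("Example Artist"))
print(visual_novel_tags("Example Visual Novel", "evn"))
print(text_tag("hello"))
```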

🔧 Technical details

Training

Software used: Kohya SD-Scripts with Stable Cascade branch. https://github.com/kohya-ss/sd-scripts/tree/stable-cascade
GPU used: 8x Nvidia A100 80GB
GPU hours: 220

Captioning

GPU used for captioning: 1x Intel ARC A770 16GB
GPU hours: 350
Model used for captioning: SmilingWolf/wd-swinv2-tagger-v3
Model used for text: llava-hf/llava-1.5-7b-hf
Captioning command:

python /mnt/DataSSD/AI/Apps/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py --model_dir "/mnt/DataSSD/AI/models/wd14_tagger_model" --repo_id "SmilingWolf/wd-swinv2-tagger-v3" --recursive --remove_underscore --use_rating_tags --character_tags_first --character_tag_expand --append_tags --onnx --caption_separator ", " --general_threshold 0.35 --character_threshold 0.50 --batch_size 4 --caption_extension ".txt" ./