🚀 SoteDiffusion Wuerstchen3
Anime finetune of Würstchen V3. This model offers high-quality anime image generation, fine-tuned to meet specific aesthetic and quality standards.
🚀 Quick Start
A new version is available: https://huggingface.co/Disty0/sotediffusion-v2
✨ Features
- This release is sponsored by fal.ai/grants.
- Trained on 6M images for 3 epochs using 8x A100 80G GPUs.
💻 Usage Examples
API Usage
This model can be used via API with Fal.AI. For more details: https://fal.ai/models/fal-ai/stable-cascade/sote-diffusion
UI Guide
SD.Next
URL: https://github.com/vladmandic/automatic/
Go to Models -> Huggingface, type Disty0/sotediffusion-wuerstchen3-decoder into the model name field, and press download. Load Disty0/sotediffusion-wuerstchen3-decoder after the download is complete.
Prompt:
newest, extremely aesthetic, best quality,
Negative Prompt:
very displeasing, worst quality, monochrome, realistic, oldest, loli,
Parameters:
- Sampler: Default
- Steps: 30 or 40
- Refiner Steps: 10
- CFG: 7
- Secondary CFG: 2 or 1
- Resolution: 1024x1536 or 2048x1152. Anything works as long as it is a multiple of 128.
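Since the resolution needs to be a multiple of 128, a small helper can round an arbitrary target size to the nearest valid value. This is a minimal sketch: `snap_to_multiple` and `snap_resolution` are hypothetical helper names, not part of SD.Next or diffusers.

```python
from typing import Tuple


def snap_to_multiple(value: int, multiple: int = 128) -> int:
    """Round value to the nearest multiple (halves round up), with a floor of one multiple."""
    snapped = (value + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


def snap_resolution(width: int, height: int, multiple: int = 128) -> Tuple[int, int]:
    """Snap both sides of a resolution to the nearest valid multiple."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)


# The resolutions recommended above are already valid and pass through unchanged.
print(snap_resolution(1024, 1536))
# A 1920x1080 target is adjusted on the height only.
print(snap_resolution(1920, 1080))
```

Snapping changes the aspect ratio slightly, so it may be worth checking the result before generating.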
ComfyUI
Please refer to CivitAI: https://civitai.com/models/353284
Code Example
Basic Usage
import torch
from diffusers import StableCascadeCombinedPipeline
device = "cuda"
dtype = torch.bfloat16 # or torch.float16
model = "Disty0/sotediffusion-wuerstchen3-decoder"
pipe = StableCascadeCombinedPipeline.from_pretrained(model, torch_dtype=dtype)
# send everything to the gpu:
pipe = pipe.to(device, dtype=dtype)
pipe.prior_pipe = pipe.prior_pipe.to(device, dtype=dtype)
# or enable model offload to save vram:
# pipe.enable_model_cpu_offload()
prompt = "newest, extremely aesthetic, best quality, 1girl, solo, cat ears, pink hair, orange eyes, long hair, bare shoulders, looking at viewer, smile, indoors, casual, living room, playing guitar,"
negative_prompt = "very displeasing, worst quality, monochrome, realistic, oldest, loli,"
output = pipe(
    width=1024,
    height=1536,
    prompt=prompt,
    negative_prompt=negative_prompt,
    decoder_guidance_scale=2.0,
    prior_guidance_scale=7.0,
    prior_num_inference_steps=30,
    output_type="pil",
    num_inference_steps=10,
).images[0]
# do something with the output image
🔧 Technical Details
Training
Software used: Kohya SD-Scripts with the Stable Cascade branch.
https://github.com/kohya-ss/sd-scripts/tree/stable-cascade
GPU used: 8x Nvidia A100 80GB
GPU Hours: 220
Base
Property | Details |
---|---|
amp | bf16 |
weights | fp32 |
save weights | fp16 |
resolution | 1024x1024 |
effective batch size | 128 |
unet learning rate | 1e-5 |
te learning rate | 4e-6 |
optimizer | Adafactor |
images | 6M |
epochs | 3 |
Final
Property | Details |
---|---|
amp | bf16 |
weights | fp32 |
save weights | fp16 |
resolution | 1024x1024 |
effective batch size | 128 |
unet learning rate | 4e-6 |
te learning rate | none |
optimizer | Adafactor |
images | 120K |
epochs | 16 |
Dataset
GPU used for captioning: 1x Intel ARC A770 16GB
GPU Hours: 350
Model used for captioning: SmilingWolf/wd-swinv2-tagger-v3
Model used for text: llava-hf/llava-1.5-7b-hf
Command:
python /mnt/DataSSD/AI/Apps/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py --model_dir "/mnt/DataSSD/AI/models/wd14_tagger_model" --repo_id "SmilingWolf/wd-swinv2-tagger-v3" --recursive --remove_underscore --use_rating_tags --character_tags_first --character_tag_expand --append_tags --onnx --caption_separator ", " --general_threshold 0.35 --character_threshold 0.50 --batch_size 4 --caption_extension ".txt" ./
Dataset subset | Image count |
---|---|
newest | 1,848,331 |
recent | 1,380,630 |
mid | 993,227 |
early | 566,152 |
oldest | 160,397 |
pixiv | 343,614 |
visual novel cg | 231,358 |
anime wallpaper | 104,790 |
Total | 5,628,499 |
Note:
- Smallest size is 1280x600 (768,000 pixels)
- Deduped based on image similarity using czkawka-cli
- Around 120K very high-quality images were intentionally duplicated 5 times, bringing the total image count to 6.2M
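The 6.2M figure is consistent with the table total if "duplicated 5 times" is read as five additional copies of each image. That reading is an assumption, and the 120K count is approximate, so the result below is only an estimate.

```python
unique_images = 5_628_499      # "Total" row of the dataset table
duplicated_images = 120_000    # approximate ("around 120K")
extra_copies = 5               # assumed: five additional copies of each image

training_images = unique_images + duplicated_images * extra_copies
print(f"{training_images:,}")  # 6,228,499, i.e. roughly 6.2M
```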
Tags
The model is trained with random tag order, but for reference this is the order used in the dataset:
aesthetic tags, quality tags, date tags, custom tags, rating tags, character, series, rest of the tags
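As an illustration, a prompt can be assembled in the dataset order with a small helper. This is a sketch only: `build_prompt` and its parameter names are hypothetical, and because training used random tag order, this order is not required.

```python
from typing import Iterable


def build_prompt(
    aesthetic: str = "",
    quality: str = "",
    date: str = "",
    custom: Iterable[str] = (),
    rating: str = "",
    character: str = "",
    series: str = "",
    rest: Iterable[str] = (),
) -> str:
    """Join tags in the dataset order, skipping any group left empty."""
    ordered = [aesthetic, quality, date, *custom, rating, character, series, *rest]
    parts = [tag.strip() for tag in ordered if tag and tag.strip()]
    # A trailing comma matches the style of the example prompts in this card.
    return ", ".join(parts) + "," if parts else ""


prompt = build_prompt(
    aesthetic="extremely aesthetic",
    quality="best quality",
    date="newest",
    rating="general",
    rest=["1girl", "solo", "cat ears"],
)
print(prompt)
# extremely aesthetic, best quality, newest, general, 1girl, solo, cat ears,
```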
Date
Date tag | Years |
---|---|
newest | 2022 to 2024 |
recent | 2019 to 2021 |
mid | 2015 to 2018 |
early | 2011 to 2014 |
oldest | 2005 to 2010 |
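The table can be expressed as a lookup from year to date tag. This is a minimal sketch: `date_tag` is a hypothetical helper, and returning `None` for years outside the table is an assumption, since the source does not say how such images were tagged.

```python
from typing import Optional

# (first year, last year, tag), copied from the table above.
DATE_TAGS = [
    (2022, 2024, "newest"),
    (2019, 2021, "recent"),
    (2015, 2018, "mid"),
    (2011, 2014, "early"),
    (2005, 2010, "oldest"),
]


def date_tag(year: int) -> Optional[str]:
    """Return the date tag whose inclusive year range contains `year`."""
    for first, last, tag in DATE_TAGS:
        if first <= year <= last:
            return tag
    return None  # year not covered by the table


print(date_tag(2023))  # newest
print(date_tag(2012))  # early
```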
Aesthetic Tags
Model used: shadowlilac/aesthetic-shadow-v2
Score | Tag | Count |
---|---|---|
> 0.90 | extremely aesthetic | 125,451 |
> 0.80 | very aesthetic | 887,382 |
> 0.70 | aesthetic | 1,049,857 |
> 0.50 | slightly aesthetic | 1,643,091 |
> 0.40 | not displeasing | 569,543 |
> 0.30 | not aesthetic | 445,188 |
> 0.20 | slightly displeasing | 341,424 |
> 0.10 | displeasing | 237,660 |
rest | very displeasing | 328,712 |
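The thresholds above amount to a first-match lookup from score to tag. The sketch below assumes a score in the 0 to 1 range and strict `>` comparisons, as written in the table. `aesthetic_tag` is a hypothetical helper and does not run the scoring model itself.

```python
# (threshold, tag) pairs in descending order, copied from the table above.
AESTHETIC_THRESHOLDS = [
    (0.90, "extremely aesthetic"),
    (0.80, "very aesthetic"),
    (0.70, "aesthetic"),
    (0.50, "slightly aesthetic"),
    (0.40, "not displeasing"),
    (0.30, "not aesthetic"),
    (0.20, "slightly displeasing"),
    (0.10, "displeasing"),
]


def aesthetic_tag(score: float) -> str:
    """Return the tag of the first threshold that the score strictly exceeds."""
    for threshold, tag in AESTHETIC_THRESHOLDS:
        if score > threshold:
            return tag
    return "very displeasing"  # the "rest" row


print(aesthetic_tag(0.95))  # extremely aesthetic
print(aesthetic_tag(0.60))  # slightly aesthetic
print(aesthetic_tag(0.05))  # very displeasing
```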
Quality Tags
Model used: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth
Score | Tag | Count |
---|---|---|
> 0.980 | best quality | 1,270,447 |
> 0.900 | high quality | 498,244 |
> 0.750 | great quality | 351,006 |
> 0.500 | medium quality | 366,448 |
> 0.250 | normal quality | 368,380 |
> 0.125 | bad quality | 279,050 |
> 0.025 | low quality | 538,958 |
rest | worst quality | 1,955,966 |
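The quality counts add up to the dataset total of 5,628,499, so each tag's share of the dataset can be computed directly from the table. The sketch below does only that arithmetic; the variable names are illustrative.

```python
# Counts copied from the table above.
QUALITY_COUNTS = {
    "best quality": 1_270_447,
    "high quality": 498_244,
    "great quality": 351_006,
    "medium quality": 366_448,
    "normal quality": 368_380,
    "bad quality": 279_050,
    "low quality": 538_958,
    "worst quality": 1_955_966,
}

total = sum(QUALITY_COUNTS.values())
shares = {tag: count / total for tag, count in QUALITY_COUNTS.items()}

print(f"total: {total:,}")
for tag, share in shares.items():
    print(f"{tag:>15}: {share:.1%}")
```

The distribution is heavily weighted toward both ends: "worst quality" is the largest bucket and "best quality" the second largest.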
Rating Tags
Rating tag | Image count |
---|---|
general | 1,416,451 |
sensitive | 3,447,664 |
nsfw | 427,459 |
explicit nsfw | 336,925 |
Custom Tags
Type | Tag format |
---|---|
image boards | date, |
text | The text says "text", |
characters | character, series |
pixiv | art by Display_Name, |
visual novel cg | Full_VN_Name (short_3_letter_name), visual novel cg, |
anime wallpaper | date, anime wallpaper, |
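The formats above can be reproduced with simple string templates when writing prompts. This is a sketch: the function names are hypothetical, and the artist, title and abbreviation in the usage lines are placeholders rather than values from the dataset.

```python
def pixiv_tag(display_name: str) -> str:
    """Format the artist tag used for pixiv images."""
    return f"art by {display_name},"


def visual_novel_tags(full_name: str, short_name: str) -> str:
    """Format the title tag and source tag used for visual novel CGs."""
    return f"{full_name} ({short_name}), visual novel cg,"


def text_tag(text: str) -> str:
    """Format the tag that describes text visible in the image."""
    return f'The text says "{text}",'


# Placeholder values, for illustration only.
print(pixiv_tag("Example_Artist"))
print(visual_novel_tags("Example Visual Novel", "EVN"))
print(text_tag("hello"))
```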
Limitations and Bias
Bias
- This model is intended for anime illustrations. Realistic capabilities are not tested at all.
Limitations
- Output can fall back to a realistic style. Add the "realistic" tag to the negative prompt when this happens.
- Eyes and hands in far shots can be poorly rendered.
📄 License
SoteDiffusion models fall under the Fair AI Public License 1.0-SD, which is compatible with the Stable Diffusion models’ license. Key points:
- Modification Sharing: If you modify SoteDiffusion models, you must share both your changes and the original license.
- Source Code Accessibility: If your modified version is network-accessible, provide a way (like a download link) for others to get the source code. This applies to derived models too.
- Distribution Terms: Any distribution must be under this license or another with similar rules.
- Compliance: Non-compliance must be fixed within 30 days to avoid license termination, emphasizing transparency and adherence to open-source values.
Notes: Anything not covered by the Fair AI license is inherited from the Stability AI Non-Commercial license, which is named LICENSE_INHERIT.