Animagine XL 4.0 Zero
An anime-themed SDXL finetune for high-quality anime-style image generation.
Quick Start
Animagine XL 4.0 Zero is an anime-themed text-to-image model. You can run it through Hugging Face Spaces, ComfyUI, Stable Diffusion WebUI, or the diffusers library.
Features
- Anime Themed: Trained on a dataset of 8.4M diverse anime-style images.
- Finetuned SDXL: Retrained from Stable Diffusion XL 1.0 for better performance.
- Pretrained Base Model: Ideal for LoRA training and further finetuning.
- Special Tag Support: Special tags control quality, score, year, and rating during image generation.
Installation
Diffusers Installation
1. Install Required Libraries
pip install diffusers transformers accelerate safetensors --upgrade
2. Example Code
The following example uses the lpw_stable_diffusion_xl pipeline, which handles long, weighted, and detailed prompts better. The model is already in FP16 format, so there is no need to specify variant="fp16" in the from_pretrained call.
import torch
from diffusers import StableDiffusionXLPipeline

# Load the FP16 weights with the community long-prompt-weighting pipeline.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-4.0-zero",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="lpw_stable_diffusion_xl",
    add_watermarker=False,
)
pipe.to("cuda")

# Raw string: keeps the backslashes that escape literal parentheses in tag names.
prompt = r"1girl, arima kana, oshi no ko, hoshimachi suisei, hoshimachi suisei \(1st costume\), cosplay, looking at viewer, smile, outdoors, night, v, masterpiece, high score, great score, absurdres"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, fewer digits, cropped, worst quality, low quality, low score, bad score, average score, signature, watermark, username, blurry"

# Generate one portrait image (832 x 1216) and save it to disk.
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    guidance_scale=6,
    num_inference_steps=25,
).images[0]
image.save("./arima_kana.png")
Usage Examples
Basic Usage
The model was trained with tag-based captions and the tag-ordering method. Use the following structured template:
1girl/1boy/1other, character name, from which series, rating, everything else in any order and end with quality enhancement
Add quality enhancement tags at the end of your prompt:
masterpiece, high score, great score, absurdres
Recommended negative prompt:
lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, fewer digits, cropped, worst quality, low quality, low score, bad score, average score, signature, watermark, username, blurry
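The tag order above can be assembled programmatically. The following is a minimal sketch, not part of the model's tooling: `build_prompt` and its parameter names are hypothetical helpers introduced here for illustration, while the quality tags and negative prompt are copied from the text above.

```python
# Quality tags and negative prompt as recommended in this README.
QUALITY_TAGS = ["masterpiece", "high score", "great score", "absurdres"]
NEGATIVE_PROMPT = (
    "lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, "
    "fewer digits, cropped, worst quality, low quality, low score, bad score, "
    "average score, signature, watermark, username, blurry"
)


def build_prompt(subject, character=None, series=None, rating=None, tags=()):
    """Join tags in the order: subject, character, series, rating, free tags, quality tags.

    Hypothetical helper: the function name and parameters are assumptions.
    """
    parts = [subject, character, series, rating, *tags, *QUALITY_TAGS]
    # Skip slots that were left empty and trim stray whitespace.
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    "1girl",
    character="arima kana",
    series="oshi no ko",
    rating="safe",
    tags=["looking at viewer", "smile", "outdoors", "night"],
)
print(prompt)
```

Keeping the quality tags in one constant makes it harder to forget them or place them before the descriptive tags.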
Advanced Usage
Optimal settings:
- CFG Scale: 4 - 7 (5 Recommended)
- Sampling Steps: 25 - 28 (28 Recommended)
- Preferred Sampler: Euler Ancestral (Euler a)
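With diffusers, these settings could be applied roughly as follows. This is a sketch under assumptions: `check_settings` and `use_euler_ancestral` are hypothetical helper names, and `pipe` is assumed to be an already loaded SDXL pipeline such as the one in the installation example. `EulerAncestralDiscreteScheduler` is the diffusers scheduler that corresponds to "Euler a".

```python
# Ranges and recommended values taken from the list above.
CFG_RANGE = (4, 7)
STEP_RANGE = (25, 28)
RECOMMENDED = {"guidance_scale": 5, "num_inference_steps": 28}


def check_settings(guidance_scale, num_inference_steps):
    """Return warnings for values outside the recommended ranges (hypothetical helper)."""
    warnings = []
    if not CFG_RANGE[0] <= guidance_scale <= CFG_RANGE[1]:
        warnings.append(f"guidance_scale {guidance_scale} is outside {CFG_RANGE}")
    if not STEP_RANGE[0] <= num_inference_steps <= STEP_RANGE[1]:
        warnings.append(f"num_inference_steps {num_inference_steps} is outside {STEP_RANGE}")
    return warnings


def use_euler_ancestral(pipe):
    """Swap the pipeline's scheduler for Euler Ancestral, reusing its existing config."""
    # Imported lazily so the helpers above also work without diffusers installed.
    from diffusers import EulerAncestralDiscreteScheduler

    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    return pipe


print(check_settings(**RECOMMENDED))
```

After calling `use_euler_ancestral(pipe)`, the recommended values can be passed as `pipe(prompt, **RECOMMENDED)`.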
Recommended resolutions:
| Orientation | Dimensions | Aspect Ratio |
| --- | --- | --- |
| Square | 1024 x 1024 | 1:1 |
| Landscape | 1152 x 896 | 9:7 |
| | 1216 x 832 | 3:2 |
| | 1344 x 768 | 7:4 |
| | 1536 x 640 | 12:5 |
| Portrait | 896 x 1152 | 7:9 |
| | 832 x 1216 | 2:3 |
| | 768 x 1344 | 4:7 |
| | 640 x 1536 | 5:12 |
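When a target size is not in the table, one option is to snap it to the nearest recommended resolution by aspect ratio. The sketch below is an illustration only: `closest_resolution` is a hypothetical helper, and comparing ratios on a log scale is a design choice made here, not something the model requires.

```python
import math

# (width, height) pairs from the recommended resolutions table.
RESOLUTIONS = [
    (1024, 1024),
    (1152, 896), (1216, 832), (1344, 768), (1536, 640),
    (896, 1152), (832, 1216), (768, 1344), (640, 1536),
]


def closest_resolution(width, height):
    """Return the recommended (width, height) whose aspect ratio is nearest the request."""
    target = math.log(width / height)
    # Log scale treats landscape and portrait ratios symmetrically.
    return min(RESOLUTIONS, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


print(closest_resolution(1920, 1080))  # (1344, 768)
print(closest_resolution(1080, 1920))  # (768, 1344)
```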
Final prompt structure example:
1girl, firefly \(honkai: star rail\), honkai \(series\), honkai: star rail, safe, casual, solo, looking at viewer, outdoors, smile, reaching towards viewer, night, masterpiece, high score, great score, absurdres
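The backslashes in the example matter: in weighted-prompt syntax, bare parentheses are typically read as emphasis, so parentheses that belong to a tag name are escaped. A minimal sketch of that escaping follows, assuming a hypothetical helper named `escape_tag`.

```python
import re


def escape_tag(tag):
    """Backslash-escape parentheses in a tag, leaving already escaped ones alone."""
    # (?<!\\) skips any parenthesis that already has a backslash in front of it.
    return re.sub(r"(?<!\\)([()])", r"\\\1", tag)


print(escape_tag("firefly (honkai: star rail)"))
print(escape_tag("honkai (series)"))
```

In Python source, write such prompts as raw strings (`r"..."`) so the backslashes are passed through unchanged.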
Documentation
Special Tags
The model supports various special tags to control different aspects of image generation:
- Quality Tags: masterpiece, best quality, low quality, worst quality
- Score Tags: high score, great score, good score, average score, bad score, low score
- Temporal Tags: year 2005, year {n}, year 2025
- Rating Tags: safe, sensitive, nsfw, explicit
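The tag families above lend themselves to simple validation before a prompt is sent to the model. The sketch below is illustrative only: `year_tag` and `rating_tag` are hypothetical helpers, and the 2005 to 2025 range is an assumption read off the two endpoints listed for the temporal tags.

```python
# Tag vocabularies copied from the list above.
QUALITY_TAGS = ("masterpiece", "best quality", "low quality", "worst quality")
SCORE_TAGS = ("high score", "great score", "good score", "average score", "bad score", "low score")
RATING_TAGS = ("safe", "sensitive", "nsfw", "explicit")
YEAR_RANGE = (2005, 2025)  # assumption: "year {n}" covers the listed endpoints inclusive


def year_tag(year):
    """Format a temporal tag such as 'year 2021', rejecting years outside YEAR_RANGE."""
    low, high = YEAR_RANGE
    if not low <= year <= high:
        raise ValueError(f"year must be between {low} and {high}, got {year}")
    return f"year {year}"


def rating_tag(rating):
    """Normalize a rating and check that it is one of the supported rating tags."""
    normalized = rating.strip().lower()
    if normalized not in RATING_TAGS:
        raise ValueError(f"unknown rating {rating!r}; expected one of {RATING_TAGS}")
    return normalized


print(year_tag(2021), rating_tag("Safe"))
```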
Training Information
| Parameter | Value |
| --- | --- |
| Hardware | 7 x H100 80GB SXM5 |
| Num Images | 8,401,464 |
| UNet Learning Rate | 2.5e-6 |
| Text Encoder Learning Rate | 1.25e-6 |
| Scheduler | Constant With Warmup |
| Warmup Steps | 5% |
| Batch Size | 32 |
| Gradient Accumulation Steps | 2 |
| Training Resolution | 1024x1024 |
| Optimizer | Adafactor |
| Input Perturbation Noise | 0.1 |
| Debiased Estimation Loss | Enabled |
| Mixed Precision | fp16 |
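As a rough worked example, the batch settings translate into an effective batch size and a number of optimizer steps per pass over the dataset. The table does not say whether the batch size of 32 is per GPU or global, so the sketch below computes both cases; the function name and the `batch_is_per_gpu` flag are assumptions made for illustration.

```python
import math

# Values from the training table above.
NUM_IMAGES = 8_401_464
BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 2
NUM_GPUS = 7


def steps_per_pass(batch_is_per_gpu):
    """Return (effective batch size, optimizer steps for one pass over the data)."""
    # Assumption: if the batch size is per GPU, all 7 GPUs contribute to each step.
    devices = NUM_GPUS if batch_is_per_gpu else 1
    effective = BATCH_SIZE * GRAD_ACCUM_STEPS * devices
    return effective, math.ceil(NUM_IMAGES / effective)


print(steps_per_pass(batch_is_per_gpu=False))  # (64, 131273)
print(steps_per_pass(batch_is_per_gpu=True))   # (448, 18754)
```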
Technical Details
The model was retrained from Stable Diffusion XL 1.0 on a dataset of 8.4M diverse anime-style images. Training ran on 7 x H100 80GB SXM5 GPUs for approximately 2650 GPU hours.
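For a sense of scale, the GPU-hour figure can be converted into approximate wall-clock time. This is back-of-the-envelope arithmetic, assuming that GPU hours are summed across the 7 GPUs listed in the training table and that all of them ran for the whole job.

```python
GPU_HOURS = 2650  # approximate total from the text above
NUM_GPUS = 7      # from the training table; assumed busy for the entire run

wall_clock_hours = GPU_HOURS / NUM_GPUS
wall_clock_days = wall_clock_hours / 24

print(f"{wall_clock_hours:.1f} h, about {wall_clock_days:.1f} days")
# 378.6 h, about 15.8 days
```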
License
This model adopts the original CreativeML Open RAIL++-M License from Stability AI.
- Permitted: Commercial use, modifications, distributions, private use
- Prohibited: Illegal activities, harmful content generation, discrimination, exploitation
- Requirements: Include license copy, state changes, preserve notices
- Warranty: Provided "AS IS" without warranties
Acknowledgement
This project is made possible thanks to the contributions of Stability AI, Novel AI, and Waifu Diffusion Team. We're also grateful for the kickstarter grant from Main and the support from the community. Special thanks to:
- Moescape AI: Our collaboration partner in model distribution and testing
- Lesser Rabbit: For providing computing and research grants
- Kohya SS: For developing the open-source training framework
- discus0434: For creating the Aesthetic Predictor 2.5
- Early testers: For providing feedback and quality assurance
Contributors
Model
Gradio
Relations, finance, and quality assurance
Data
Fundraising
We've introduced new fundraising methods through GitHub Sponsors. You can support us in the following ways:
- Donate: Contribute via ETH, USDT, or USDC to 0xd8A1dA94BA7E6feCe8CfEacc1327f498fCcBFC0C or sponsor us on GitHub.
- Share: Spread the word about our models.
- Feedback: Let us know how we can improve.
Why do we use Cryptocurrency?
Our PayPal account was banned when we used Ko-fi and PayPal for fundraising. To ensure transparency, we've switched to cryptocurrency.
Want to Donate in Non-Crypto Currency?
If you prefer to donate in a non-crypto currency, contact us via our Discord Server or GitHub Sponsors.
Join Our Discord Server
Join our Discord server: https://discord.gg/cqh9tZgbGc
Limitations
- Prompt Format: Limited to tag-based text prompts.
- Anatomy: May struggle with complex anatomical details.
- Text Generation: Text rendering in images is not supported.
- New Characters: Recent characters may have lower accuracy.
- Multiple Characters: Scenes with multiple characters need careful prompt engineering.
- Resolution: Higher resolutions may show degradation.
- Style Consistency: May require specific style tags.