Animagine XL 3.0 Base
Animagine XL 3.0 Base is the foundational version of an anime text-to-image model. It focuses on establishing the core functionality needed for high-quality anime image generation.
Quick Start
Animagine XL 3.0 Base is the starting point of the Animagine XL 3.0 project. It is not recommended for direct inference; for inference, use Animagine XL 3.0 instead.
Features
- Two-stage Development: Uses a two-stage development process based on transfer learning. The second stage repairs issues such as broken anatomy left in the UNet after the first training stage.
- Structured Prompting: Expects a specific prompt template for optimal results, which tells users how to order their tags.
- Special Tags: Trained with special tags for quality, rating, and year, which give more control over the generated images.
Usage Examples
Basic Usage
For optimal results, follow the structured prompt template:
1girl/1boy, character name, from what series, everything else in any order.
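The template above can be sketched as a small helper that joins tags in the recommended order. This is only an illustration: `build_prompt` is a hypothetical name, not part of any Animagine tooling, and the character and series values are arbitrary examples.

```python
from typing import Iterable, Optional


def build_prompt(subject: str,
                 character: Optional[str] = None,
                 series: Optional[str] = None,
                 extra_tags: Iterable[str] = ()) -> str:
    """Join tags as: subject, character name, series, everything else."""
    # Hypothetical helper: only the ordering comes from the template above.
    ordered = [subject, character, series, *extra_tags]
    # Skip missing or blank entries and trim stray whitespace.
    return ", ".join(tag.strip() for tag in ordered if tag and tag.strip())


# Arbitrary example values, chosen only to show the ordering.
print(build_prompt("1girl", "hatsune miku", "vocaloid", ["twintails", "smile"]))
# → 1girl, hatsune miku, vocaloid, twintails, smile
```

Tags passed in `extra_tags` keep the order in which they are given, which matches the "everything else in any order" part of the template.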
Advanced Usage
To steer the model towards high-aesthetic images, use the following negative prompt:
nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name
Prepend the positive prompt with:
masterpiece, best quality
Be cautious with `masterpiece` and `best quality`, however, because these tags carry a risk of producing NSFW content. It is safer to add `nsfw, rating: sensitive` to the negative prompt and `rating: general` to the positive prompt. A lower classifier-free guidance scale (CFG scale) of around 5 to 7, fewer than 30 sampling steps, and the Euler Ancestral (Euler a) sampler are also recommended.
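As a rough sketch, the recommendations above can be collected into one settings dictionary. `generation_kwargs` is a hypothetical helper. The keys are chosen to mirror the call arguments of the Hugging Face diffusers SDXL pipeline (`prompt`, `negative_prompt`, `guidance_scale`, `num_inference_steps`), but the code does not import diffusers or load a model. Placing `rating: general` at the end of the positive prompt is an assumption; the text does not say where it should go.

```python
NEGATIVE_TAGS = [
    "nsfw", "lowres", "bad anatomy", "bad hands", "text", "error",
    "missing fingers", "extra digit", "fewer digits", "cropped",
    "worst quality", "low quality", "normal quality", "jpeg artifacts",
    "signature", "watermark", "username", "blurry", "artist name",
]


def generation_kwargs(prompt: str,
                      guidance_scale: float = 6.0,
                      num_inference_steps: int = 28) -> dict:
    """Build call arguments that follow the recommendations above."""
    # Quality tags go first; the rating tag is appended (assumed placement).
    positive = f"masterpiece, best quality, {prompt}, rating: general"
    # Put "nsfw, rating: sensitive" at the front of the negative prompt.
    negative = ["nsfw", "rating: sensitive"]
    negative += [tag for tag in NEGATIVE_TAGS if tag != "nsfw"]
    return {
        "prompt": positive,
        "negative_prompt": ", ".join(negative),
        "guidance_scale": guidance_scale,            # recommended: about 5 to 7
        "num_inference_steps": num_inference_steps,  # recommended: below 30
    }


settings = generation_kwargs("1girl, solo, looking at viewer")
print(settings["prompt"])
```

The sampler is not part of these call arguments. In diffusers, Euler Ancestral corresponds to the `EulerAncestralDiscreteScheduler` class, which would be set on the pipeline separately.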
Documentation
Model Details
| Property | Details |
|---|---|
| Developed by | Linaqruf |
| Model Type | Diffusion-based text-to-image generative model |
| Model Description | Animagine XL 3.0 Base forms the foundational phase of the anime image generation model. This version focuses on building core competencies in anime imagery, emphasizing foundational concept understanding and initial prompt interpretation. It is designed to establish the groundwork for the advanced features seen in the full Animagine XL 3.0 model. |
| License | Fair AI Public License 1.0-SD |
| Finetuned from model | Animagine XL 2.0 |
Usage Guidelines
Tag Ordering
Prompting is different in this iteration. For optimal results, follow the structured prompt template:
1girl/1boy, character name, from what series, everything else in any order.
Special Tags
The model was trained with special tags for quality, rating, and year. Although it can work without these tags, it's recommended to use them for better control.
Quality Modifiers
| Quality Modifier | Score Criterion |
|---|---|
| `masterpiece` | > 150 |
| `best quality` | 100 to 150 |
| `high quality` | 75 to 100 |
| `medium quality` | 25 to 75 |
| `normal quality` | 0 to 25 |
| `low quality` | -5 to 0 |
| `worst quality` | < -5 |
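The thresholds in the table can be read as a simple lookup from score to tag. The sketch below is one possible reading, not the authors' tagging code: the table does not say how the score is computed, and adjacent ranges share their endpoints, so the code assumes each shared endpoint belongs to the lower tier (a score of exactly 100 maps to `high quality`). `quality_tag` is a hypothetical name.

```python
def quality_tag(score: float) -> str:
    """Map a score to a quality modifier using the table's thresholds."""
    # (exclusive lower bound, tag), checked from the highest tier down.
    tiers = [
        (150, "masterpiece"),
        (100, "best quality"),
        (75, "high quality"),
        (25, "medium quality"),
        (0, "normal quality"),
    ]
    for lower_bound, tag in tiers:
        if score > lower_bound:
            return tag
    # The table lists "< -5" as worst quality, so -5 itself is low quality.
    return "low quality" if score >= -5 else "worst quality"


for example_score in (200, 100, 10, -10):
    print(example_score, quality_tag(example_score))
```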
Rating Modifiers
| Rating Modifier | Rating Criterion |
|---|---|
| `rating: general` | General |
| `rating: sensitive` | Sensitive |
| `rating: questionable`, `nsfw` | Questionable |
| `rating: explicit`, `nsfw` | Explicit |
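For scripted prompt building, the rating table can be kept as a small mapping. This is a minimal sketch with hypothetical names (`RATING_TAGS`, `rating_tags`); only the tag strings themselves come from the table.

```python
from typing import Dict, List

# Tag strings copied from the rating table; the dictionary layout is an assumption.
RATING_TAGS: Dict[str, List[str]] = {
    "general": ["rating: general"],
    "sensitive": ["rating: sensitive"],
    "questionable": ["rating: questionable", "nsfw"],
    "explicit": ["rating: explicit", "nsfw"],
}


def rating_tags(rating: str) -> List[str]:
    """Return the modifier tags for a rating name, ignoring case and spaces."""
    key = rating.strip().lower()
    if key not in RATING_TAGS:
        raise ValueError(f"unknown rating: {rating!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(RATING_TAGS[key])


print(", ".join(rating_tags("General")))
```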
Year Modifier
These tags help steer the result towards modern or vintage anime art styles, ranging from `newest` to `oldest`.

| Year Tag | Year Range |
|---|---|
| `newest` | 2022 to 2023 |
| `late` | 2019 to 2021 |
| `mid` | 2015 to 2018 |
| `early` | 2011 to 2014 |
| `oldest` | 2005 to 2010 |
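The year ranges translate directly into a lookup. The following sketch uses a hypothetical `year_tag` helper. Returning `None` for years outside 2005 to 2023 is an assumption, since the table does not cover them.

```python
from typing import Optional

# (tag, first year, last year), copied from the year table.
YEAR_RANGES = [
    ("newest", 2022, 2023),
    ("late", 2019, 2021),
    ("mid", 2015, 2018),
    ("early", 2011, 2014),
    ("oldest", 2005, 2010),
]


def year_tag(year: int) -> Optional[str]:
    """Return the year tag whose inclusive range contains `year`."""
    for tag, first, last in YEAR_RANGES:
        if first <= year <= last:
            return tag
    # Years outside the table have no tag (assumed behaviour).
    return None


print(year_tag(2016))  # → mid
```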
Multi Aspect Resolution
This model supports generating images at the following dimensions:
| Dimensions | Aspect Ratio |
|---|---|
| 1024 x 1024 | 1:1 Square |
| 1152 x 896 | 9:7 |
| 896 x 1152 | 7:9 |
| 1216 x 832 | 19:13 |
| 832 x 1216 | 13:19 |
| 1344 x 768 | 7:4 Horizontal |
| 768 x 1344 | 4:7 Vertical |
| 1536 x 640 | 12:5 Horizontal |
| 640 x 1536 | 5:12 Vertical |
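When a desired output size is not in the table, one option is to pick the supported resolution with the closest aspect ratio. The sketch below is one way to do that and is not taken from the model's tooling. `closest_resolution` is a hypothetical helper, and comparing ratios in log space is a design choice so that wide and tall shapes are treated symmetrically.

```python
import math
from typing import Tuple

# (width, height) pairs copied from the resolution table.
SUPPORTED_RESOLUTIONS = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216),
    (1344, 768), (768, 1344), (1536, 640), (640, 1536),
]


def closest_resolution(width: int, height: int) -> Tuple[int, int]:
    """Return the supported size whose aspect ratio is nearest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)
    # Distance between log ratios treats 2:1 and 1:2 as equally far from 1:1.
    return min(SUPPORTED_RESOLUTIONS,
               key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


print(closest_resolution(1920, 1080))  # → (1344, 768)
```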
Training and Hyperparameters
- Animagine XL 3.0 was trained on 2x A100 GPUs with 80GB of memory each for 21 days, or over 500 GPU hours. The two stages that produced the base model were:
  - Feature Alignment Stage: Used 1.2m images to familiarize the model with basic anime concepts.
  - Refining UNet Stage: Used a curated dataset of 2.5k images to fine-tune the UNet only.
Hyperparameters
| Stage | Epochs | UNet Learning Rate | Train Text Encoder | Text Encoder Learning Rate | Batch Size | Mixed Precision | Noise Offset |
|---|---|---|---|---|---|---|---|
| Feature Alignment Stage | 10 | 7.5e-6 | True | 3.75e-6 | 48 x 2 | fp16 | N/A |
| Refining UNet Stage | 10 | 2e-6 | False | N/A | 48 | fp16 | 0.0357 |
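To get a feel for the scale of each stage, the batch sizes and epoch counts can be turned into an approximate number of optimizer steps. This is a back-of-the-envelope sketch with several assumptions that the source does not confirm: "48 x 2" is read as 48 samples per GPU on 2 GPUs, the second stage's "48" is read as the total batch, "1.2m" and "2.5k" are taken as exactly 1,200,000 and 2,500 images, and there is no gradient accumulation. Aspect-ratio bucketing would change the exact counts.

```python
import math


def optimizer_steps(num_images: int, per_device_batch: int,
                    num_devices: int, epochs: int) -> int:
    """Estimate total optimizer steps for a stage (hypothetical helper)."""
    effective_batch = per_device_batch * num_devices
    # The last, partially filled batch of each epoch still counts as a step.
    steps_per_epoch = math.ceil(num_images / effective_batch)
    return steps_per_epoch * epochs


# Feature Alignment Stage: 1.2m images, batch 48 x 2, 10 epochs (assumed reading).
print(optimizer_steps(1_200_000, 48, 2, 10))  # → 125000
# Refining UNet Stage: 2.5k images, batch 48, 10 epochs (assumed reading).
print(optimizer_steps(2_500, 48, 1, 10))      # → 530
```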
Model Comparison
Training Config
| Configuration Item | Animagine XL 2.0 | Animagine XL 3.0 |
|---|---|---|
| GPU | A100 80G | 2 x A100 80G |
| Dataset | 170k + 83k images | 1271990 + 3500 images |
| Shuffle Separator | N/A | True |
| Global Epochs | 20 | 20 |
| Learning Rate | 1e-6 | 7.5e-6 |
| Batch Size | 32 | 48 x 2 |
| Train Text Encoder | True | True |
| Train Special Tags | True | True |
| Image Resolution | 1024 | 1024 |
| Bucket Resolution | 2048 x 512 | 2048 x 512 |
Source code and training config are available here: https://github.com/cagliostrolab/sd-scripts/tree/main/notebook
Technical Details
Animagine XL 3.0 Base is developed in two stages based on transfer learning. The first stage, Feature Alignment, uses 1.2m images to introduce basic anime concepts to the model. The second stage, Refining UNet, uses a curated dataset of 2.5k images to fine-tune the UNet only. Each stage has its own learning rate, batch size, and noise offset, and both run for 10 epochs.
License
This model is released under the Fair AI Public License 1.0-SD.
Important Note
While Animagine XL 3.0 represents a significant advancement in anime text-to-image generation, it has limitations. It prioritizes concept learning over art style, is not designed for photorealistic images, may struggle with complex anatomical structures, is constrained by its dataset, and is not optimized for natural-language prompts. Using high-quality tags may also carry a risk of generating NSFW content.
Usage Tip
Use the structured prompt template and the special tags described above for more control over the generated images. Be cautious with high-quality tags such as `masterpiece` and `best quality`, which can lead to NSFW content.