Animagine XL 3.0 Basic Version Open-Source Model - Free Deployment for Efficient Generation of Anime Images from Text

Animagine XL 3.0 Base

Developed by cagliostrolab

Animagine XL 3.0 Base is the foundational version of an advanced anime text-to-image model, focusing on establishing core functionalities and refining key aspects.

Image Generation, English
Open Source License: Other
#Anime image generation #High-resolution support #Multiple aspect ratios

Downloads 810

Release Time: 1/10/2024

Model Overview

Animagine XL 3.0 Base represents the first two phases of the Animagine XL 3.0 project, developed through a two-stage transfer learning process to build core anime image generation capabilities.

Model Features

Two-stage training process

Utilizes feature alignment and UNet optimization in two training stages to effectively address issues like anatomical inaccuracies.

Structured prompt template

Supports structured prompt input for optimized image generation results.

Multiple aspect ratio support

Supports image generation in various aspect ratios from 1:1 to 12:5.

Quality control system

Built-in quality, rating, and year modifiers for fine-grained control.

Model Capabilities

Anime-style image generation

Text-to-image conversion

Multi-aspect ratio image generation

Image quality control

Use Cases

Creative design

Anime character design

Generate anime character images from text descriptions

High-quality anime-style character images

Concept art creation

Quickly generate anime-style concept art

Diverse anime-style concept illustrations

Content creation

Illustration generation

Generate accompanying illustrations for stories or games

Consistently styled series of illustrations

🚀 Animagine XL 3.0 Base

Animagine XL 3.0 Base is the foundational version of a sophisticated anime text-to-image model, focusing on establishing core functionalities for high-quality anime image generation.

🚀 Quick Start

Animagine XL 3.0 Base is the starting point of the Animagine XL 3.0 project. However, it's not recommended for direct inference. For inference, please use Animagine XL 3.0.

✨ Features

Two-stage Development: Employs a two-stage development process based on transfer learning, which effectively addresses issues like broken anatomy in the UNet after the first training stage.
Structured Prompting: Has a specific prompt template for optimal results, guiding users on how to structure their prompts.
Special Tags: Trained with special tags for quality, rating, and year, providing more control over the generated images.

💻 Usage Examples

Basic Usage

For optimal results, follow the structured prompt template:

1girl/1boy, character name, from what series, everything else in any order.
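
The template can be applied mechanically. Below is a minimal sketch, assuming a hypothetical helper `build_prompt` (not part of any Animagine tooling) that joins the fields in the recommended order. The character, series, and extra tags are placeholders for illustration only.

```python
def build_prompt(subject, character=None, series=None, extra_tags=()):
    """Join tags in the recommended order: subject, character, series, the rest.

    `build_prompt` is a hypothetical helper written for this example.
    """
    ordered = [subject, character, series, *extra_tags]
    # Drop missing fields and surrounding whitespace, keeping the order intact.
    return ", ".join(tag.strip() for tag in ordered if tag and tag.strip())


prompt = build_prompt(
    "1girl",
    character="example character",  # placeholder, not a real character tag
    series="example series",        # placeholder, not a real series tag
    extra_tags=["smile", "outdoors", "looking at viewer"],
)
print(prompt)
```

Only the first three positions are fixed by the template, so the helper leaves the remaining tags in whatever order the caller supplies.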

Advanced Usage

To guide the model towards generating high-aesthetic images, use negative prompts:

nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name

Prepend prompts with:

masterpiece, best quality

Be cautious when using masterpiece and best quality, however, because these tags carry a risk of NSFW output. It is safer to add nsfw, rating: sensitive to the negative prompt and rating: general to the positive prompt. Also use a lower classifier-free guidance (CFG) scale of around 5 to 7, fewer than 30 sampling steps, and Euler Ancestral (Euler a) as the sampler.
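
These recommendations can be collected into one set of generation settings. Below is a minimal sketch, assuming a hypothetical helper `make_generation_kwargs`. The default values (28 steps, CFG 7.0), the strict range checks, and the position of rating: general inside the positive prompt are choices made for this example, not something the model card specifies.

```python
NEGATIVE_PROMPT = (
    "nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, "
    "extra digit, fewer digits, cropped, worst quality, low quality, "
    "normal quality, jpeg artifacts, signature, watermark, username, "
    "blurry, artist name"
)


def make_generation_kwargs(prompt, steps=28, cfg_scale=7.0):
    """Build generation settings that follow the recommendations above.

    Hypothetical helper; defaults are one choice within the stated ranges.
    """
    if not 5.0 <= cfg_scale <= 7.0:
        raise ValueError("recommended CFG scale is around 5 to 7")
    if not 0 < steps < 30:
        raise ValueError("recommended sampling steps are below 30")
    return {
        # Quality tags first, followed by an explicit general rating.
        "prompt": f"masterpiece, best quality, rating: general, {prompt}",
        # The recommended negative prompt already starts with `nsfw`.
        "negative_prompt": f"{NEGATIVE_PROMPT}, rating: sensitive",
        "guidance_scale": cfg_scale,
        "num_inference_steps": steps,
    }


settings = make_generation_kwargs("1girl, solo, upper body")
print(settings["prompt"])
```

The key names were chosen to match the `prompt`, `negative_prompt`, `guidance_scale`, and `num_inference_steps` arguments of the diffusers `StableDiffusionXLPipeline`, where Euler Ancestral corresponds to `EulerAncestralDiscreteScheduler`. Other front ends may use different names for the same settings.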

📚 Documentation

Model Details

Property	Details
Developed by	Linaqruf
Model Type	Diffusion-based text-to-image generative model
Model Description	Animagine XL 3.0 Base forms the foundational phase of the sophisticated anime image generation model. This version focuses on building core competencies in anime imagery, emphasizing foundational concept understanding and initial prompt interpretation. It's designed to establish the groundwork for advanced features seen in the full Animagine XL 3.0 model.
License	Fair AI Public License 1.0-SD
Finetuned from model	Animagine XL 2.0

Usage Guidelines

Tag Ordering

Prompting is different in this iteration. For optimal results, follow the structured prompt template:

1girl/1boy, character name, from what series, everything else in any order.

Special Tags

The model was trained with special tags for quality, rating, and year. Although it can work without these tags, it's recommended to use them for better control.

Quality Modifiers

Quality Modifier	Score Criterion
`masterpiece`	>150
`best quality`	100 to 150
`high quality`	75 to 100
`medium quality`	25 to 75
`normal quality`	0 to 25
`low quality`	-5 to 0
`worst quality`	<-5
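
The thresholds in this table can be read as a simple lookup. Below is a minimal sketch, assuming a hypothetical function `quality_tag`. The table does not say which score is being measured or which side of a shared boundary is inclusive, so the sketch assigns a boundary value (for example exactly 100) to the higher tier.

```python
def quality_tag(score):
    """Map a numeric score to a quality modifier using the table thresholds.

    Assumption: a score on a shared boundary goes to the higher tier.
    """
    if score > 150:
        return "masterpiece"
    if score >= 100:
        return "best quality"
    if score >= 75:
        return "high quality"
    if score >= 25:
        return "medium quality"
    if score >= 0:
        return "normal quality"
    if score >= -5:
        return "low quality"
    return "worst quality"


for example_score in (200, 120, 80, 30, 10, -3, -20):
    print(example_score, quality_tag(example_score))
```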

Rating Modifiers

Rating Modifier	Rating Criterion
`rating: general`	General
`rating: sensitive`	Sensitive
`rating: questionable`, `nsfw`	Questionable
`rating: explicit`, `nsfw`	Explicit

Year Modifier

These tags help steer the result towards modern or vintage anime art styles, ranging from newest to oldest.

Year Tag	Year Range
`newest`	2022 to 2023
`late`	2019 to 2021
`mid`	2015 to 2018
`early`	2011 to 2014
`oldest`	2005 to 2010
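
The year ranges map directly to tags. Below is a minimal sketch, assuming a hypothetical function `year_tag` that returns the tag for a given year. Returning `None` for years outside 2005 to 2023 is a choice made for this example, since the table lists no tag for them.

```python
# (tag, first year, last year), copied from the table above.
YEAR_TAGS = [
    ("newest", 2022, 2023),
    ("late", 2019, 2021),
    ("mid", 2015, 2018),
    ("early", 2011, 2014),
    ("oldest", 2005, 2010),
]


def year_tag(year):
    """Return the year modifier covering `year`, or None if no range matches."""
    for tag, first, last in YEAR_TAGS:
        if first <= year <= last:
            return tag
    return None


print(year_tag(2016))
```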

Multi Aspect Resolution

This model supports generating images at the following dimensions:

Dimensions	Aspect Ratio
`1024 x 1024`	1:1 Square
`1152 x 896`	9:7
`896 x 1152`	7:9
`1216 x 832`	19:13
`832 x 1216`	13:19
`1344 x 768`	7:4 Horizontal
`768 x 1344`	4:7 Vertical
`1536 x 640`	12:5 Horizontal
`640 x 1536`	5:12 Vertical
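
When a target aspect ratio is not in the table, one option is to pick the nearest supported size. Below is a minimal sketch, assuming a hypothetical function `closest_resolution`. Comparing ratios on a log scale, so that wide and tall shapes are treated symmetrically, is a choice made for this example.

```python
import math

# (width, height) pairs copied from the table above.
SUPPORTED_RESOLUTIONS = [
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
    (1536, 640), (640, 1536),
]


def closest_resolution(target_ratio):
    """Return the supported (width, height) nearest to width/height = target_ratio."""
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    log_target = math.log(target_ratio)
    return min(
        SUPPORTED_RESOLUTIONS,
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - log_target),
    )


print(closest_resolution(16 / 9))
```

For a 16:9 target this selects the 7:4 entry, the closest landscape shape in the table.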

Training and Hyperparameters

Animagine XL 3.0 was trained on 2x A100 GPUs with 80GB of memory each for 21 days, or over 500 GPU hours. The full training process had three stages; the Base model covers the first two:
- Feature Alignment Stage: Used 1.2M images to familiarize the model with basic anime concepts.
- Refining UNet Stage: Used a curated dataset of 2.5k images to fine-tune the UNet only.

Hyperparameters

Stage	Epochs	UNet Learning Rate	Train Text Encoder	Text Encoder Learning Rate	Batch Size	Mixed Precision	Noise Offset
Feature Alignment Stage	10	7.5e-6	True	3.75e-6	48 x 2	fp16	N/A
Refining UNet Stage	10	2e-6	False	N/A	48	fp16	0.0357
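
To get a feel for the scale of each stage, the table can be turned into a rough step count. Below is a minimal sketch with several assumptions that the source does not confirm: "48 x 2" is read as 48 samples per GPU on 2 GPUs (an effective batch of 96), gradient accumulation is ignored, and the image counts are the rounded figures from the training description (1.2M and 2.5k). `StageConfig` is a hypothetical container, not the authors' training config.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageConfig:
    name: str
    epochs: int
    unet_lr: float
    text_encoder_lr: Optional[float]  # None when the text encoder is not trained
    effective_batch_size: int         # assumed: per-GPU batch times GPU count
    noise_offset: Optional[float]     # None where the table says N/A
    num_images: int                   # rounded count from the training description

    def steps_per_epoch(self):
        # One optimizer step per batch; the last partial batch still counts.
        return math.ceil(self.num_images / self.effective_batch_size)

    def total_steps(self):
        return self.epochs * self.steps_per_epoch()


STAGES = [
    StageConfig("Feature Alignment", 10, 7.5e-6, 3.75e-6, 48 * 2, None, 1_200_000),
    StageConfig("Refining UNet", 10, 2e-6, None, 48, 0.0357, 2_500),
]

for stage in STAGES:
    print(stage.name, stage.steps_per_epoch(), stage.total_steps())
```

Under these assumptions the second stage takes only a small fraction of the optimizer steps of the first, which fits its role as a short refinement pass on the UNet.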

Model Comparison

Training Config

Configuration Item	Animagine XL 2.0	Animagine XL 3.0
GPU	A100 80G	2 x A100 80G
Dataset	170k + 83k images	1271990 + 3500 Images
Shuffle Separator	N/A	True
Global Epochs	20	20
Learning Rate	1e-6	7.5e-6
Batch Size	32	48 x 2
Train Text Encoder	True	True
Train Special Tags	True	True
Image Resolution	1024	1024
Bucket Resolution	2048 x 512	2048 x 512

Source code and training config are available here: https://github.com/cagliostrolab/sd-scripts/tree/main/notebook

🔧 Technical Details

Animagine XL 3.0 Base is trained in two stages using transfer learning. The first stage, Feature Alignment, uses 1.2M images to introduce basic anime concepts to the model. The second stage, Refining UNet, uses a curated dataset of 2.5k images to fine-tune the UNet only. Each stage has its own learning rates, batch size, and epoch count, listed in the Hyperparameters table above.

📄 License

This model is released under the Fair AI Public License 1.0-SD.

⚠️ Important Note

While "Animagine XL 3.0" represents a significant advancement in anime text-to-image generation, it has limitations. It prioritizes concept learning over art style, is not designed for photorealistic images, and may struggle with complex anatomical structures. It is also constrained by its training dataset and is not optimized for natural language prompts, so structured tags work better than free-form sentences. Finally, using high-quality tags may carry a risk of generating NSFW content.

💡 Usage Tip

Use the structured prompt template and special tags provided to have more control over the generated images. Also, be cautious when using high-quality tags to avoid NSFW content.
