animagine-xl-2.0 open-source text-to-image model - free deployment for generating high-resolution anime images

Animagine XL 2.0

Developed by Linaqruf

An advanced latent text-to-image diffusion model fine-tuned from Stable Diffusion XL 1.0, specializing in high-resolution anime images

Tags: image generation, English, #anime-style optimization, #high-resolution generation, #LoRA style adaptation

Downloads: 3,195

Release date: 11/13/2023

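Model overview

Animagine XL 2.0 is a text-to-image diffusion model focused on generating high-quality anime images. Trained on a dataset of 170,000 high-quality anime images, it renders noticeably finer detail and a wider variety of art styles.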

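Model features

High-resolution anime image generation

Optimized for high-resolution anime images, with output at 1024x1024 and above.

Diverse art styles

Trained on 170,000 high-quality anime images, it can generate a wide range of anime art styles.

LoRA adapter support

Five companion stylistic LoRA adapters are available to further strengthen specific styles.

Multi-aspect-ratio support

Supports multiple aspect ratios, from 1:1 square to 12:5 landscape.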

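Model capabilities

Anime-style image generation

High-resolution image output

Multi-aspect-ratio image generation

Stylized image generation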

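Use cases

Anime creation

Anime character design

Generate anime character concept art from text descriptions

High-quality character concept art for anime, games, and other creative work

Anime scene generation

Generate anime-style scene images from text descriptions

Anime scene images with a consistent style

Creative design

Creative concept visualization

Quickly turn creative concepts into images

Helps designers iterate on creative concepts quickly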

🚀 Animagine XL 2.0

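Animagine XL 2.0 is an advanced latent text-to-image diffusion model designed to create high-resolution, detailed anime images. It is fine-tuned from Stable Diffusion XL 1.0 on a high-quality dataset of anime-style images. As the successor to Animagine XL 1.0, it excels at capturing the diverse and distinctive styles of anime art, delivering higher image quality and better aesthetics.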

Model samples

Sample title	Input prompt
1girl	face focus, cute, masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck
1boy	face focus, bishounen, masterpiece, best quality, 1boy, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck

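✨ Key features

High-quality anime image generation: creates detailed, high-quality anime images from text descriptions.
Multiple styles: LoRA adapters enable a range of distinctive art styles.
User-friendly interfaces: images can be generated through a Gradio Web UI and Google Colab.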

📦 Installation

Make sure the latest diffusers library and the other required packages are installed:

pip install diffusers --upgrade
pip install transformers accelerate safetensors
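To confirm the installation, the installed versions can be listed with the standard library's `importlib.metadata`. This is a minimal sketch; the helper name `installed_versions` is hypothetical, and `torch` is included only because the inference script imports it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: installed version string, or None if missing}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Packages from the pip commands above, plus torch (imported by the inference script)
    required = ["diffusers", "transformers", "accelerate", "safetensors", "torch"]
    for name, ver in installed_versions(required).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

A `None` entry means the corresponding `pip install` step still needs to be run.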

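💻 Usage examples

Basic usage

The following Python script shows how to run inference with Animagine XL 2.0. The default scheduler in the model config is EulerAncestralDiscreteScheduler; the script sets it explicitly for clarity.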

import torch
from diffusers import (
    StableDiffusionXLPipeline, 
    EulerAncestralDiscreteScheduler,
    AutoencoderKL
)

# Load the fp16-fixed SDXL VAE so decoding stays stable in float16
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", 
    torch_dtype=torch.float16
)

# Build the pipeline in half precision with the VAE above
pipe = StableDiffusionXLPipeline.from_pretrained(
    "Linaqruf/animagine-xl-2.0", 
    vae=vae,
    torch_dtype=torch.float16, 
    use_safetensors=True, 
    variant="fp16"
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to('cuda')

# Define the prompts and generate an image
prompt = "face focus, cute, masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

image = pipe(
    prompt, 
    negative_prompt=negative_prompt, 
    width=1024,
    height=1024,
    guidance_scale=12,
    num_inference_steps=50
).images[0]

# Save the result to disk
image.save("output.png")

📚 Documentation

Prompt guidelines

Animagine XL 2.0 responds well to natural-language descriptions. For example:

A girl with mesmerizing blue eyes looks at the viewer. Her long, white hair is adorned with blue butterfly hair ornaments.

For best results, however, use Danbooru-style tags in prompts, because the model was trained on images labeled with these tags. For example:

1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck

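During dataset processing, quality and rating modifiers were added to the captions; including them in a prompt steers generation according to the criteria below.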

Quality modifiers

Quality modifier	Score criterion
masterpiece	>150
best quality	100 to 150
high quality	75 to 100
medium quality	25 to 75
normal quality	0 to 25
low quality	-5 to 0
worst quality	<-5
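The table can be read as a threshold function over an image's score. The sketch below illustrates that reading only and is not the author's dataset script; the function name `quality_modifier` is hypothetical, and because adjacent ranges share endpoints (100, for example, appears in two rows), it assumes each range includes its lower bound.

```python
# (inclusive lower bound, modifier), checked from highest to lowest.
# ASSUMPTION: shared endpoints belong to the range that starts there.
QUALITY_THRESHOLDS = [
    (100, "best quality"),
    (75, "high quality"),
    (25, "medium quality"),
    (0, "normal quality"),
    (-5, "low quality"),
]


def quality_modifier(score):
    """Map an image score to a quality modifier tag per the table above."""
    if score > 150:
        return "masterpiece"
    for lower_bound, tag in QUALITY_THRESHOLDS:
        if score >= lower_bound:
            return tag
    return "worst quality"
```

For example, a score of 120 yields "best quality" and a score of -10 yields "worst quality".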

Rating modifiers

Rating modifier	Rating criterion
(none)	general
(none)	sensitive
nsfw	questionable
nsfw	explicit
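The rating table is a simple lookup. A minimal sketch, assuming captions are kept as tag lists; `add_rating_modifier` is a hypothetical helper, and putting the modifier at the front of the caption is an assumption, since the source does not say where it was placed.

```python
# Rating -> modifier, from the table above; None means no modifier is added.
RATING_MODIFIERS = {
    "general": None,
    "sensitive": None,
    "questionable": "nsfw",
    "explicit": "nsfw",
}


def add_rating_modifier(tags, rating):
    """Return a new tag list with the rating modifier (if any) prepended.

    Raises KeyError for ratings not listed in the table.
    """
    modifier = RATING_MODIFIERS[rating]
    return [modifier, *tags] if modifier else list(tags)
```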

To steer the model toward more aesthetically pleasing images, use a negative prompt such as:

lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry

For higher-quality results, prefix the prompt with:

masterpiece, best quality
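The guidelines above (Danbooru-style tags, a quality prefix, and the recommended negative prompt) can be combined in a small helper. This is a sketch only; `build_prompt`, `QUALITY_PREFIX`, and `DEFAULT_NEGATIVE` are hypothetical names, and skipping duplicate quality tags is a design choice of this sketch, not something the source prescribes.

```python
# Quality prefix and negative prompt recommended in the prompt guidelines.
QUALITY_PREFIX = ["masterpiece", "best quality"]
DEFAULT_NEGATIVE = (
    "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, "
    "fewer digits, cropped, worst quality, low quality, normal quality, "
    "jpeg artifacts, signature, watermark, username, blurry"
)


def build_prompt(tags, add_quality=True):
    """Join Danbooru-style tags into a comma-separated prompt.

    When add_quality is True, the quality prefix is placed first and any
    copies of those tags already in `tags` are dropped.
    """
    cleaned = [t.strip() for t in tags if t.strip()]
    if add_quality:
        cleaned = QUALITY_PREFIX + [t for t in cleaned if t not in QUALITY_PREFIX]
    return ", ".join(cleaned)
```

The returned string can be passed as the prompt, and `DEFAULT_NEGATIVE` as `negative_prompt`, in the basic usage script.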

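Quality tag comparison

This comparison illustrates the strong effect of the training quality tags on the results: the same prompt was generated with five combinations of positive and negative tags, showing how quality tags steer the visual output.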

Prompt: "1girl, fu xuan, honkai:star rail, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck"

Case	Positive tags	Negative tags
1	-	-
2	masterpiece, best quality	-
3	-	worst quality, low quality, normal quality
4	masterpiece, best quality	worst quality, low quality, normal quality
5	masterpiece, best quality	lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
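The five cases above can be set up as a repeatable comparison. This sketch only builds the prompt and negative-prompt pairs; it assumes the positive tags are prepended to the base prompt (as the prompt guidelines recommend) and that all cases share one fixed seed so only the tags differ. The default seed and the name `comparison_runs` are hypothetical; the source does not state which seed was used.

```python
# Base prompt and tag sets from the quality tag comparison table.
BASE_PROMPT = (
    "1girl, fu xuan, honkai:star rail, sweater, looking at viewer, "
    "upper body, beanie, outdoors, night, turtleneck"
)
POSITIVE_TAGS = "masterpiece, best quality"
NEGATIVE_QUALITY = "worst quality, low quality, normal quality"
NEGATIVE_FULL = (
    "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, "
    "fewer digits, cropped, worst quality, low quality, normal quality, "
    "jpeg artifacts, signature, watermark, username, blurry"
)

# (positive tags, negative prompt) for cases 1-5; None means "no tags".
CASES = [
    (None, None),
    (POSITIVE_TAGS, None),
    (None, NEGATIVE_QUALITY),
    (POSITIVE_TAGS, NEGATIVE_QUALITY),
    (POSITIVE_TAGS, NEGATIVE_FULL),
]


def comparison_runs(seed=0):
    """Build one generation request per case, all sharing the same seed."""
    runs = []
    for case_number, (positive, negative) in enumerate(CASES, start=1):
        # Assumption: quality tags go in front of the base prompt.
        prompt = f"{positive}, {BASE_PROMPT}" if positive else BASE_PROMPT
        runs.append({
            "case": case_number,
            "prompt": prompt,
            "negative_prompt": negative,
            "seed": seed,
        })
    return runs
```

Each entry could then be passed to the pipeline from the basic usage script, for example `pipe(run["prompt"], negative_prompt=run["negative_prompt"], generator=torch.Generator("cuda").manual_seed(run["seed"]))`.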

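Multi-aspect resolution

The model supports generating images at the following sizes:

Size	Aspect ratio
1024 x 1024	1:1 square
1152 x 896	9:7
896 x 1152	7:9
1216 x 832	19:13
832 x 1216	13:19
1344 x 768	7:4 horizontal
768 x 1344	4:7 vertical
1536 x 640	12:5 horizontal
640 x 1536	5:12 vertical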
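All listed sizes are multiples of 64 and hold roughly one megapixel (between 983,040 and 1,048,576 pixels), so a requested aspect ratio is usually snapped to the nearest listed size rather than used directly. A minimal sketch of that idea; the function name is hypothetical and comparing ratios on a log scale is one reasonable choice among several.

```python
import math

# Sizes from the multi-aspect resolution table, as (width, height).
SUPPORTED_SIZES = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216),
    (1344, 768), (768, 1344), (1536, 640), (640, 1536),
]


def closest_supported_size(width, height):
    """Return the supported (width, height) whose aspect ratio is nearest
    to width / height.

    Ratios are compared on a log scale so that, e.g., 2:1 and 1:2 are
    treated as equally far from 1:1.
    """
    target = math.log(width / height)
    return min(
        SUPPORTED_SIZES,
        key=lambda size: abs(math.log(size[0] / size[1]) - target),
    )
```

For a 16:9 request, `closest_supported_size(1920, 1080)` returns `(1344, 768)`, which can be passed as `width` and `height` to the pipeline.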

Examples

Image description	Generation parameters
Twilight Contemplation - "Stelle, Amidst Shooting Stars and Mountain Silhouettes"	{ "prompt": "cinematic photo (masterpiece), (best quality), (ultra-detailed), stelle, honkai: star rail, official art, 1girl, solo, gouache, starry sky, mountain, long hair, hoodie, shorts, sneakers, yellow eyes, tsurime, sitting on a rock, stargazing, milky way, shooting star, tranquil night., illustration, disheveled hair, detailed eyes, perfect composition, moist skin, intricate details, earrings . 35mm photograph, film, bokeh, professional, 4k, highly detailed", "negative_prompt": "drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, uglylongbody, lowres, bad anatomy, bad hands, missing fingers, pubic hair, extra digit, fewer digits, cropped, worst quality, low quality", "resolution": "832 x 1216", "guidance_scale": 12, "num_inference_steps": 50, "seed": 1082676886, "sampler": "Euler a", "enable_lcm": false, "sdxl_style": "Photographic", "quality_tags": "Heavy", "refine_prompt": false, "use_lora": null, "use_upscaler": { "upscale_method": "nearest-exact", "upscaler_strength": 0.55, "upscale_by": 1.5, "new_resolution": "1248 x 1824" }, "datetime": "2023-11-25 06:42:21.342459"}
Serenade in Sunlight - "Caelus, immersed in music, strums his guitar in a room bathed in soft afternoon light."	{ "prompt": "cinematic photo (masterpiece), (best quality), (ultra-detailed), caelus, honkai: star rail, 1boy, solo, playing guitar, living room, grey hair, short hair, yellow eyes, downturned eyes, passionate expression, casual clothes, acoustic guitar, sheet music stand, carpet, couch, window, sitting pose, strumming guitar, eyes closed., illustration, disheveled hair, detailed eyes, perfect composition, moist skin, intricate details, earrings . 35mm photograph, film, bokeh, professional, 4k, highly detailed", "negative_prompt": "drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, uglylongbody, lowres, bad anatomy, bad hands, missing fingers, pubic hair, extra digit, fewer digits, cropped, worst quality, low quality", "resolution": "1216 x 832", "guidance_scale": 12, "num_inference_steps": 50, "seed": 1521939308, "sampler": "Euler a", "enable_lcm": false, "sdxl_style": "Photographic", "quality_tags": "Heavy", "refine_prompt": true, "use_lora": null, "use_upscaler": { "upscale_method": "nearest-exact", "upscaler_strength": 0.55, "upscale_by": 1.5, "new_resolution": "1824 x 1248" }, "datetime": "2023-11-25 07:08:39.622020"}
Night Market Glow - "Kafka serves up culinary delights, her smile as bright as the surrounding festival lights."	{ "prompt": "cinematic photo (masterpiece), (best quality), (ultra-detailed), 1girl, solo, kafka, enjoying a street food festival, dark purple hair, shoulder length, hair clip, blue eyes, upturned eyes, excited expression, casual clothes, food stalls, variety of cuisines, people, outdoor seating, string lights, standing pose, holding a plate of food, trying new dishes, laughing with friends, experiencing the vibrant food culture., illustration, disheveled hair, detailed eyes, perfect composition, moist skin, intricate details, earrings . 35mm photograph, film, bokeh, professional, 4k, highly detailed", "negative_prompt": "drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, uglylongbody, lowres, bad anatomy, bad hands, missing fingers, pubic hair, extra digit, fewer digits, cropped, worst quality, low quality", "resolution": "1216 x 832", "guidance_scale": 12, "num_inference_steps": 50, "seed": 1082676886, "sampler": "Euler a", "enable_lcm": false, "sdxl_style": "Photographic", "quality_tags": "Heavy", "refine_prompt": false, "use_lora": null, "use_upscaler": { "upscale_method": "nearest-exact", "upscaler_strength": 0.55, "upscale_by": 1.5, "new_resolution": "1824 x 1248" }, "datetime": "2023-11-25 06:51:53.961466"}
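The parameter records above can be mapped onto the basic diffusers pipeline call. This is a minimal sketch; the helper name `record_to_pipeline_kwargs` is hypothetical, and fields such as "sdxl_style", "quality_tags", "refine_prompt", and "use_upscaler" appear to be settings of the app that produced these records, so they are deliberately ignored here.

```python
import json


def record_to_pipeline_kwargs(record):
    """Turn one generation-parameter record into (kwargs, seed).

    `record` may be a dict or a JSON string. Only fields that a plain
    StableDiffusionXLPipeline call accepts are mapped; app-specific
    fields are dropped.
    """
    if isinstance(record, str):
        record = json.loads(record)
    # "832 x 1216" -> width 832, height 1216 (int() tolerates the spaces)
    width, height = (int(part) for part in record["resolution"].split("x"))
    kwargs = {
        "prompt": record["prompt"],
        "negative_prompt": record["negative_prompt"],
        "width": width,
        "height": height,
        "guidance_scale": record["guidance_scale"],
        "num_inference_steps": record["num_inference_steps"],
    }
    return kwargs, record["seed"]
```

With the pipeline from the basic usage script (its EulerAncestralDiscreteScheduler is what "Euler a" usually refers to), a record could be replayed as `pipe(**kwargs, generator=torch.Generator("cuda").manual_seed(seed))`. Results will not necessarily match the showcase images, since the upscaling step and any app-side processing are not reproduced.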

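🔧 Technical details

Training and hyperparameters

Animagine XL was trained on a single A100 GPU with 80 GB of memory. Training consisted of two stages:
- Feature alignment stage: 170k images were used to familiarize the model with basic anime concepts.
- Aesthetic tuning stage: an 83k-image high-quality synthetic dataset was used to refine the model's art style.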

Hyperparameters

Parameter	Value
Global epochs	20
Learning rate	1e-6
Batch size	32
Train text encoder	True
Image resolution	1024 (bucket resolution 2048 x 512)
Mixed precision	fp16
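As a rough sense of scale, the number of optimizer steps per epoch is the dataset size divided by the batch size, rounded up. A sketch under loud assumptions: the dataset sizes are the rounded "170k" and "83k" figures, there is no gradient accumulation on a single GPU, the last partial batch is kept, and aspect-ratio bucketing (which can add a few extra partial batches) is ignored. The source does not say whether the 20 epochs apply to each stage, and `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(num_images, batch_size, epochs=1):
    """Optimizer steps for `epochs` passes over `num_images` images.

    Assumes no gradient accumulation and that each epoch's final partial
    batch is kept (hence the ceiling).
    """
    return math.ceil(num_images / batch_size) * epochs
```

Under these assumptions, `optimizer_steps(170_000, 32)` gives 5,313 steps per epoch for the feature alignment set and `optimizer_steps(83_000, 32)` gives 2,594 for the aesthetic set.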

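Model comparison (Animagine XL 1.0 vs. Animagine XL 2.0)

Image comparison

In the second generation (Animagine XL 2.0), the "broken neck" problem common in poses such as "looking back" and "from behind" has been fixed. Characters now default to "looking at viewer", which makes generated images more natural and accurate.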

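Training configuration

Configuration	Animagine XL 1.0	Animagine XL 2.0
GPU	A100 40G	A100 80G
Dataset	8,000 images	170k + 83k images
Global epochs	N/A	20
Learning rate	4e-7	1e-6
Batch size	16	32
Train text encoder	False	True
Train special tags	False	True
Image resolution	1024	1024
Bucket resolution	1024 x 256	2048 x 512
Caption dropout rate	0.5	0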