Animagine XL 4.0 Zero: Open-Source Image Generation Model with Free, High-Quality Anime Image Creation

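Animagine XL 4.0 Zero

Developed by cagliostrolab

Animagine XL 4.0 Zero is the ultimate anime-themed text-to-image model, fine-tuned from Stable Diffusion XL 1.0. It was trained on 8.4 million anime-style images and generates high-quality anime images.

Task: image generation | Language: English | Tags: #anime-style generation, #high-resolution images, #SDXL fine-tune

Downloads: 798

Released: 2/13/2025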

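Model Overview

The model is designed to generate and modify anime-themed images from text prompts, and it is a suitable base for LoRA training and further fine-tuning.

Model Features

Large-scale, high-quality training data

Trained on 8.4 million diverse anime-style images, with a knowledge cutoff of January 7, 2025

Tag-ordering training method

Uses a tag-ordering method for identity and style training, which gives more precise control

Optimized prompt structure

Supports structured prompts that include the character, source series, rating, and quality-enhancement tags

Special tag support

Supports several kinds of special control tags: quality tags, score tags, temporal (year) tags, and rating tags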

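Model Capabilities

Anime-style image generation

High-quality detail rendering

Style control

Character trait consistency

Negative prompt control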

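Use Cases

Anime creation

Anime character generation

Generate images of specific anime characters from text descriptions

High-fidelity, richly detailed character images

Anime scene creation

Generate anime scenes with a specific style and atmosphere

Stylistically consistent scene images

Content creation

Anime illustration

Generate concept art and illustrations for stories or games

Professional-grade anime-style artwork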

🚀 Animagine XL 4.0 Zero

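Animagine XL 4.0 Zero is the ultimate anime-themed fine-tuned SDXL model and the latest release in the Animagine XL series. It generates and modifies anime-themed images from text prompts.

🚀 Quick Start

You can use the model in any of the following ways:

Use it in Hugging Face Spaces.
Use it in ComfyUI or Stable Diffusion Webui.
Call it through the 🧨 diffusers library.

✨ Key Features

Strong anime image generation: trained on a large dataset of anime-style images, the model produces high-quality, diverse anime-themed images.
Usable as a pretrained base model: well suited to LoRA training and further fine-tuning, so it can serve as the starting point for customized models.
Support for multiple special tags: special tags control aspects of generation such as quality, style, and time period.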

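📦 Installation Guide

1. Install the required libraries

pip install diffusers transformers accelerate safetensors --upgrade

2. Example code

The example below uses the lpw_stable_diffusion_xl pipeline, which handles long, weighted, and detailed prompts better. The model is uploaded in FP16 format, so there is no need to specify variant="fp16" in the from_pretrained call.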

import torch
from diffusers import StableDiffusionXLPipeline

# Load the FP16 weights with the long-prompt-weighting custom pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-4.0-zero",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="lpw_stable_diffusion_xl",
    add_watermarker=False
)
pipe.to('cuda')

# "\\(" and "\\)" put a literal backslash in the string so the parentheses stay part of the tag
prompt = "1girl, arima kana, oshi no ko, hoshimachi suisei, hoshimachi suisei \\(1st costume\\), cosplay, looking at viewer, smile, outdoors, night, v, masterpiece, high score, great score, absurdres"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, fewer digits, cropped, worst quality, low quality, low score, bad score, average score, signature, watermark, username, blurry"

# Generate a portrait image (832 x 1216)
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    guidance_scale=6,
    num_inference_steps=25
).images[0]

# Save the result
image.save("./arima_kana.png")

💻 Usage Examples

Basic usage

import torch
from diffusers import StableDiffusionXLPipeline

# Load the model
pipe = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-4.0-zero",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="lpw_stable_diffusion_xl",
    add_watermarker=False
)
pipe.to('cuda')

# Set the prompt and the negative prompt
prompt = "1girl, cute, smile, outdoors"
negative_prompt = "lowres, bad anatomy"

# Generate the image
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    guidance_scale=6,
    num_inference_steps=25
).images[0]

# Save the image
image.save("./example.png")

Advanced usage

import torch
from diffusers import StableDiffusionXLPipeline

# Load the model
pipe = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-4.0-zero",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="lpw_stable_diffusion_xl",
    add_watermarker=False
)
pipe.to('cuda')

# Set a detailed prompt and negative prompt.
# The year tag sits before the quality tags, which go last; "\\(" keeps a literal backslash.
prompt = "1girl, arima kana, oshi no ko, hoshimachi suisei, hoshimachi suisei \\(1st costume\\), cosplay, looking at viewer, smile, outdoors, night, v, year 2025, masterpiece, high score, great score, absurdres"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, fewer digits, cropped, worst quality, low quality, low score, bad score, average score, signature, watermark, username, blurry"

# Adjust the generation parameters (landscape, 1216 x 832)
width = 1216
height = 832
guidance_scale = 7
num_inference_steps = 28

# Generate the image
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    guidance_scale=guidance_scale,
    num_inference_steps=num_inference_steps
).images[0]

# Save the image
image.save("./advanced_example.png")

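📚 Detailed Documentation

Usage Guide

1. Prompt structure

The model was trained with tag-based captions and a tag-ordering method. Use the following structured template:

1girl/1boy/1other, character name, source series, rating, everything else in any order, ending with the quality-enhancement tags

2. Quality-enhancement tags

Append the following tags to the end of the prompt: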

masterpiece, high score, great score, absurdres
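
The template and the quality-enhancement tags can be assembled programmatically. The following is a minimal sketch, not part of the model's tooling: `build_prompt`, its parameters, and the `QUALITY_TAGS` constant are hypothetical names chosen for illustration.

```python
# Quality-enhancement tags listed in this guide.
QUALITY_TAGS = ["masterpiece", "high score", "great score", "absurdres"]


def build_prompt(subject, character=None, series=None, rating=None, extra_tags=()):
    """Join tags in the order the template describes (hypothetical helper).

    Order: subject count, character, series, rating, free-form tags,
    then the quality-enhancement tags at the very end.
    """
    leading = [subject, character, series, rating]
    tags = [t for t in leading if t]  # skip slots that were not provided
    tags.extend(extra_tags)
    tags.extend(QUALITY_TAGS)
    return ", ".join(tags)


prompt = build_prompt(
    "1girl",
    character="arima kana",
    series="oshi no ko",
    rating="safe",
    extra_tags=["looking at viewer", "smile", "outdoors", "night"],
)
print(prompt)
```

Keeping the quality tags in one constant means the ordering rule (quality tags last) is enforced in a single place rather than repeated in every prompt string.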

3. Recommended negative prompt

lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, fewer digits, cropped, worst quality, low quality, low score, bad score, average score, signature, watermark, username, blurry

4. Optimal settings

CFG Scale: 4 - 7 (5 recommended)
Sampling steps: 25 - 28 (28 recommended)
Preferred sampler: Euler Ancestral (Euler a)
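
The diffusers examples in this document do not select a sampler explicitly. The sketch below shows one way these settings might be applied. `generation_kwargs`, `is_recommended`, and `use_scheduler` are hypothetical helpers, not part of diffusers. In diffusers, Euler Ancestral is available as `EulerAncestralDiscreteScheduler`; the helper takes the scheduler class as an argument so that the sketch itself has no hard dependency on the library.

```python
# Recommended values and ranges from the settings list above.
RECOMMENDED = {"guidance_scale": 5, "num_inference_steps": 28}
CFG_RANGE = (4, 7)
STEP_RANGE = (25, 28)


def generation_kwargs(guidance_scale=None, num_inference_steps=None):
    """Return pipeline keyword arguments, defaulting to the recommended values."""
    return {
        "guidance_scale": RECOMMENDED["guidance_scale"] if guidance_scale is None else guidance_scale,
        "num_inference_steps": RECOMMENDED["num_inference_steps"] if num_inference_steps is None else num_inference_steps,
    }


def is_recommended(guidance_scale, num_inference_steps):
    """True when both values fall inside the recommended ranges."""
    return (CFG_RANGE[0] <= guidance_scale <= CFG_RANGE[1]
            and STEP_RANGE[0] <= num_inference_steps <= STEP_RANGE[1])


def use_scheduler(pipe, scheduler_cls):
    """Replace pipe.scheduler with scheduler_cls, reusing the current scheduler config."""
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return pipe


# Usage with a loaded pipeline (requires diffusers and a GPU; shown as comments only):
#   from diffusers import EulerAncestralDiscreteScheduler
#   use_scheduler(pipe, EulerAncestralDiscreteScheduler)
#   image = pipe(prompt, negative_prompt=negative_prompt, **generation_kwargs()).images[0]
```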

5. Recommended resolutions

Orientation	Dimensions	Aspect Ratio
Square	1024 x 1024	1:1
Landscape	1152 x 896	9:7
	1216 x 832	≈3:2
	1344 x 768	7:4
	1536 x 640	12:5
Portrait	896 x 1152	7:9
	832 x 1216	≈2:3
	768 x 1344	4:7
	640 x 1536	5:12
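
When the desired output shape is not in the table, one option is to pick the listed resolution with the closest aspect ratio. This is a minimal sketch under that assumption; `closest_resolution` is a hypothetical helper, and comparing ratios on a log scale is a design choice made here so that landscape and portrait shapes are treated symmetrically.

```python
import math

# (width, height) pairs copied from the recommended-resolution table.
RESOLUTIONS = [
    (1024, 1024),
    (1152, 896), (1216, 832), (1344, 768), (1536, 640),
    (896, 1152), (832, 1216), (768, 1344), (640, 1536),
]


def closest_resolution(target_width, target_height):
    """Return the table entry whose aspect ratio is closest to the target's."""
    target = math.log(target_width / target_height)
    return min(RESOLUTIONS, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


# A 16:9 request is mapped to the nearest listed landscape size.
width, height = closest_resolution(1920, 1080)
print(width, height)
```

The returned pair can be passed directly as the `width` and `height` arguments of the pipeline call.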

6. Example of a final prompt structure

1girl, firefly \(honkai: star rail\), honkai \(series\), honkai: star rail, safe, casual, solo, looking at viewer, outdoors, smile, reaching towards viewer, night, masterpiece, high score, great score, absurdres
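
In the example above, parentheses that belong to a tag name are written as `\(` and `\)`. The likely reason, stated here as an assumption, is that prompt-weighting front ends such as the lpw pipeline and web UIs treat bare parentheses as emphasis syntax. The sketch below escapes tag names automatically; `escape_tag` is a hypothetical helper.

```python
import re


def escape_tag(tag):
    """Backslash-escape literal parentheses in a tag name (hypothetical helper).

    Parentheses that are already preceded by a backslash are left alone,
    so applying the function twice gives the same result as applying it once.
    """
    return re.sub(r"(?<!\\)([()])", r"\\\1", tag)


tags = ["1girl", "firefly (honkai: star rail)", "honkai (series)", "honkai: star rail", "safe"]
prompt = ", ".join(escape_tag(t) for t in tags)
print(prompt)
```

When writing such prompts as Python string literals by hand, a raw string (`r"..."`) or a doubled backslash avoids the invalid-escape warning that `"\("` triggers in recent Python versions.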

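Special Tags

The model supports several special tags that control different aspects of image generation. These tags were carefully weighted and tested to give consistent results across different prompts.

Quality tags

Quality tags are basic controls that directly affect the overall quality and level of detail of the image. The available quality tags are: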

masterpiece
best quality
low quality
worst quality


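Sample image generated with the `"masterpiece, best quality"` quality tags and an empty negative prompt.
Sample image generated with the `"low quality, worst quality"` quality tags and an empty negative prompt.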

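Score tags

Score tags give finer-grained control over image quality than the basic quality tags, and in this model they steer output quality more strongly. The available score tags are: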

high score
great score
good score
average score
bad score
low score


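Sample image generated with the `"high score, great score"` score tags and an empty negative prompt.
Sample image generated with the `"bad score, low score"` score tags and an empty negative prompt.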

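Temporal tags

Temporal tags let you steer the art style toward a specific period or year, which is useful for generating images with the artistic traits of a particular era. Year tags take the form year {n}, from year 2005 through year 2025: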

year 2005
year {n}
year 2025


Sample image of Hatsune Miku generated with the `"year 2007"` temporal tag.
Sample image of Hatsune Miku generated with the `"year 2023"` temporal tag.
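
A year tag can be generated and range-checked before it is added to a prompt. This is a minimal sketch: `year_tag` is a hypothetical helper, and the 2005 to 2025 bounds are read off the tag list above rather than from any published specification. The subject tags in the usage lines are illustrative only.

```python
FIRST_YEAR, LAST_YEAR = 2005, 2025  # bounds inferred from the tag list above


def year_tag(year):
    """Return the temporal tag for a year, e.g. 'year 2007' (hypothetical helper)."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year must be between {FIRST_YEAR} and {LAST_YEAR}, got {year}")
    return f"year {year}"


# Build two prompts that differ only in the temporal tag,
# placed among the free-form tags and before the quality tags.
base = "1girl, hatsune miku, {year}, masterpiece, high score, great score, absurdres"
prompt_2007 = base.format(year=year_tag(2007))
prompt_2023 = base.format(year=year_tag(2023))
print(prompt_2007)
print(prompt_2023)
```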

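Rating tags

Rating tags help control the content-safety level of generated images. Use them responsibly and in compliance with applicable laws and platform policies. The supported ratings are: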

safe
sensitive
nsfw
explicit

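🔧 Technical Details

The model was trained on high-end hardware with tuned hyperparameters. The table below lists the technical specifications and parameters used during training:

Parameter	Value
Hardware	7 x H100 80GB SXM5
Number of images	8,401,464
UNet learning rate	2.5e-6
Text encoder learning rate	1.25e-6
Scheduler	Constant With Warmup
Warmup steps	5%
Batch size	32
Gradient accumulation steps	2
Training resolution	1024x1024
Optimizer	Adafactor
Input perturbation noise	0.1
Debiased estimation loss	Enabled
Mixed precision	fp16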
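
The table does not say whether the batch size of 32 is per GPU or global, nor how many epochs were run. The sketch below works out the optimizer-step arithmetic under both readings; it is illustrative only and does not describe the authors' actual training schedule. `steps_per_epoch` is a hypothetical helper.

```python
import math

# Values copied from the training table above.
NUM_IMAGES = 8_401_464
BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 2
NUM_GPUS = 7


def steps_per_epoch(per_device):
    """Optimizer steps needed to see every image once.

    per_device=True assumes the batch size of 32 applies to each GPU;
    per_device=False assumes 32 is the global batch size. Which one
    matches the actual run is not stated in the source.
    """
    effective = BATCH_SIZE * GRAD_ACCUM_STEPS * (NUM_GPUS if per_device else 1)
    return math.ceil(NUM_IMAGES / effective)


print("per-GPU batch reading:", steps_per_epoch(per_device=True))
print("global batch reading: ", steps_per_epoch(per_device=False))
```

Under the per-GPU reading the effective batch is 32 x 2 x 7 = 448 images per optimizer step; under the global reading it is 32 x 2 = 64.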

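📄 License

This model adopts Stability AI's original CreativeML Open RAIL++-M License, without modifications or additional restrictions. The license terms are identical to those of the original SDXL license, including:

✅ Permitted: commercial use, modification, distribution, private use
❌ Prohibited: illegal activities, generating harmful content, discrimination, exploitation
⚠️ Required: include a copy of the license, state changes, preserve notices
📝 Warranty: provided "AS IS", without warranties

Refer to the original SDXL license for the complete and authoritative terms and conditions.

Acknowledgments

The long-term success of this project would not have been possible without the pioneering work, innovative contributions, and comprehensive documentation of Stability AI, Novel AI, and the Waifu Diffusion Team. We are especially grateful to Main for the bootstrap funding that allowed us to continue the project beyond V2. For this release, we sincerely thank everyone in the community for their continued support, in particular: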