DanTagGen-beta: Open-Source Image Tag Generator for Free Danbooru-Style Tag Generation

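DanTagGen Beta

Developed by KBlueLeaf

DanTagGen (Danbooru Tag Generator) is a text generation model built on the NanoLLaMA architecture and designed specifically to generate Danbooru-style image tags.

Task: text generation | Languages: multiple | License: OpenRAIL | Tags: #Danbooru tag generation #Anime feature expansion #NanoLLaMA architecture

Downloads: 9,374

Released: 3/18/2024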

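Model Overview

DanTagGen is a text generation model inspired by p1atdev's dart project, but it differs in architecture, dataset, format, and training strategy. Given a small amount of input, it generates a rich set of Danbooru-style tags for use in image generation and image tagging.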

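Model Features

High-quality tag generation

Generates rich, precise Danbooru-style tags from minimal input, which improved detail and composition in the author's example images.

Multiple versions

Alpha and Beta versions are available. Beta was pretrained on a larger dataset (5.3M entries versus 2M) and produces better results.

Flexible input format

Accepts several input fields, including rating, artist, characters, copyrights, and aspect ratio, so the output can be tailored to your needs.

Quantized model support

An FP16 model and quantized 8-bit and 6-bit models are provided to suit different hardware. Running them with llama.cpp is recommended.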

Model Capabilities

Text generation

Tag expansion

Image tagging assistance

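Use Cases

Image generation

Uma Musume Vivlos image generation

Expands a base prompt with more detailed tags, such as blue bikini and ribbon, improving image detail and the accuracy of character features.

Result: accurate character features, richer detail, and better composition.

Uma Musume Daring Tact image generation

Expands a base prompt with more detailed tags, such as jacket and food, improving detail and composition.

Result: noticeably better detail and composition.

Art creation

Mature-audience (non-all-ages) artwork

Generates tags suited to mature-audience artwork, enriching image content and detail.

Result: output that better matches the needs of the artwork.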

🚀 DanTagGen - Beta

DanTagGen (Danbooru Tag Generator) is inspired by p1atdev's dart project, but it uses a different architecture, dataset, format, and training strategy.

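🚀 Quick Start

DanTagGen is a text generation model that aims to produce richer tag sets for Danbooru-style images. It comes in several versions that differ in performance and capability.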

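✨ Key Features

Different versions with different strengths:
- Alpha: pretrained on a 2M-entry dataset with a smaller batch size; limited capability.
- Beta: pretrained on a 5.3M-entry dataset with a larger batch size; more stable, and better when little input information is given.
Works with many inference interfaces: the model is trained from scratch on a 400M-parameter LLaMA architecture, so in theory it works with any LLaMA inference interface.
Multiple model formats: a converted FP16 gguf model and quantized 8-bit and 6-bit gguf models are provided.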
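As a rough sanity check on the formats listed above, the nominal storage for the weights is simply parameters × bits ÷ 8. The sketch below applies that to the stated 400M parameters. The function name is hypothetical, and the numbers are nominal only: real gguf files are somewhat larger because quantized formats store per-block scale factors and the file carries metadata.

```python
def nominal_weight_megabytes(n_params: int, bits_per_weight: float) -> float:
    """Nominal weight storage in megabytes (10**6 bytes), ignoring overhead."""
    return n_params * bits_per_weight / 8 / 1e6


if __name__ == "__main__":
    # 400M parameters is the figure stated for this model.
    for name, bits in [("FP16", 16), ("8-bit", 8), ("6-bit", 6)]:
        size = nominal_weight_megabytes(400_000_000, bits)
        print(f"{name}: about {size:.0f} MB of weights")
```

Under these assumptions the three formats come out at roughly 800 MB, 400 MB, and 300 MB, which is why the quantized files are the natural choice on memory-constrained hardware.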

📦 Installation

The source documentation does not describe installation steps, so none are given here.

💻 Usage Examples

Basic usage

A basic prompt looks like this:

1girl,
vivlos \(umamusume\), umamusume, 
kxl-delta-style1,
swimsuit,
masterpiece, newest, absurdres, sensitive

Example 1: Vivlos

	Without DTG	DTG-Alpha	DTG-Beta
Prompt	Base prompt	Base prompt + "mole under eye, tail, twintails, open mouth, single ear cover, horse ears, breasts, looking at viewer, visor cap, streaked hair, long hair, horse tail, hair between eyes, cowboy shot, blue nails, purple eyes, covered navel, horse girl, competition swimsuit, blush, multicolored hair, collarbone, two-tone swimsuit, animal ears, mole, white hair, ear covers, smile, ear ornament, swimsuit, solo, blue eyes, brown hair, one-piece swimsuit, white headwear, medium breasts, white one-piece swimsuit, bare shoulders,"	Base prompt + "blue bikini, tail, twintails, single ear cover, horse ears, striped clothes, ear piercing, cleavage, breasts, blue ribbon, looking at viewer, ribbon, streaked hair, long hair, horse tail, hair between eyes, :3, purple eyes, horse girl, blush, multicolored hair, hair ribbon, collarbone, bikini skirt, piercing, animal ears, striped bikini, sitting, white hair, ear covers, :d, smile, swimsuit, solo, brown hair, ocean, white headwear, medium breasts, bikini,"
Result image	(not included)	(not included)	(not included)
Performance	Cannot even generate an image of Vivlos	Generates an image with the correct character features, but detail is lacking and some features are wrong or missing	Far better than Alpha: good character features, more detail, and better composition

Example 2: Daring Tact

Base prompt:

1girl,
daring tact \(umamusume\), umamusume, 
kxl-delta-style1,
horse girl, horse tail, horse ears, cafe, table, chair,
masterpiece, newest, absurdres, safe

	Without DTG	DTG-Alpha	DTG-Beta
Prompt	Base prompt	Base prompt + "plant, necktie, tail, indoors, skirt, looking at viewer, cup, lounge chair, green theme, book, alternate costume, potted plant, hair ornament, blue jacket, blush, medium hair, black necktie, green eyes, jacket, animal ears, black hair, round eyewear, bookshelf, adjusting eyewear, ahoge, smile, solo, window, brown hair, crossed legs, glasses, closed mouth, book stack,"	Base prompt + "jacket, sitting on table, food, tail, collar, horse racing, black hair, boots, school bag, bag, full body, blue eyes, hair ornament, animal ears, ahoge, sitting, thighhighs, blurry background, looking at viewer, school uniform, long hair, blurry, cup, window, crossed legs, alternate costume, medium breasts, breasts, calendar (object), casual, door, solo, disposable cup,"
Result image	(not included)	(not included)	(not included)
Performance	(no assessment given)	Generates an image with more elements and detail, but coherence with the character is poor	Far better than Alpha: more detail and better composition

Advanced usage

The input format, expressed as a Python f-string template:

prompt = f"""
rating: {rating or '<|empty|>'}
artist: {artist.strip() or '<|empty|>'}
characters: {characters.strip() or '<|empty|>'}
copyrights: {copyrights.strip() or '<|empty|>'}
aspect ratio: {f"{aspect_ratio:.1f}" or '<|empty|>'}
target: {'<|' + target + '|>' if target else '<|long|>'}
general: {", ".join(special_tags)}, {general.strip().strip(",")}<|input_end|>
"""

For example:

rating: safe
artist: <|empty|>
characters: <|empty|>
copyrights: <|empty|>
aspect ratio: 1.0
target: <|short|>
general: 1girl, solo, dragon girl, dragon horns, dragon tail<|input_end|>
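A prompt in this format can be assembled by a small helper. The sketch below is one possible wrapper around the template, not the author's code: the function name `build_prompt` and its parameter names are hypothetical, it omits the leading and trailing newline that the triple-quoted template would produce, and it maps a missing aspect ratio to `<|empty|>` (in the raw template the formatted ratio is never empty, so that fallback cannot trigger). Whether the model is sensitive to those differences is not stated in the source.

```python
from typing import Optional, Sequence

EMPTY = "<|empty|>"


def build_prompt(
    general: str,
    rating: str = "",
    artist: str = "",
    characters: str = "",
    copyrights: str = "",
    aspect_ratio: Optional[float] = None,
    target: str = "long",
    special_tags: Sequence[str] = (),
) -> str:
    """Fill the DTG input template; blank fields become <|empty|>."""
    # Assumption: a missing aspect ratio is written as <|empty|>.
    ratio = f"{aspect_ratio:.1f}" if aspect_ratio is not None else EMPTY
    # Special tags (for example "1girl") go first, then the remaining tags.
    tags = list(special_tags)
    rest = general.strip().strip(",").strip()
    if rest:
        tags.append(rest)
    lines = [
        f"rating: {rating.strip() or EMPTY}",
        f"artist: {artist.strip() or EMPTY}",
        f"characters: {characters.strip() or EMPTY}",
        f"copyrights: {copyrights.strip() or EMPTY}",
        f"aspect ratio: {ratio}",
        f"target: <|{target or 'long'}|>",
        f"general: {', '.join(tags)}<|input_end|>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_prompt(
            "dragon girl, dragon horns, dragon tail",
            rating="safe",
            aspect_ratio=1.0,
            target="short",
            special_tags=["1girl", "solo"],
        )
    )
```

Called with the arguments shown, the helper reproduces the example prompt above line for line.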

You may get output like the following:

rating: safe
artist: <|empty|>
characters: <|empty|>
copyrights: <|empty|>
aspect ratio: 1.0
target: <|short|>
general: 1girl, solo, dragon girl, dragon horns, dragon tail<|input_end|>open mouth, red eyes, long hair, pointy ears, tail, black hair, chinese clothes, simple background, dragon, hair between eyes, horns, china dress, dress, looking at viewer, breasts
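To use such output as an image prompt, the generated tags after `<|input_end|>` need to be separated from the echoed input and merged with the original tags. The following is a minimal sketch of that step, assuming the full text (prompt plus completion) is available as one string, as in the example above. The function names are hypothetical, and dropping repeated tags is an illustrative choice rather than something the source prescribes.

```python
from typing import List

INPUT_END = "<|input_end|>"


def split_tags(text: str) -> List[str]:
    """Split a comma-separated tag string, dropping blank entries."""
    return [t.strip() for t in text.split(",") if t.strip()]


def merge_generated_tags(full_text: str) -> List[str]:
    """Return the input tags followed by newly generated tags, de-duplicated."""
    head, _, generated = full_text.partition(INPUT_END)
    # The input tags follow the last "general:" label before the marker.
    _, found, general_line = head.rpartition("general:")
    input_tags = split_tags(general_line) if found else []
    merged: List[str] = []
    seen = set()
    for tag in input_tags + split_tags(generated):
        if tag not in seen:  # keep the first occurrence, preserve order
            seen.add(tag)
            merged.append(tag)
    return merged


if __name__ == "__main__":
    sample = (
        "target: <|short|>\n"
        "general: 1girl, solo, dragon tail<|input_end|>open mouth, solo, tail"
    )
    print(", ".join(merge_generated_tags(sample)))
```

The joined result can then be combined with quality and style tags such as those in the base prompts shown earlier.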

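📚 Documentation

Model architecture

This version of DTG is trained from scratch on a 400M-parameter LLaMA architecture, which the author likes to call NanoLLaMA. Because it uses the LLaMA architecture, it should in theory work with any LLaMA inference interface.

The repository also provides a converted FP16 gguf model and quantized 8-bit and 6-bit gguf models. Running the model with llama.cpp or llama-cpp-python is recommended because inference is very fast.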
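For readers who want to try the llama-cpp-python route, here is a minimal sketch. It uses the library's `Llama` class and its call interface, which returns an OpenAI-style completion dictionary. The helper names, the model path, the context size, and the sampling settings are all assumptions for illustration; the source does not specify recommended values.

```python
from typing import Any, Callable, Dict


def load_model(model_path: str, n_ctx: int = 512) -> Any:
    """Load a gguf file with llama-cpp-python (imported lazily)."""
    from llama_cpp import Llama  # pip install llama-cpp-python

    # n_ctx=512 is an assumed context size, not a documented requirement.
    return Llama(model_path=model_path, n_ctx=n_ctx, verbose=False)


def complete(
    llm: Callable[..., Dict[str, Any]],
    prompt: str,
    max_tokens: int = 128,
    temperature: float = 1.0,
) -> str:
    """Run one completion and return only the newly generated text."""
    result = llm(prompt, max_tokens=max_tokens, temperature=temperature)
    return result["choices"][0]["text"]


# Usage, with a hypothetical local file name:
# llm = load_model("path/to/dtg-beta.gguf")
# print(complete(llm, "rating: safe\n...general: 1girl, solo<|input_end|>"))
```

Passing the model object into `complete` keeps the generation step easy to swap for another LLaMA-compatible backend.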

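Dataset and training

The model was trained with the trainer implemented in HakuPhi, for 10 epochs on the 5.3M-entry dataset. It saw roughly 6 to 12 billion tokens in total.

The dataset was exported by HakuBooru from a Danbooru SQLite database and filtered by the percentile of favorite count within each rating (2M = top 25%, 5.3M = top 75%).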
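The per-rating percentile filter can be illustrated with a short sketch. This is not HakuBooru's implementation: the `(rating, fav_count)` record shape and the function name are hypothetical, and cutting by rank within each rating (with ties broken arbitrarily by sort order) is an assumption about how the percentile is applied.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Post = Tuple[str, int]  # (rating, fav_count); a hypothetical record shape


def keep_top_fraction(posts: Iterable[Post], fraction: float) -> List[Post]:
    """Keep the top `fraction` of posts by fav_count within each rating."""
    by_rating: Dict[str, List[Post]] = defaultdict(list)
    for post in posts:
        by_rating[post[0]].append(post)
    kept: List[Post] = []
    for group in by_rating.values():
        # Sort most-favorited first, then keep the leading share of the group.
        group.sort(key=lambda p: p[1], reverse=True)
        kept.extend(group[: int(len(group) * fraction)])
    return kept


if __name__ == "__main__":
    posts = [("safe", n) for n in range(8)] + [("explicit", n * 10) for n in range(4)]
    print(keep_top_fraction(posts, 0.25))
```

Filtering within each rating, rather than across the whole database, keeps every rating represented even when typical favorite counts differ between ratings.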

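Tools

A Gradio UI is being implemented for this project, and other developers can use its API to build their own applications. An sd-webui extension is also planned.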

🔧 Technical Details

The source documentation provides no further implementation details, so none are shown here.

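📄 License

This project is released under the OpenRAIL license.

Property	Details
Model type	Text generation model
Training data	Dataset exported from a Danbooru SQLite database and filtered by favorite-count percentile; trained for 10 epochs on 5.3M entries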