DanTagGen - delta (rev2)
DanTagGen (Danbooru Tag Generator) is inspired by p1atdev's dart project. However, it features a different architecture, dataset, format, and training strategy.
🚀 Quick Start
This model is designed for text generation. You can use it with the transformers library. For a quick test, you can use the provided widget on the Hugging Face page.
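As a quick sketch of programmatic use with transformers (the repo id below is an assumption based on the model name, not something confirmed here):

```python
# Minimal sketch: load DanTagGen with transformers and sample a completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "KBlueLeaf/DanTagGen-delta-rev2"  # assumed repo id; adjust if needed
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Prompt in the format described in the Format section below.
prompt = """quality: masterpiece
rating: safe
artist: <|empty|>
characters: <|empty|>
copyrights: <|empty|>
aspect ratio: 1.0
target: <|short|>
general: 1girl, solo, dragon girl, dragon horns, dragon tail<|input_end|>"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.9)
print(tokenizer.decode(outputs[0]))
```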
✨ Features
Difference between versions
- alpha: Pretrained on a 2M dataset with a smaller batch size. It has limited capabilities.
- beta: Pretrained on a 5.3M dataset with a larger batch size. It is more stable and performs better even with limited input information.
- delta: Pretrained on a 7.2M dataset with a larger batch size. It shows slight underfitting but offers better diversity. A quality tag has been introduced.
- rev2: Resumed from the delta version, using the same dataset and trained for 2 more epochs.
Model arch
This version of DTG is trained from scratch on a 400M-parameter LLaMA architecture (which I personally refer to as NanoLLaMA). Since it is based on the LLaMA architecture, it can theoretically be used with any LLaMA inference interface.
This repository also provides a converted FP16 gguf model and quantized 8-bit/6-bit gguf models. For optimal speed, it is recommended to run this model with llama.cpp or llama-cpp-python.
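A comparable sketch with llama-cpp-python (the gguf filename below is an assumption; point it at whichever quantized file from this repository you downloaded):

```python
# Sketch: run a quantized gguf build of DanTagGen with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="DanTagGen-delta-rev2-Q8_0.gguf", n_ctx=512)  # assumed filename

prompt = """quality: masterpiece
rating: safe
artist: <|empty|>
characters: <|empty|>
copyrights: <|empty|>
aspect ratio: 1.0
target: <|short|>
general: 1girl, solo, dragon girl, dragon horns, dragon tail<|input_end|>"""

result = llm(prompt, max_tokens=128, temperature=0.9)
print(result["choices"][0]["text"])
```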
Format
prompt = f"""
quality: {quality or '<|empty|>'}
rating: {rating or '<|empty|>'}
artist: {artist.strip() or '<|empty|>'}
characters: {characters.strip() or '<|empty|>'}
copyrights: {copyrights.strip() or '<|empty|>'}
aspect ratio: {f"{aspect_ratio:.1f}" if aspect_ratio else '<|empty|>'}
target: {'<|' + target + '|>' if target else '<|long|>'}
general: {", ".join(special_tags)}, {general.strip().strip(",")}<|input_end|>
"""
Basic Usage
quality: masterpiece
rating: safe
artist: <|empty|>
characters: <|empty|>
copyrights: <|empty|>
aspect ratio: 1.0
target: <|short|>
general: 1girl, solo, dragon girl, dragon horns, dragon tail<|input_end|>
Advanced Usage
After feeding the prompt above to the model, it continues the general tag list after <|input_end|>. You may get an output like this:
rating: safe
artist: <|empty|>
characters: <|empty|>
copyrights: <|empty|>
aspect ratio: 1.0
target: <|short|>
general: 1girl, solo, dragon girl, dragon horns, dragon tail<|input_end|>open mouth, red eyes, long hair, pointy ears, tail, black hair, chinese clothes, simple background, dragon, hair between eyes, horns, china dress, dress, looking at viewer, breasts
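Because the model simply continues the general line, the newly generated tags can be recovered by splitting the decoded output on <|input_end|>. A small sketch (treating </s> as the eos marker is an assumption about the tokenizer):

```python
def extract_new_tags(full_text: str) -> list[str]:
    """Return the tags the model appended after <|input_end|>."""
    # Everything after the marker is the model's continuation.
    _, _, generated = full_text.partition("<|input_end|>")
    # Drop a trailing eos token if the decoder kept it (assumed to be </s>).
    generated = generated.replace("</s>", "").strip()
    return [tag.strip() for tag in generated.split(",") if tag.strip()]
```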
Dataset and Training
I used the trainer I implemented in HakuPhi to conduct the training. The model was trained for a total of 12 epochs on the 7.2M dataset and has seen roughly 10-15B tokens.
The dataset was exported by HakuBooru from my Danbooru SQLite database. Entries are filtered by the fav_count percentile within each rating (2M = top 25%, 5.3M = top 75%).
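HakuBooru's actual export code is not reproduced here, but this style of per-rating percentile filtering can be sketched with pandas (the table and column names are assumptions for illustration only):

```python
import sqlite3

import pandas as pd

# Illustrative only: keep posts whose fav_count is in the top 25% of their rating.
conn = sqlite3.connect("danbooru.db")  # assumed database filename
posts = pd.read_sql("SELECT id, rating, fav_count FROM posts", conn)  # assumed schema

threshold = posts.groupby("rating")["fav_count"].transform(lambda s: s.quantile(0.75))
top_25_percent = posts[posts["fav_count"] >= threshold]
```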
📚 Documentation
Utilities
📄 License
This project is licensed under the CC BY-SA 4.0 license.
Additional Information
| Property | Details |
| --- | --- |
| Model Type | Text Generation |
| Training Data | Exported by HakuBooru from a Danbooru SQLite database, filtered by fav_count percentile: 2M (top 25%), 5.3M (top 75%), 7.2M (used for delta and rev2) |
| Library Name | transformers |
| Pipeline Tag | text-generation |
| Tags | not-for-all-audiences, art |
⚠️ Important Note
This model is tagged with not-for-all-audiences. Please use it responsibly.
💡 Usage Tip
When using the model, adjust the input fields (quality, rating, aspect ratio, target length) and your sampling settings to suit your needs for the best results.