# anime-painter Open-source Model - Generate Extremely High-quality Images from Anime Sketches, No Restriction on Line Type and Width

Anime Painter

Developed by xinsir

A model based on controlnet-scribble-sdxl-1.0, capable of generating extremely high-quality images from anime sketches, supporting any type and width of lines.

Image Generation Open Source License: Apache-2.0 #Anime Sketch Generation #High-Quality Images #Scribble to Illustration

Downloads 1,443

Release Time: 5/12/2024

Model Overview

This model allows users to generate high-quality anime images from simple sketches, making it particularly suitable for those without drawing skills to quickly create anime illustrations.

Model Features

High-Quality Anime Image Generation

Capable of generating extremely high-quality anime images from simple or even blurry sketches.

Supports Various Line Types

Supports any type and width of lines, including hand-drawn sketches.

Strong Prompt Following Ability

Can generate images that meet requirements based on Danbooru tags and natural language descriptions.

Low Image Distortion Rate

Significantly reduces the probability of generating abnormal human structures.

Model Capabilities

Anime Image Generation

Sketch to Image

Text-to-Image Conversion

High-Quality Image Rendering

Use Cases

Anime Creation

Character Design

Generate anime character designs from simple sketches and tags.

Produces visually striking anime character images.

Scene Creation

Generate anime scenes from sketches and detailed descriptions.

Produces anime scenes that match the sketch outlines and are semantically reasonable.

Art Creation Assistance

Non-Professional User Creation

Helps those without drawing skills quickly create anime illustrations.

Even with simple or blurry sketches, it can generate beautiful images.

🚀 Controlnet-scribble-sdxl-1.0-anime

This model enables anyone, even those with no drawing skills, to become an anime painter by generating high-quality anime images from simple sketches.

🚀 Quick Start

This is a controlnet-scribble-sdxl-1.0 model capable of generating very high-quality images from an anime sketch. It supports any type and width of lines. As shown in the examples, the sketch can be extremely simple and unclear. Even if you're a beginner or have no drawing experience, you can simply doodle and add some danbooru tags to generate a beautiful anime illustration.

In our evaluation, the model achieves state-of-the-art performance, significantly outperforming the original SD1.5 Scribble ControlNet trained by Lvmin Zhang [https://github.com/lllyasviel/ControlNet]. The model was trained with complex techniques and high-quality datasets. Besides the aesthetic score, the prompt-following ability (proposed by OpenAI in the DALL-E 3 paper [https://cdn.openai.com/papers/dall-e-3.pdf]) and the image deformity rate (the probability that a generated image has abnormal human structures) have also improved greatly.

The founder of Midjourney said that Midjourney can help non-artists draw, thus expanding the boundaries of their imagination. We share a similar vision: we hope to enable those who know little about anime or cartoons to create their own characters in a simple way, express themselves, and unleash their creativity. AIGC will reshape the animation industry. On average, the anime images generated by the model we released have a higher aesthetic score than the images on almost all popular anime websites. So, just enjoy it!

If you want to generate particularly visually appealing images, use danbooru tags along with natural language. Because far fewer anime images exist than real-world images, a short natural-language prompt such as "a girl walk in the street" carries too little information on its own. Instead, describe the image in more detail, such as "a girl, blue shirt, white hair, black eye, smile, pink flower, cherry blossoms ..."

In summary, you should first use tags to describe what's in the image (danbooru tags) and then describe what's happening in the image (natural language). The more detailed, the better. If your description is not clear enough, the generated image will be somewhat random. Even so, the model can understand your drawing semantically to some extent and give you a decent result. To our knowledge, we haven't seen other SDXL-Scribble models in the open-source community, so we might be the first.
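The prompting advice above (tags first, then a natural-language description) can be sketched as a small helper. This is only an illustration: `build_prompt` is a hypothetical name, not part of the model or of any library, and the de-duplication of tags is an added convenience rather than something the model requires.

```python
def build_prompt(tags, description=""):
    """Join Danbooru-style tags first, then append a natural-language description."""
    seen = set()
    unique_tags = []
    for tag in tags:
        cleaned = tag.strip()
        key = cleaned.lower()
        # Skip empty entries and case-insensitive duplicates, keeping first-seen order.
        if cleaned and key not in seen:
            seen.add(key)
            unique_tags.append(cleaned)

    parts = [", ".join(unique_tags), description.strip()]
    # Drop whichever part is empty so no stray separators are produced.
    return ", ".join(part for part in parts if part)


prompt = build_prompt(
    ["1girl", "blue shirt", "white hair", "black eye", "smile", "pink flower", "cherry blossoms"],
    "a girl walking in the street",
)
print(prompt)
```

The resulting string can be passed as the `prompt` argument in the usage example.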

⚠️ Important Note

To generate anime images with our model, you need to choose an anime SDXL base model from Hugging Face [https://huggingface.co/models?pipeline_tag=text-to-image&sort=trending&search=blue] or Civitai [https://civitai.com/search/models?baseModel=SDXL%201.0&sortBy=models_v8&query=anime]. The showcases listed here are based on CounterfeitXL [https://huggingface.co/gsdf/CounterfeitXL/tree/main]. Different base models have different image styles, and you can also use Bluepencil or other models. The model was trained with a large number of anime images, including almost all the anime images we could find on the Internet. We carefully filtered them to preserve high-visual-quality images comparable to those on Nijijourney or popular anime illustration platforms. We trained it with ControlNet-SDXL-1.0 [https://arxiv.org/abs/2302.05543], and the technical details won't be disclosed in this report.
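Because the base model determines the style, swapping it means loading the pipeline, VAE, and scheduler from the new repository while keeping the ControlNet fixed. The sketch below only collects the repository and subfolder for each component; it does not load anything. `base_model_sources` is a hypothetical helper, and it assumes the base repository uses the diffusers layout with `vae` and `scheduler` subfolders, as CounterfeitXL does in the usage example.

```python
def base_model_sources(base_repo, controlnet_repo="xinsir/anime-painter"):
    """Return the repo/subfolder pair to load for each pipeline component."""
    if not base_repo or "/" not in base_repo:
        raise ValueError("expected a Hub repo id such as 'gsdf/CounterfeitXL'")
    return {
        # The ControlNet stays the same regardless of the base model.
        "controlnet": {"repo": controlnet_repo, "subfolder": None},
        # Everything below follows the chosen base model.
        "pipeline": {"repo": base_repo, "subfolder": None},
        "vae": {"repo": base_repo, "subfolder": "vae"},
        "scheduler": {"repo": base_repo, "subfolder": "scheduler"},
    }


sources = base_model_sources("gsdf/CounterfeitXL")
for component, location in sources.items():
    print(component, location["repo"], location["subfolder"])
```

Each entry corresponds to one `from_pretrained` call in the usage example, so changing the single `base_repo` argument keeps the VAE and scheduler consistent with the base model.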

✨ Features

High-Quality Image Generation: Generate high-quality anime images from simple sketches.
Flexible Sketch Support: Support any type and width of lines in sketches.
Improved Performance: Achieve state-of-the-art performance in terms of aesthetic score, prompt-following ability, and image deformity rate.
User-Friendly Creation: Enable non-artists to create their own anime characters easily.

📦 Installation

No specific installation steps are provided in the original README.

💻 Usage Examples

Basic Usage

import random

import cv2
import numpy as np
import torch
from controlnet_aux import HEDdetector
from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    EulerAncestralDiscreteScheduler,
    StableDiffusionXLControlNetPipeline,
)
from PIL import Image


def nms(x, t, s):
    """Thin edge responses: blur with sigma s, keep directional local maxima, threshold at t."""
    x = cv2.GaussianBlur(x.astype(np.float32), (0, 0), s)

    # 3x3 line kernels: horizontal, vertical, and the two diagonals
    f1 = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=np.uint8)
    f2 = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=np.uint8)
    f3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.uint8)
    f4 = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=np.uint8)

    y = np.zeros_like(x)

    # A pixel survives if it equals the maximum along at least one direction
    for f in [f1, f2, f3, f4]:
        np.putmask(y, cv2.dilate(x, kernel=f) == x, x)

    z = np.zeros_like(y, dtype=np.uint8)
    z[y > t] = 255
    return z


controlnet_conditioning_scale = 1.0
prompt = "your prompt, the longer the better; describe the image in as much detail as possible"
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'


eulera_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("gsdf/CounterfeitXL", subfolder="scheduler")


controlnet = ControlNetModel.from_pretrained(
    "xinsir/anime-painter",
    torch_dtype=torch.float16
)

# When testing with another base model, you also need to change the VAE.
vae = AutoencoderKL.from_pretrained("gsdf/CounterfeitXL", subfolder="vae", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "gsdf/CounterfeitXL",
    controlnet=controlnet,
    vae=vae,
    safety_checker=None,
    torch_dtype=torch.float16,
    scheduler=eulera_scheduler,
)
# Move the pipeline to the GPU (assumes a CUDA device is available)
pipe.to("cuda")

# You can either use HED to generate a fake scribble from an existing image,
# or use a sketch drawn entirely by yourself.
# The coin flip below only demonstrates both input paths; pick one in practice.
if random.random() > 0.5:
    # Method 1
    # With HED you provide an image (real or anime), extract its HED edges,
    # and use them as the scribble.
    # For details on HED detection, see https://github.com/lllyasviel/ControlNet/blob/main/gradio_fake_scribble2image.py
    # Below is an example using the controlnet_aux HED detector.

    source_img = Image.open("path to your image; it can be real or anime, the HED detector will extract its edge boundaries")
    processor = HEDdetector.from_pretrained('lllyasviel/Annotators')
    controlnet_img = processor(source_img, scribble=False)
    controlnet_img.save("path to save the HED detection result")

    # The following processing simulates a human sketch;
    # different thresholds produce different line widths.
    controlnet_img = np.array(controlnet_img)
    controlnet_img = nms(controlnet_img, 127, 3)
    controlnet_img = cv2.GaussianBlur(controlnet_img, (0, 0), 3)

    # Higher threshold, thinner lines
    random_val = int(round(random.uniform(0.01, 0.10), 2) * 255)
    controlnet_img[controlnet_img > random_val] = 255
    controlnet_img[controlnet_img < 255] = 0
    controlnet_img = Image.fromarray(controlnet_img)

else:
    # Method 2
    # Use a sketch image drawn entirely by yourself.
    control_path = "path to the sketch image you drew with a tool such as a drawing board"
    # Note that the image must be black and white (0 or 255), like the examples we list
    controlnet_img = Image.open(control_path)

# Resize to 1024*1024 or the same resolution bucket to get the best performance
width, height = controlnet_img.size
ratio = np.sqrt(1024. * 1024. / (width * height))
new_width, new_height = int(width * ratio), int(height * ratio)
controlnet_img = controlnet_img.resize((new_width, new_height))

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=controlnet_img,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    width=new_width,
    height=new_height,
    num_inference_steps=30,
).images

# PNG usually preserves image quality better than JPG or WebP, but the files are much larger
images[0].save("your image save path")
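The resize step in the script scales the sketch so that its area is roughly 1024 * 1024 while keeping the aspect ratio. A minimal standalone sketch of that calculation follows. Rounding each side to a multiple of 8 is an added assumption (a common constraint for latent diffusion models), not something the original script does, and `target_size` is a hypothetical helper name.

```python
import math


def target_size(width, height, target_area=1024 * 1024, multiple=8):
    """Scale (width, height) to about target_area pixels, preserving aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    ratio = math.sqrt(target_area / (width * height))
    # Assumption: snap each side to the nearest multiple of `multiple`.
    new_width = max(multiple, round(width * ratio / multiple) * multiple)
    new_height = max(multiple, round(height * ratio / multiple) * multiple)
    return new_width, new_height


print(target_size(1024, 1024))
print(target_size(512, 768))
print(target_size(1920, 1080))
```

The returned pair can replace `new_width` and `new_height` in the script, both for resizing the sketch and for the `width` and `height` arguments of the pipeline call.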

📚 Documentation

Examples Display

Prompt 1: 1girl, breasts, solo, long hair, pointy ears, red eyes, horns, navel, sitting, cleavage, toeless legwear, hair ornament, smoking pipe, oni horns, thighhighs, detached sleeves, looking at viewer, smile, large breasts, holding smoking pipe, wide sleeves, bare shoulders, flower, barefoot, holding, nail polish, black thighhighs, jewelry, hair flower, oni, japanese clothes, fire, kiseru, very long hair, ponytail, black hair, long sleeves, bangs, red nails, closed mouth, toenails, navel cutout, cherry blossoms, water, red dress, fingernails
Prompt 2: 1girl, solo, blonde hair, weapon, sword, hair ornament, hair flower, flower, dress, holding weapon, holding sword, holding, gloves, breasts, full body, black dress, thighhighs, looking at viewer, boots, bare shoulders, bangs, medium breasts, standing, black gloves, short hair with long locks, thigh boots, sleeveless dress, elbow gloves, sidelocks, black background, black footwear, yellow eyes, sleeveless
Prompt 3: 1girl, solo, holding, white gloves, smile, purple eyes, gloves, closed mouth, balloon, holding microphone, microphone, blue flower, long hair, puffy sleeves, purple flower, blush, puffy short sleeves, short sleeves, bangs, dress, shoes, very long hair, standing, pleated dress, white background, flower, full body, blue footwear, one side up, arm up, hair bun, brown hair, food, mini crown, crown, looking at viewer, hair between eyes, heart balloon, heart, tilted headwear, single side bun, hand up
Prompt 4: tiger, 1boy, male focus, blue eyes, braid, animal ears, tiger ears, 2022, solo, smile, chinese zodiac, year of the tiger, looking at viewer, hair over one eye, weapon, holding, white tiger, grin, grey hair, polearm, arm up, white hair, animal, holding weapon, arm behind head, multicolored hair, holding polearm
Prompt 5: 1boy, male child, glasses, male focus, shorts, solo, closed eyes, bow, bowtie, smile, open mouth, red bow, jacket, red bowtie, white background, shirt, happy, black shorts, child, simple background, long sleeves, ^_^, short hair, white shirt, brown hair, black-framed eyewear, :d, facing viewer, black hair
Prompt 6: solo, 1girl, swimsuit, blue eyes, plaid headwear, bikini, blue hair, virtual youtuber, side ponytail, looking at viewer, navel, grey bikini, ribbon, long hair, parted lips, blue nails, hat, breasts, plaid, hair ribbon, water, arm up, bracelet, star (symbol), cowboy shot, stomach, thigh strap, hair between eyes, beach, small breasts, jewelry, wet, bangs, plaid bikini, nail polish, grey headwear, blue ribbon, adapted costume, choker, ocean, bare shoulders, outdoors, beret
Prompt 7: fruit, food, no humans, food focus, cherry, simple background, english text, strawberry, signature, border, artist name, cream
Prompt 8: 1girl, solo, ball, swimsuit, bikini, mole, beachball, white bikini, breasts, hairclip, navel, looking at viewer, hair ornament, chromatic aberration, holding, holding ball, pool, cleavage, water, collarbone, mole on breast, blush, bangs, parted lips, bare shoulders, mole on thigh, bare arms, smile, large breasts, blonde hair, halterneck, hair between eyes, stomach

Evaluation Data

The test data is randomly sampled from popular wallpaper anime images (Pixiv, Nijijourney, etc.). The purpose of the project is to enable everyone to draw an anime illustration. We selected 100 images, generated text with Waifu-Tagger [https://huggingface.co/spaces/SmilingWolf/wd-tagger], and generated 4 images per prompt, resulting in a total of 400 generated images. The image resolution should be 1024 * 1024 or the same bucket for SDXL and 512 * 768 or the same bucket for SD1.5. We then resized the SDXL-generated images to 512 * 768 or the same bucket for a fair comparison.

We calculated the Laion Aesthetic Score to measure the beauty and the Perceptual Similarity to measure the control ability. We found that the image quality is highly consistent with the metric values. We compared our method with other SOTA Hugging Face models and listed the results below. Our model has the highest aesthetic score and can generate visually appealing images if properly prompted.

Quantitative Result

Property	Details
Model Type	ControlNet_SDXL
Finetuned from model	stabilityai/stable-diffusion-xl-base-1.0
License	apache-2.0

metric	xinsir/anime-painter	lllyasviel/control_v11p_sd15_scribble
laion_aesthetic	5.95	5.86
pe	(The original README seems incomplete here)

📄 License

This project is licensed under the Apache-2.0 license.
