Hermitage XL Open-source Image Generation Model - Free Deployment to Create High-quality Anime-style Images

Hermitage XL

Developed by Linaqruf

Hermitage XL is a high-resolution latent text-to-image diffusion model focused on generating high-quality anime-style images.

Tags: Image Generation, English, High-resolution anime generation, SDXL fine-tuned model, Danbooru style

Release Time: 7/30/2023

Model Overview

This model is fine-tuned from Stable Diffusion XL 1.0. It generates high-quality anime-style images from text prompts and supports high-resolution output.

Model Features

High-resolution image generation

The model is trained at 1024x1024 resolution and supports non-square resolution output.

Anime style optimization

Fine-tuned on a curated dataset of high-quality anime-style images for more authentic anime-style generation.

Refined diffusion process

Uses a fine-tuned diffusion process aimed at high-quality, distinctive image output.

Model Capabilities

Text-to-image generation

High-resolution image generation

Anime-style image generation

Use Cases

Creative arts

Anime character design

Generate anime character images from text descriptions

High-quality, detailed anime character images

Scene creation

Generate anime-style scene images from text descriptions

Scene images with dramatic lighting and intricate details

🚀 Hermitage XL

Hermitage XL is a high-resolution, latent text-to-image diffusion model. It can generate and modify anime-themed images based on text prompts, offering high-quality anime-styled image output.

🚀 Quick Start

Hermitage XL can be used in multiple ways:

Use it with the Stable Diffusion WebUI
Use it with 🧨 diffusers
Use it with ComfyUI

To use the model, first download Hermitage XL here. The model is in .safetensors format.

Prerequisites

Use Danbooru-style tags as the prompt instead of natural language; otherwise the model tends to produce realistic results instead of anime.
You can use any generic negative prompt, or use the following suggested negative prompt to guide the model towards high-aesthetic generations:

lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry

The following tags should also be prepended to prompts to get high-aesthetic results:

masterpiece, best quality, illustration, beautiful detailed, finely detailed, dramatic light, intricate details
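
The prompt conventions above can be sketched as a small helper. This is only an illustration, not part of the model's tooling: the names QUALITY_TAGS, NEGATIVE_PROMPT and build_prompt are hypothetical, and dropping duplicate tags is an assumption added for convenience.

```python
# Quality tags and negative prompt copied from the recommendations above.
QUALITY_TAGS = [
    "masterpiece", "best quality", "illustration", "beautiful detailed",
    "finely detailed", "dramatic light", "intricate details",
]
NEGATIVE_PROMPT = (
    "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, "
    "fewer digits, cropped, worst quality, low quality, normal quality, "
    "jpeg artifacts, signature, watermark, username, blurry"
)


def build_prompt(subject_tags, quality_tags=QUALITY_TAGS):
    """Prepend quality tags to Danbooru-style subject tags.

    Duplicate tags are dropped (case-insensitively) and order is preserved.
    """
    seen = set()
    merged = []
    for tag in list(quality_tags) + list(subject_tags):
        cleaned = tag.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return ", ".join(merged)


# Subject tags describe the image; "best quality" is already a quality tag
# and is therefore not repeated in the result.
prompt = build_prompt(["1girl", "green hair", "sweater", "best quality"])
print(prompt)
print(NEGATIVE_PROMPT)
```

The resulting strings can be passed as the prompt and negative prompt in any of the front ends listed under Quick Start.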

Installation

Make sure to upgrade diffusers to >= 0.18.2:

pip install diffusers --upgrade

In addition, make sure to install transformers, safetensors, and accelerate, as well as the invisible_watermark package:

pip install invisible_watermark transformers accelerate safetensors
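
To confirm that the environment matches the requirements above, a quick check along these lines may help. It is a rough sketch using only the standard library: the helper names parse_version, meets_minimum and check_package are hypothetical, and the version comparison is deliberately simplistic (it looks only at the leading numeric components, so a pre-release such as 0.18.2rc1 is treated like 0.18.2).

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.18.2' or '0.19.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="0.18.2"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_package(name, minimum=None):
    """Describe whether a package is installed and, optionally, recent enough."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed"
    if minimum and not meets_minimum(installed, minimum):
        return f"{name}: {installed} is older than {minimum}"
    return f"{name}: {installed} ok"


# Only diffusers has a stated minimum version in this document.
requirements = [
    ("diffusers", "0.18.2"),
    ("transformers", None),
    ("accelerate", None),
    ("safetensors", None),
    ("invisible-watermark", None),
]
for package, minimum in requirements:
    print(check_package(package, minimum))
```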

Usage Example

Running the pipeline (if you don't swap the scheduler it will run with the default EulerDiscreteScheduler; in this example we are swapping it to EulerAncestralDiscreteScheduler):

import torch
from diffusers.models import AutoencoderKL
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

# Model repository and the standalone SDXL VAE
model = "Linaqruf/hermitage-xl"
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")

# Load the SDXL pipeline in half precision, using the VAE loaded above
pipe = StableDiffusionXLPipeline.from_pretrained(
    model,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    vae=vae,
)

# Swap the default EulerDiscreteScheduler for EulerAncestralDiscreteScheduler
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Danbooru-style tags, with quality tags first
prompt = "masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

# Generate a single 1024x1024 image
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=12,
    target_size=(1024, 1024),
    original_size=(4096, 4096),
    num_inference_steps=50,
).images[0]

image.save("anime_girl.png")

✨ Features

High-Resolution Images: The model was trained at 1024x1024 resolution using the NovelAI Aspect Ratio Bucketing Tool, which also allows training at non-square resolutions.
Anime-styled Generation: Based on given text prompts, the model can create high-quality anime-styled images.
Fine-Tuned Diffusion Process: The model utilizes a fine-tuned diffusion process to ensure high-quality and unique image output.
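
The idea behind aspect ratio bucketing can be illustrated with a short sketch. This is not the NovelAI tool's actual code: the function names make_buckets and nearest_bucket are hypothetical, and the 64-pixel step, the 512 to 2048 side limits, and the 1024x1024 area budget are assumptions chosen for illustration. It enumerates resolutions whose sides are multiples of a step and whose area stays within the budget, then picks the bucket closest in aspect ratio to a requested size.

```python
def make_buckets(max_area=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate (width, height) pairs that are multiples of `step`
    and whose area does not exceed `max_area`."""
    buckets = set()
    width = min_side
    while width <= max_side:
        # Largest height (a multiple of step) that keeps the area in budget.
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored orientation
        width += step
    return sorted(buckets)


def nearest_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest to width / height."""
    target = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - target))


buckets = make_buckets()
print(len(buckets), "buckets, e.g.", buckets[:3])

# A 16:9 request maps to the closest available bucket.
print(nearest_bucket(1920, 1080, buckets))
```

A size chosen this way could then be passed as width and height when generating non-square images, although how well a given resolution works with this model is not stated in the source.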

📚 Documentation

Model Details

Property	Details
Developed by	Linaqruf
Model Type	Diffusion-based text-to-image generative model
Model Description	This is a model that can be used to generate and modify anime-themed images based on text prompts.
License	CreativeML Open RAIL++-M License
Finetuned from model	Stable Diffusion XL 1.0

Limitations

This model inherits the limitations of Stable Diffusion XL 1.0.
This model is overfitted and cannot follow prompts well, because it was fine-tuned for 5000 steps on small-scale datasets.
It is only a preview model, intended to find good hyperparameters and a training configuration for Stable Diffusion XL 1.0.

Example

Here are some cherry-picked samples and a comparison between available models:

📄 License

This model is under the CreativeML Open RAIL++-M License.
