Hermitage XL
Hermitage XL is a high-resolution, latent text-to-image diffusion model. It can generate and modify anime-themed images from text prompts, producing high-quality anime-style output.
Quick Start
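Hermitage XL can be used in multiple ways. To use it as a standalone checkpoint, first download Hermitage XL; the model is distributed in .safetensors format. It can also be loaded with diffusers from the Linaqruf/hermitage-xl repository, as in the usage example below.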
Prerequisites
- Use Danbooru-style tags as the prompt instead of natural language; otherwise the model tends to produce realistic rather than anime results.
- You can use any generic negative prompt, or the following suggested one, to steer the model toward high-aesthetic generations:
lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
- The following tags should also be prepended to prompts to get high-aesthetic results:
masterpiece, best quality, illustration, beautiful detailed, finely detailed, dramatic light, intricate details
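The two prompting guidelines above can be combined into a small helper. This is a minimal sketch; the function name `build_prompts` and its behavior are hypothetical and not part of diffusers or this model's release. It prepends the suggested quality tags to a list of Danbooru-style tags (skipping duplicates) and returns the suggested negative prompt alongside it.

```python
# Suggested quality tags and negative prompt, copied from the guidelines above.
QUALITY_TAGS = [
    "masterpiece", "best quality", "illustration", "beautiful detailed",
    "finely detailed", "dramatic light", "intricate details",
]
NEGATIVE_PROMPT = (
    "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, "
    "fewer digits, cropped, worst quality, low quality, normal quality, "
    "jpeg artifacts, signature, watermark, username, blurry"
)


def build_prompts(tags):
    """Hypothetical helper: prepend the quality tags to Danbooru-style tags.

    Whitespace around tags is stripped, empty tags are dropped, and tags that
    already appear among the quality tags are not repeated.
    Returns a (prompt, negative_prompt) pair of comma-separated strings.
    """
    cleaned = [t.strip() for t in tags if t.strip()]
    extra = [t for t in cleaned if t not in QUALITY_TAGS]
    return ", ".join(QUALITY_TAGS + extra), NEGATIVE_PROMPT


if __name__ == "__main__":
    prompt, negative = build_prompts(["1girl", "green hair", "sweater", "beanie"])
    print(prompt)
```

The resulting strings can be passed as `prompt` and `negative_prompt` to the pipeline call in the usage example.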
Installation
Make sure to upgrade diffusers to >= 0.18.2:
pip install diffusers --upgrade
In addition, make sure to install transformers, safetensors and accelerate, as well as the invisible watermark package:
pip install invisible_watermark transformers accelerate safetensors
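To confirm the installed diffusers meets the 0.18.2 minimum before running the pipeline, a check like the following can be used. This is a minimal sketch using only the standard library (`importlib.metadata`); the helper names are hypothetical, and the parser deliberately ignores pre-release and dev suffixes, which is a simplification.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_DIFFUSERS = "0.18.2"  # minimum version stated in the installation notes


def release_tuple(ver):
    """Parse the leading numeric release part, e.g. "0.25.0.dev0" -> (0, 25, 0)."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # a purely non-numeric piece such as "dev0" ends the release part
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # stop after a piece with a suffix such as "0rc1"
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_DIFFUSERS):
    """Hypothetical helper: True if `installed` is at least `minimum`."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_diffusers():
    """Return True if diffusers is installed and new enough, else False."""
    try:
        installed = version("diffusers")
    except PackageNotFoundError:
        return False
    return meets_minimum(installed)


if __name__ == "__main__":
    print("diffusers OK" if check_diffusers() else "please upgrade diffusers")
```

For stricter comparisons (pre-releases, local versions), `packaging.version.Version` is the more robust choice.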
Usage Example
Running the pipeline (if you don't swap the scheduler, it runs with the default EulerDiscreteScheduler; this example swaps it for EulerAncestralDiscreteScheduler):
import torch
from diffusers.models import AutoencoderKL
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

model = "Linaqruf/hermitage-xl"

# Load the SDXL VAE in half precision so its dtype matches the fp16 pipeline.
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae", torch_dtype=torch.float16)

pipe = StableDiffusionXLPipeline.from_pretrained(
    model,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    vae=vae,
)

# Replace the default EulerDiscreteScheduler with EulerAncestralDiscreteScheduler.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Danbooru-style tags, led by quality tags as recommended above.
prompt = "masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=12,
    target_size=(1024, 1024),    # SDXL size conditioning: intended output size
    original_size=(4096, 4096),  # SDXL size conditioning: apparent source-image size
    num_inference_steps=50,
).images[0]

image.save("anime_girl.png")
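The `original_size` and `target_size` arguments are SDXL's size micro-conditioning. In recent diffusers versions, the SDXL pipeline concatenates `original_size`, `crops_coords_top_left` (default `(0, 0)`) and `target_size`, each a `(height, width)` pair, into a six-value "time ids" vector that is embedded alongside the text conditioning. Passing `original_size=(4096, 4096)` therefore asks the model to imitate an image downscaled from a large source. The sketch below only reproduces that concatenation in plain Python for illustration; `sdxl_time_ids` is a hypothetical name, not a diffusers function.

```python
def sdxl_time_ids(original_size, target_size, crops_coords_top_left=(0, 0)):
    """Hypothetical helper mirroring how the SDXL pipeline lays out its size
    conditioning: original size, then crop offset, then target size."""
    for pair in (original_size, crops_coords_top_left, target_size):
        if len(pair) != 2:
            raise ValueError("each size/offset must be a (height, width) pair")
    return list(original_size) + list(crops_coords_top_left) + list(target_size)


# Values used in the usage example above.
ids = sdxl_time_ids(original_size=(4096, 4096), target_size=(1024, 1024))
print(ids)  # [4096, 4096, 0, 0, 1024, 1024]
```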
Features
- High-Resolution Images: The model was trained at 1024x1024 resolution, using the NovelAI Aspect Ratio Bucketing Tool so that it could also be trained at non-square resolutions.
- Anime-Style Generation: Given text prompts, the model creates high-quality anime-style images.
- Fine-Tuned Diffusion Process: The model is a fine-tune of Stable Diffusion XL 1.0, aimed at producing high-quality, distinctive image output.
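Aspect-ratio bucketing groups training images by shape so that they do not all have to be cropped to squares. The sketch below is a simplified illustration of the idea, not the NovelAI tool's actual code: it assumes bucket sides are multiples of 64 pixels, bucket area does not exceed 1024x1024, and each side lies between 512 and 2048 pixels, then assigns an image to the bucket with the closest aspect ratio. All names and limits here are hypothetical. The same bucket list may also help when picking a non-square `width`/`height` near the training area at inference time.

```python
MAX_AREA = 1024 * 1024          # assumed pixel budget, matching the 1024x1024 training size
STEP = 64                       # assumed side-length granularity
MIN_SIDE, MAX_SIDE = 512, 2048  # assumed side-length limits


def make_buckets():
    """Return sorted (width, height) buckets: for each width, the tallest
    height (a multiple of STEP) that keeps width * height within MAX_AREA."""
    buckets = set()
    for width in range(MIN_SIDE, MAX_SIDE + 1, STEP):
        height = min(MAX_SIDE, (MAX_AREA // width) // STEP * STEP)
        if height >= MIN_SIDE:
            buckets.add((width, height))
            buckets.add((height, width))  # include the transposed shape too
    return sorted(buckets)


def assign_bucket(image_width, image_height, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ratio = image_width / image_height
    return min(buckets, key=lambda wh: abs(wh[0] / wh[1] - ratio))


if __name__ == "__main__":
    buckets = make_buckets()
    # A 16:9 source image is mapped to the closest landscape bucket.
    print(assign_bucket(1920, 1080, buckets))
```

During training, each image would then be resized and center-cropped to its bucket, and batches drawn from a single bucket; those steps are omitted here.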
Documentation
Model Details
Limitations
- This model inherits the limitations of Stable Diffusion XL 1.0.
- The model is overfitted and does not follow prompts well, because it was fine-tuned for only 5000 steps on small-scale datasets.
- It is only a preview model, intended to find good hyperparameters and a training configuration for Stable Diffusion XL 1.0.
Example
Here are some cherry-picked samples and a comparison between available models:
License
This model is under the CreativeML Open RAIL++-M License.