Evt_V4-preview Open-source Animation Style Model - Fine-tuned with a Larger Dataset, Excellent High-similarity Results

Evt V4 Preview

Developed by haor

The EVT series is an experimental project focused on fine-tuning anime-style models with large-scale datasets. Evt_V4 utilizes an even larger dataset than previous versions, achieving an 85% cosine similarity with ACertainty.

Image Generation | English | Open Source License: OpenRAIL | #Anime Style Optimization #High Similarity Fine-tuning #Large-scale Dataset Training

Downloads 137

Release Time: 1/9/2023

Model Overview

Evt_V4 is a text-to-image generation model based on Stable Diffusion technology, specifically optimized for anime-style imagery.

Model Features

Anime Style Optimization

Specially fine-tuned for anime-style imagery using a large-scale dataset

High Similarity

Achieves 85% cosine similarity with the ACertainty model

Large-scale Training

Trained for 10 epochs using approximately 550,000 anime-style images

Model Capabilities

Text-to-Image Generation

Anime-style Image Generation

High-quality Image Rendering

Use Cases

Anime Creation

Anime Character Design

Generate anime character images in various styles

Examples include a generic "1girl" prompt and the character Madoka Kaname

Scene Creation

Generate anime-style scene images

Examples showcase anime-style scenes featuring elements like fields and fruits

🚀 Evt_V4-preview

The EVT series is an experimental project for fine-tuning an anime-style model with large datasets. Evt_V4 uses a larger dataset than its predecessors, and its cosine similarity with ACertainty reaches 85%. It may perform differently from other models. Enjoy exploring it!

🚀 Quick Start

✨ Features

This model belongs to the Stable Diffusion family and generates images from text prompts.
It has a high cosine similarity with the ACertainty model.
It can be exported to multiple formats such as ONNX, MPS, and FLAX/JAX.
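
The source does not say how the 85% similarity figure was measured. As a rough illustration only, the sketch below assumes the simplest possibility: flatten each model's weights into one vector and take the cosine of the angle between the two vectors. The toy dictionaries and the helper names (`flatten_weights`, `cosine_similarity`) are hypothetical stand-ins, not part of any published tool.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def flatten_weights(weights: Dict[str, List[float]]) -> List[float]:
    """Concatenate per-layer weight lists in a stable (sorted-key) order."""
    flat: List[float] = []
    for name in sorted(weights):
        flat.extend(weights[name])
    return flat


# Toy stand-ins for two checkpoints (assumption: real models hold millions of values).
model_a = {"layer1": [1.0, 0.0], "layer2": [0.0, 1.0]}
model_b = {"layer1": [1.0, 0.0], "layer2": [1.0, 1.0]}

similarity = cosine_similarity(flatten_weights(model_a), flatten_weights(model_b))
print(f"cosine similarity: {similarity:.3f}")
```

A value of 1.0 would mean the two weight vectors point in exactly the same direction; lower values indicate that fine-tuning has moved the model further from its base.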

📦 Installation

The original model card provides no dedicated installation steps. The usage example below relies on the diffusers and torch Python packages.

💻 Usage Examples

Basic Usage

This model can be used just like any other Stable Diffusion model. For more information, see the Stable Diffusion documentation. You can also export the model to ONNX, MPS and/or FLAX/JAX.

from diffusers import StableDiffusionPipeline
import torch

model_id = "haor/Evt_V4-preview"
branch_name = "main"

# Load the pipeline in half precision and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(model_id, revision=branch_name, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Generate one image from the prompt and save it to disk.
prompt = "1girl"
image = pipe(prompt).images[0]

image.save("./1girl.png")

Advanced Usage

Here are some example prompts and their corresponding generated images:

Prompt 1:

1girl in black serafuku standing in a field solo, food, fruit, lemon, bubble, planet, moon, orange \(fruit\), lemon slice, leaf, fish, orange slice, by (tabi:1.25), spot color, looking at viewer, closeup cowboy shot
Negative prompt: (bad:0.81), (comic:0.81), (cropped:0.81), (error:0.81), (extra:0.81), (low:0.81), (lowres:0.81), (speech:0.81), (worst:0.81), (blush:0.9), 2koma, 3koma, 4koma, collage, lipstick
Steps: 20, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 2285895007, Size: 512x1152, Denoising strength: 0.7, Clip skip: 2

Prompt 2:

{Masterpiece, Kaname_Madoka, tall and long double tails, well rooted hair, (pink hair), pink eyes, crossed bangs, ojousama, jk, thigh bandages, wrist cuffs, (pink bow: 1.2)}, plain color, sketch, masterpiece, high detail, masterpiece portrait, best quality, ray tracing, {:<, look at the edge}
Negative prompt: ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)),extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((bad proportions))), ((extra limbs)), (((deformed))), (((disfigured))), cloned face, gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), too many fingers, (((long neck))), (((low quality))), normal quality, blurry, bad feet, text font ui, ((((worst quality)))), anatomical nonsense, (((bad shadow))), unnatural body, liquid body, 3D, 3D game, 3D game scene, 3D character, bad hairs, poorly drawn hairs, fused hairs, big muscles, bad face, extra eyes, furry, pony, mosaic, disappearing calf, disappearing legs, extra digit, fewer digit, fused digit, missing digit, fused feet, poorly drawn eyes, big face, long face, bad eyes, thick lips, obesity, strong girl, beard, Excess legs
Steps: 20, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 2468255263, Size: 512x1152, Denoising strength: 0.7, Clip skip: 2
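
The settings lines above appear to follow the "Key: value" format used by Stable Diffusion web UIs. As a minimal sketch, the helpers below (hypothetical names `parse_settings` and `to_pipeline_kwargs`) split such a line into a dictionary and map the parts that have a direct counterpart in the `StableDiffusionPipeline` call (`num_inference_steps`, `guidance_scale`, `width`, `height`). The sampler, clip skip, denoising strength, and the `(word:1.25)` emphasis syntax are not handled here, so a plain diffusers pipeline may not reproduce the original images exactly.

```python
from typing import Any, Dict


def parse_settings(line: str) -> Dict[str, str]:
    """Split a 'Key: value, Key: value' settings line into a dict of strings."""
    settings: Dict[str, str] = {}
    for part in line.split(", "):
        key, _, value = part.partition(": ")
        settings[key.strip()] = value.strip()
    return settings


def to_pipeline_kwargs(settings: Dict[str, str]) -> Dict[str, Any]:
    """Map the settings that correspond directly to pipeline call arguments."""
    width, height = (int(v) for v in settings["Size"].split("x"))
    return {
        "num_inference_steps": int(settings["Steps"]),
        "guidance_scale": float(settings["CFG scale"]),
        "width": width,
        "height": height,
    }


# Settings line copied from the first example prompt.
line = (
    "Steps: 20, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 2285895007, "
    "Size: 512x1152, Denoising strength: 0.7, Clip skip: 2"
)
settings = parse_settings(line)
kwargs = to_pipeline_kwargs(settings)
seed = int(settings["Seed"])
print(kwargs, seed)

# With a loaded pipeline (`pipe`, as in the basic usage example) and the prompt
# strings in `prompt` and `negative`, the result could be used like this:
#   generator = torch.Generator("cuda").manual_seed(seed)
#   image = pipe(prompt, negative_prompt=negative, generator=generator, **kwargs).images[0]
```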

📚 Documentation

🔧 Technical Details

Base Model: ACertainty
Training Data: Trained for 10 epochs using around 550k anime-style images (pixiv and yandere).
Resolution: 512
UCG: 0.1
Use arb: True
Trainer: [Mikubill/naifu-diffusion](https://github.com/Mikubill/naifu-diffusion)
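
The arb option refers to aspect-ratio bucketing, which groups training images into a small set of resolutions so that non-square images need less cropping. The sketch below illustrates the general idea using the `max_size`, `divisible`, `min_dim`, and `dim_limit` values from the arb section of the training configuration. It is an assumption-laden illustration, not the trainer's actual implementation: `make_buckets` and `nearest_bucket` are hypothetical helpers, and `base_res` and `max_ar_error` are not used.

```python
from typing import List, Tuple

Bucket = Tuple[int, int]  # (width, height)


def make_buckets(max_size: Bucket = (768, 512), divisible: int = 64,
                 min_dim: int = 256, dim_limit: int = 1024) -> List[Bucket]:
    """Enumerate resolutions whose pixel area stays within max_size's area."""
    max_area = max_size[0] * max_size[1]
    buckets: List[Bucket] = []
    for width in range(min_dim, dim_limit + 1, divisible):
        # Largest height (a multiple of `divisible`) that fits the area budget.
        height = min(dim_limit, (max_area // width) // divisible * divisible)
        if height >= min_dim:
            buckets.append((width, height))
    return buckets


def nearest_bucket(image_size: Bucket, buckets: List[Bucket]) -> Bucket:
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ratio = image_size[0] / image_size[1]
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))


buckets = make_buckets()
print(buckets)
print(nearest_bucket((1000, 1500), buckets))  # portrait image
print(nearest_bucket((1920, 1080), buckets))  # landscape image
```

Each image would then be resized and cropped to its bucket's resolution, and batches would be drawn from one bucket at a time so that every tensor in a batch has the same shape.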

arb:
  enabled: true
  debug: false
  base_res: [512, 512]
  max_size: [768, 512]
  divisible: 64
  max_ar_error: 4
  min_dim: 256
  dim_limit: 1024

scheduler:
  name: diffusers.DDIMScheduler
  params:
    beta_end: 0.012
    beta_schedule: "scaled_linear"
    beta_start: 0.00085
    clip_sample: false
    num_train_timesteps: 1000
    set_alpha_to_one: false
    steps_offset: 1
    trained_betas: null

optimizer:
  name: bitsandbytes.optim.AdamW8bit
  params:
    lr: 2e-6
    weight_decay: 5e-2
    eps: 1e-7

lr_scheduler:
  name: torch.optim.lr_scheduler.CosineAnnealingWarmRestarts
  warmup: 
    enabled: true
    init_lr: 2e-8
    num_warmup: 50
    strategy: "cos"  
  params:
    T_0: 5
    T_mult: 1
    eta_min: 6e-7
    last_epoch: -1
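
To make the learning-rate settings above more concrete, the sketch below evaluates the two phases in plain Python. The annealing formula is the one documented for `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`. The warmup shape is an assumption: the source only names the strategy "cos", so a half-cosine ramp from `init_lr` to `lr` is used here. Whether `T_0` counts steps or epochs depends on how the trainer calls the scheduler, which the source does not state.

```python
import math

BASE_LR = 2e-6     # optimizer.params.lr
ETA_MIN = 6e-7     # lr_scheduler.params.eta_min
T_0 = 5            # lr_scheduler.params.T_0 (restart period; T_mult = 1)
INIT_LR = 2e-8     # lr_scheduler.warmup.init_lr
NUM_WARMUP = 50    # lr_scheduler.warmup.num_warmup


def warmup_lr(step: int) -> float:
    """Half-cosine ramp from INIT_LR to BASE_LR over NUM_WARMUP steps (assumed form)."""
    progress = min(step, NUM_WARMUP) / NUM_WARMUP
    return INIT_LR + (BASE_LR - INIT_LR) * (1 - math.cos(math.pi * progress)) / 2


def annealed_lr(t: float) -> float:
    """Cosine annealing from BASE_LR toward ETA_MIN, restarting every T_0 units."""
    t_cur = t % T_0
    return ETA_MIN + (BASE_LR - ETA_MIN) * (1 + math.cos(math.pi * t_cur / T_0)) / 2


for step in (0, 25, 50):
    print(f"warmup step {step:2d}: lr = {warmup_lr(step):.3e}")
for t in (0, 2.5, 4.9, 5):
    print(f"t = {t:>3}: lr = {annealed_lr(t):.3e}")
```

With `T_mult: 1` every cycle has the same length, so the rate jumps back to its peak every `T_0` units and never drops below `eta_min`.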

Training took about 300 V100 GPU hours.
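
Combined with the dataset size and epoch count given earlier, the GPU-hour figure implies a rough training throughput. The short calculation below is only back-of-the-envelope arithmetic: all three inputs are approximate in the source, and it assumes every image is seen exactly once per epoch.

```python
images = 550_000     # approximate dataset size
epochs = 10
gpu_hours = 300      # approximate V100 GPU hours

samples_seen = images * epochs
samples_per_gpu_hour = samples_seen / gpu_hours
samples_per_gpu_second = samples_per_gpu_hour / 3600

print(f"{samples_seen:,} samples in total")
print(f"about {samples_per_gpu_second:.1f} samples per GPU-second")
```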

📄 License

This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage. The CreativeML OpenRAIL License specifies:

You can't use the model to deliberately produce or share illegal or harmful outputs or content.
The authors claim no rights on the outputs you generate. You are free to use them and are accountable for their use, which must not go against the provisions set in the license.
You may redistribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully).

[Please read the full license here](https://huggingface.co/spaces/CompVis/stable-diffusion-license)
