🚀 Flat Color - Style
This project provides a flat-color style for text-to-image and text-to-video generation, trained on images with flat colors, no visible lineart, and little to no indication of depth.
🚀 Quick Start
Trigger Words
You should use `flat color` and `no lineart` to trigger the image generation.
Loading the Model
Load the LoRA with the `LoraLoaderModelOnly` node in ComfyUI, using the fp16 1.3B base model `wan2.1_t2v_1.3B_fp16.safetensors`.
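If you prefer a script over ComfyUI, the sketch below is a minimal alternative that assumes the diffusers `WanPipeline` integration and the diffusers-format base checkpoint `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` (both are assumptions, not part of this card).

```python
# Minimal sketch, assuming a recent diffusers release with Wan2.1 support.
# The base repo id and generation settings are assumptions; adjust as needed.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",  # assumed diffusers-format base model
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("motimalu/wan-flat-color-1.3b-v2")  # this LoRA
pipe.to("cuda")

frames = pipe(
    prompt="flat color, no lineart, blending, negative space, 1girl, looking up at a starry sky",
    negative_prompt="bad quality video, blurred details, extra fingers, deformed",
    num_frames=33,
).frames[0]
export_to_video(frames, "flat_color_preview.mp4", fps=16)
```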
✨ Features
- Style Characteristics: Trained on images featuring flat colors, no visible lineart, and minimal depth indication.
- Multiple Output Examples: Demonstrated with different text inputs and corresponding outputs, such as images of different characters in various scenarios.
💻 Usage Examples
Text-to-Image/Video Generation Examples
The following are examples of text inputs and their corresponding outputs:
Example 1
Prompt:
flat color, no lineart, blending, negative space, artist:[john kafka|ponsuke kaikai|hara id 21|yoneyama mai|fuzichoco], 1girl, hoshimachi suisei, virtual youtuber, blue hair, side ponytail, cowboy shot, black shirt, star print, off shoulder, outdoors, starry sky, wariza, looking up, half-closed eyes, black sky, live2d animation, upper body, high quality cinematic video of a woman sitting under the starry night sky. The Camera is steady, This is a cowboy shot. The animation is smooth and fluid.
Negative prompt:
bad quality video, bright color tone, overexposed, static, blurred details, subtitles, style, works, paintings, pictures, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, fused fingers, still pictures, cluttered background, three legs, many people in the background, walking backwards
Example 2
Prompt:
flat color, no lineart, blending, negative space, artist:[john kafka|ponsuke kaikai|hara id 21|yoneyama mai|fuzichoco], 1girl, sakura miko, pink hair, cowboy shot, white shirt, floral print, off shoulder, outdoors, cherry blossom, tree shade, wariza, looking up, falling petals, half-closed eyes, white sky, clouds, live2d animation, upper body, high quality cinematic video of a woman sitting under a sakura tree. Dreamy and lonely, the camera close-ups on the face of the woman as she turns towards the viewer. The Camera is steady, This is a cowboy shot. The animation is smooth and fluid.
Negative prompt:
bad quality video, bright color tone, overexposed, static, blurred details, subtitles, style, works, paintings, pictures, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, fused fingers, still pictures, cluttered background, three legs, many people in the background, walking backwards
📚 Documentation
Model Description
Flat Color - Style is a LoRA trained on images with flat colors, no visible lineart, and little to no indication of depth.
Reprinted from CivitAI: Link
Text-to-video previews generated with [ComfyUI_examples/wan/#text-to-video](https://comfyanonymous.github.io/ComfyUI_examples/wan/#text-to-video).
Download Model
Weights for this model are available in Safetensors format.
[Download](motimalu/wan-flat-color-1.3b-v2/tree/main) them in the Files & versions tab.
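To script the download instead of using the web UI, a minimal sketch with `huggingface_hub` (assuming it is installed; the repo id matches the download link above):

```python
# Minimal sketch: download the repo, including the Safetensors LoRA weights.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="motimalu/wan-flat-color-1.3b-v2")
print(local_dir)  # local folder containing the weights
```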
Training Config
dataset.toml
```toml
# Resolution settings.
resolutions = [512]
# Aspect ratio bucketing settings
enable_ar_bucket = true
min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 7
# Frame buckets (1 is for images)
frame_buckets = [1]
[[directory]] # IMAGES
# Path to the directory containing images and their corresponding caption files.
path = '/mnt/d/huanvideo/training_data/images'
num_repeats = 5
resolutions = [720]
frame_buckets = [1] # Use 1 frame for images.
[[directory]] # VIDEOS
# Path to the directory containing videos and their corresponding caption files.
path = '/mnt/d/huanvideo/training_data/videos'
num_repeats = 5
resolutions = [512] # Video resolution bucket for this directory.
frame_buckets = [6, 28, 31, 32, 36, 42, 43, 48, 50, 53]
```
config.toml
```toml
# Dataset config file.
output_dir = '/mnt/d/wan/training_output'
dataset = 'dataset.toml'
# Training settings
epochs = 50
micro_batch_size_per_gpu = 1
pipeline_stages = 1
gradient_accumulation_steps = 4
gradient_clipping = 1.0
warmup_steps = 100
# eval settings
eval_every_n_epochs = 5
eval_before_first_step = true
eval_micro_batch_size_per_gpu = 1
eval_gradient_accumulation_steps = 1
# misc settings
save_every_n_epochs = 5
checkpoint_every_n_minutes = 30
activation_checkpointing = true
partition_method = 'parameters'
save_dtype = 'bfloat16'
caching_batch_size = 1
steps_per_print = 1
video_clip_mode = 'single_middle'
[model]
type = 'wan'
ckpt_path = '../Wan2.1-T2V-1.3B'
dtype = 'bfloat16'
# You can use fp8 for the transformer when training LoRA.
transformer_dtype = 'float8'
timestep_sample_method = 'logit_normal'
[adapter]
type = 'lora'
rank = 32
dtype = 'bfloat16'
[optimizer]
type = 'adamw_optimi'
lr = 5e-5
betas = [0.9, 0.99]
weight_decay = 0.02
eps = 1e-8
```
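These two TOML files follow the layout used by the diffusion-pipe trainer (an assumption based on field names such as `micro_batch_size_per_gpu`, `[[directory]]`, and `adamw_optimi`); with that trainer, a typical single-GPU launch is `deepspeed --num_gpus=1 train.py --deepspeed --config config.toml`.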
📄 License
The model is licensed under the apache-2.0 license.