🚀 Flat Color - Style
This project provides a flat-color style for text-to-image and text-to-video generation, trained on images with flat colors, no visible lineart, and little to no indication of depth.
🚀 Quick Start
Trigger Words
You should use `flat color` and `no lineart` to trigger the image generation.
Loading the Model
Load the LoRA with the `LoraLoaderModelOnly` node in ComfyUI, using the fp16 1.3B base model `wan2.1_t2v_1.3B_fp16.safetensors`.
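If you prefer a script over ComfyUI, the sketch below is a minimal alternative that assumes the diffusers `WanPipeline` integration and the diffusers-format base checkpoint `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` (both are assumptions, not part of this card).

```python
# Minimal sketch, assuming a recent diffusers release with Wan2.1 support.
# The base repo id and generation settings are assumptions; adjust as needed.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",  # assumed diffusers-format base model
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("motimalu/wan-flat-color-1.3b-v2")  # this LoRA
pipe.to("cuda")

frames = pipe(
    prompt="flat color, no lineart, blending, negative space, 1girl, looking up at a starry sky",
    negative_prompt="bad quality video, blurred details, extra fingers, deformed",
    num_frames=33,
).frames[0]
export_to_video(frames, "flat_color_preview.mp4", fps=16)
```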
✨ Features
- Style Characteristics: Trained on images featuring flat colors, no visible lineart, and minimal depth indication.
- Multiple Output Examples: Demonstrated with different text inputs and corresponding outputs, such as images of different characters in various scenarios.
💻 Usage Examples
Text-to-Image/Video Generation Examples
The following are examples of text inputs and their corresponding outputs:
Example 1
Prompt:
flat color, no lineart, blending, negative space, artist:[john kafka|ponsuke kaikai|hara id 21|yoneyama mai|fuzichoco], 1girl, hoshimachi suisei, virtual youtuber, blue hair, side ponytail, cowboy shot, black shirt, star print, off shoulder, outdoors, starry sky, wariza, looking up, half-closed eyes, black sky, live2d animation, upper body, high quality cinematic video of a woman sitting under the starry night sky. The Camera is steady, This is a cowboy shot. The animation is smooth and fluid.
Negative prompt:
bad quality video, bright color tone, overexposed, static, blurred details, subtitles, style, works, paintings, pictures, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, fused fingers, still pictures, cluttered background, three legs, many people in the background, walking backwards
Example 2
Prompt:
flat color, no lineart, blending, negative space, artist:[john kafka|ponsuke kaikai|hara id 21|yoneyama mai|fuzichoco], 1girl, sakura miko, pink hair, cowboy shot, white shirt, floral print, off shoulder, outdoors, cherry blossom, tree shade, wariza, looking up, falling petals, half-closed eyes, white sky, clouds, live2d animation, upper body, high quality cinematic video of a woman sitting under a sakura tree. Dreamy and lonely, the camera close-ups on the face of the woman as she turns towards the viewer. The Camera is steady, This is a cowboy shot. The animation is smooth and fluid.
Negative prompt:
bad quality video, bright color tone, overexposed, static, blurred details, subtitles, style, works, paintings, pictures, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, fused fingers, still pictures, cluttered background, three legs, many people in the background, walking backwards
📚 Documentation
Model Description
Flat Color - Style is a LoRA trained on images with flat colors, no visible lineart, and little to no indication of depth.
Reprinted from CivitAI: Link
Text-to-video previews generated with [ComfyUI_examples/wan/#text-to-video](https://comfyanonymous.github.io/ComfyUI_examples/wan/#text-to-video).
Download Model
Weights for this model are available in Safetensors format.
[Download](motimalu/wan-flat-color-1.3b-v2/tree/main) them in the Files & versions tab.
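To script the download instead of using the web UI, a minimal sketch with `huggingface_hub` (assuming it is installed; the repo id matches the download link above):

```python
# Minimal sketch: download the repo, including the Safetensors LoRA weights.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="motimalu/wan-flat-color-1.3b-v2")
print(local_dir)  # local folder containing the weights
```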
Training Config
dataset.toml
```toml
# Resolution settings.
resolutions = [512]
# Aspect ratio bucketing settings
enable_ar_bucket = true
min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 7
# Frame buckets (1 is for images)
frame_buckets = [1]
[[directory]] # IMAGES
# Path to the directory containing images and their corresponding caption files.
path = '/mnt/d/huanvideo/training_data/images'
num_repeats = 5
resolutions = [720]
frame_buckets = [1] # Use 1 frame for images.
[[directory]] # VIDEOS
# Path to the directory containing videos and their corresponding caption files.
path = '/mnt/d/huanvideo/training_data/videos'
num_repeats = 5
resolutions = [512] # Video resolution bucket for this directory.
frame_buckets = [6, 28, 31, 32, 36, 42, 43, 48, 50, 53]
```
config.toml
```toml
# Dataset config file.
output_dir = '/mnt/d/wan/training_output'
dataset = 'dataset.toml'
# Training settings
epochs = 50
micro_batch_size_per_gpu = 1
pipeline_stages = 1
gradient_accumulation_steps = 4
gradient_clipping = 1.0
warmup_steps = 100
# eval settings
eval_every_n_epochs = 5
eval_before_first_step = true
eval_micro_batch_size_per_gpu = 1
eval_gradient_accumulation_steps = 1
# misc settings
save_every_n_epochs = 5
checkpoint_every_n_minutes = 30
activation_checkpointing = true
partition_method = 'parameters'
save_dtype = 'bfloat16'
caching_batch_size = 1
steps_per_print = 1
video_clip_mode = 'single_middle'
[model]
type = 'wan'
ckpt_path = '../Wan2.1-T2V-1.3B'
dtype = 'bfloat16'
# You can use fp8 for the transformer when training LoRA.
transformer_dtype = 'float8'
timestep_sample_method = 'logit_normal'
[adapter]
type = 'lora'
rank = 32
dtype = 'bfloat16'
[optimizer]
type = 'adamw_optimi'
lr = 5e-5
betas = [0.9, 0.99]
weight_decay = 0.02
eps = 1e-8
```
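These two TOML files follow the layout used by the diffusion-pipe trainer (an assumption based on field names such as `micro_batch_size_per_gpu`, `[[directory]]`, and `adamw_optimi`); with that trainer, a typical single-GPU launch is `deepspeed --num_gpus=1 train.py --deepspeed --config config.toml`.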
📄 License
The model is licensed under the apache-2.0 license.