# DiT-Wikiart Model

A diffusion transformer model trained on the WikiArt dataset for unconditional image generation.
## Quick Start

To use the model, install the `huggingface_hub` library and download `modeling_dit_wikiart.py` from the repository's "Files and versions" tab for the model definition. You can then run the model as follows:

```python
import torch

from modeling_dit_wikiart import DiTWikiartModel

model = DiTWikiartModel.from_pretrained("kaupane/DiT-Wikiart-Large")
num_samples = 8
noisy_latents = torch.randn(num_samples, 4, 32, 32)
predicted_noise = model(noisy_latents)
print(predicted_noise)
```
The model is paired with the [stabilityai/sd-vae-ft-ema](https://huggingface.co/stabilityai/sd-vae-ft-ema) VAE, which decodes generated latents into images.
## Features

- A DiT (diffusion transformer) trained from scratch on the [WikiArt dataset](https://huggingface.co/datasets/Artificio/WikiArt). It is designed to generate art images conditioned on art genre and art style.
- The model demonstrates a basic ability to understand genres and styles and to produce visually appealing paintings (at first glance).
## Installation

No installation steps are required beyond installing the `huggingface_hub` library.
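A minimal setup sketch, assuming the `huggingface_hub` CLI is available after installation (the download command fetches the model definition file from the repo's "Files and versions"):

```shell
# Install the Hugging Face Hub client library (also provides the CLI)
pip install huggingface_hub

# Fetch the model definition file into the current directory
huggingface-cli download kaupane/DiT-Wikiart-Large modeling_dit_wikiart.py --local-dir .
```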
## Usage Examples

### Basic Usage

```python
import torch

from modeling_dit_wikiart import DiTWikiartModel

model = DiTWikiartModel.from_pretrained("kaupane/DiT-Wikiart-Large")
num_samples = 8
noisy_latents = torch.randn(num_samples, 4, 32, 32)
predicted_noise = model(noisy_latents)
print(predicted_noise)
```
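The forward pass above returns predicted noise rather than an image; producing a picture requires running an iterative diffusion sampler and then decoding the final latents with the paired VAE. A minimal pure-Python sketch of the underlying schedule arithmetic, assuming a standard DDPM linear beta schedule (T=1000, the beta range, and the x0-recovery formula are textbook defaults, not values taken from this repo):

```python
import math

# Assumed DDPM defaults; the repo's actual schedule may differ.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar_t = prod_{s <= t} (1 - beta_s), strictly decreasing in t
alphas_cumprod = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alphas_cumprod.append(prod)

def predict_x0(x_t, eps, t):
    """Recover the clean-latent estimate from a noisy value and predicted noise:
    x0 = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)."""
    a = alphas_cumprod[t]
    return (x_t - math.sqrt(1.0 - a) * eps) / math.sqrt(a)
```

In a real sampler this arithmetic is applied per tensor element at every timestep from T-1 down to 0, feeding each `predicted_noise` output from the model back into the update; the final latents are then decoded by the stabilityai/sd-vae-ft-ema VAE.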
## Documentation

### Model Description

This model is a DiT (diffusion transformer) trained from scratch on the [WikiArt dataset](https://huggingface.co/datasets/Artificio/WikiArt). It is designed to generate art images conditioned on art genre and art style.
### Model Architecture

The model largely mirrors the classic DiT architecture described in the paper *Scalable Diffusion Models with Transformers*, with slight modifications:

- Replaced the ImageNet class embeddings with WikiArt genre and style embeddings;
- Used post-norm instead of pre-norm;
- Omitted the final linear layer;
- Replaced the sin-cos 2D positional embedding with a learned positional embedding;
- Models only predict noise and do not learn sigma;
- Set patch_size=2 for all model variants;
- Variants differ in their size settings.
Please check `modeling_dit_wikiart.py` in this repository for more details if you are interested.
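The pre-norm vs. post-norm distinction above is purely about where normalization sits relative to the residual connection. A toy sketch on plain Python lists (the `sublayer` here is an arbitrary stand-in for attention or the MLP, not the model's real sublayers):

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and unit variance (no learned scale/shift)
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    # Stand-in for an attention or MLP sublayer: any vector-to-vector map
    return [2.0 * v + 1.0 for v in x]

def pre_norm_block(x):
    # Pre-norm (classic DiT): x + f(norm(x))
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def post_norm_block(x):
    # Post-norm (this model): norm(x + f(x))
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])
```

With post-norm, every block's output is re-normalized, so activations cannot drift in scale as depth grows; with pre-norm, the residual stream passes through unnormalized.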
The model comes in three variants:

- S (small): num_blocks = 8, hidden_size = 384, num_heads = 6, total_params = 20M;
- B (base): num_blocks = 12, hidden_size = 640, num_heads = 10, total_params = 90M;
- L (large): num_blocks = 16, hidden_size = 896, num_heads = 14, total_params = 234M.
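The reported sizes are roughly consistent with a standard DiT block budget of about 18·hidden_size² parameters per block (~4d² for attention, ~8d² for the MLP, and ~6d² for adaLN modulation). This back-of-the-envelope check is an assumption for illustration, not taken from the repo:

```python
variants = {
    "S": dict(num_blocks=8, hidden_size=384),
    "B": dict(num_blocks=12, hidden_size=640),
    "L": dict(num_blocks=16, hidden_size=896),
}

def rough_params(num_blocks, hidden_size):
    # ~4d^2 attention + ~8d^2 MLP + ~6d^2 adaLN modulation per block;
    # ignores embeddings, biases, and norms
    return 18 * num_blocks * hidden_size ** 2

for name, cfg in variants.items():
    print(name, f"~{rough_params(**cfg) / 1e6:.0f}M")
```

The estimates land within a few percent of the stated 20M / 90M / 234M totals, which suggests the variants follow the usual transformer scaling pattern.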
### Training Procedure

- Dataset: all model variants were trained on the 103K-image WikiArt dataset, with data augmentation by horizontal flipping.
- Optimizer: AdamW with default settings.
- Learning rate: linear warmup over the first 1% of steps to a peak of 3e-4, followed by cosine decay to zero over the remaining steps.
- Epochs and batch size:
  - S: 96 epochs with a batch size of 176;
  - B: 120 epochs with a batch size of 192;
  - L: 144 epochs with a batch size of 192.
- Device:
  - S: a single RTX 4060 Ti 16G for 24 hrs;
  - B: a single RTX 4060 Ti 16G for 90 hrs;
  - L: a single RTX 4090D 24G for 48 hrs, followed by a single RTX 4060 Ti 16G for 100 hrs.
- Loss curve: all variants saw a dramatic drop in loss during the first epoch, from above 1.0000 to around 0.2000, followed by a much slower decrease to about 0.1600 by the 20th epoch. DiT-S finally reached 0.1590, DiT-B 0.1525, and DiT-L 0.1510. Training was stable, with no loss spikes.
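The warmup-plus-cosine learning-rate schedule described above can be sketched as follows (the total step count here is an arbitrary placeholder; the real number depends on dataset size, epochs, and batch size):

```python
import math

MAX_LR = 3e-4  # peak learning rate stated in the card

def lr_at(step, total_steps, warmup_frac=0.01, max_lr=MAX_LR):
    # Linear warmup over the first 1% of steps, then cosine decay to zero
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 100_000  # assumed placeholder step count
print(lr_at(0, total), lr_at(total // 2, total), lr_at(total - 1, total))
```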
### Performance and Limitations

- The models demonstrate a basic ability to understand genres and styles and to produce visually appealing paintings (at first glance).
- Limitations include:
  - Failure to understand complex structures such as human faces, buildings, etc.
  - Occasional mode collapse when asked to generate genres or styles rarely seen in the dataset, such as the minimalism style or the ukiyo-e genre.
  - Resolution limited to 256x256.
  - Trained only on the WikiArt dataset, and therefore unable to generate out-of-scope images.
## Technical Details

The model architecture is based on the classic DiT design with several modifications. Training involved the dataset, optimizer settings, learning-rate schedule, epochs, batch sizes, and device configurations described above. The models show a basic understanding of genre and style, but are limited by their difficulty with complex structures and rare styles, by the 256x256 resolution, and by the scope of the training data.
## License

This project is licensed under the MIT license.
| Property | Details |
| --- | --- |
| Model Type | DiT (diffusion transformer) |
| Training Data | 103K WikiArt dataset with horizontal-flipping data augmentation |
| Pipeline Tag | unconditional-image-generation |
| License | MIT |