🚀 Bad-Anatomy-Realism-Classifier
A fine-tuned Vision Transformer model that classifies AI-generated images for bad anatomy and realism. It currently serves as a support model for a YouTube series.
🚀 Quick Start
This fine-tuned Vision Transformer (ViT) classifies AI-generated images along two axes: anatomy (good or bad) and realism (realistic or unrealistic). It serves as a support model for a YouTube series, and you're welcome to build upon it.
✨ Features
- Detecting Bad Anatomy in Realistic AI-Generated Images: Not all image generation models produce images with proper anatomy. Some may generate "bad hands" with more than 5 fingers. This model aims to detect such anatomy issues in AI-generated images.
- Determining True Realism Versus AI Realism: AI-generated images often struggle to look real, especially in skin rendering and generation style. Next to ordinary social media photos, high-definition upscaled AI-generated images tend to stand out through features like shiny skin or very bright lighting.
📚 Documentation
Model Detail
This model was fine-tuned from the google/vit-base-patch16-224-in21k Vision Transformer (ViT) checkpoint.
Uses
- Detect whether an image is real or a well-generated AI image.
- Detect bad anatomy in AI-generated images to trigger regeneration.
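The regeneration use case could be wired up roughly as follows. This is a minimal sketch, not the author's pipeline: `classify`, `bad_anatomy_score`, `needs_regeneration`, the `model_id` argument, and the 0.5 threshold are all hypothetical, and the label strings are assumed to match the four names listed under Dataset Stats. The only library call used is the `transformers` image-classification pipeline, which returns a list of `{"label", "score"}` dictionaries.

```python
from typing import Dict, List


def classify(image_path: str, model_id: str) -> List[Dict]:
    """Run the classifier on one image.

    model_id is a placeholder for wherever this model is hosted or saved;
    it is not specified in this card.
    """
    from transformers import pipeline  # imported lazily; needs transformers installed

    classifier = pipeline("image-classification", model=model_id)
    return classifier(image_path)


def bad_anatomy_score(predictions: List[Dict]) -> float:
    """Sum the scores of every label that mentions bad anatomy (assumed label names)."""
    return sum(p["score"] for p in predictions if "bad anatomy" in p["label"].lower())


def needs_regeneration(predictions: List[Dict], threshold: float = 0.5) -> bool:
    """Flag an image for regeneration when the combined bad-anatomy score is high.

    The 0.5 threshold is an arbitrary illustrative default, not a tuned value.
    """
    return bad_anatomy_score(predictions) >= threshold


# Hand-written predictions in the pipeline's output format (not real model output).
example_predictions = [
    {"label": "Unrealistic Bad Anatomy", "score": 0.55},
    {"label": "Realistic Good Anatomy", "score": 0.25},
    {"label": "Realistic Bad Anatomy", "score": 0.15},
    {"label": "Unrealistic Good Anatomy", "score": 0.05},
]
print(needs_regeneration(example_predictions))
```

Summing over both bad-anatomy labels, rather than looking only at the top label, keeps the decision independent of the realism axis. Given the small dataset and the evaluation scores reported below, any threshold should be checked against your own images.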
Out-of-Scope Use
- Racism
- Illegal activities
Bias, Risks, and Limitations
The initial model was trained on images generated by Stable Diffusion v1.5 using the Beautiful Realistic Asians v6 checkpoint by pleasebankai. The dataset consists of only 134 images, and only 6 of them are labeled Realistic Bad Anatomy. (More dataset details will be added to the model card in future documentation updates.)
Recommendations
It is recommended to expand the dataset and continue training with a greater variety of characters to improve the model's performance on images that deviate from the training set characteristics.
Training and Testing Data
Dataset Image Label Criteria
| Property | Details |
|---|---|
| Bad / Good Anatomy | Any deformed body parts or extra limbs for the character; the background should not be overly matted as it can be adjusted in post-processing. |
| Realistic vs. Unrealistic | Determined by first-glance reaction, lighting, skin and hair appearance, and photography style. It's based on a "gut feeling" to replicate human judgment. |
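The two criteria above combine into the four class labels that appear in the dataset statistics. A small sketch of that mapping, assuming each image receives one yes/no judgment per criterion; the function name `compose_label` is hypothetical and not part of the original labeling workflow:

```python
def compose_label(realistic: bool, good_anatomy: bool) -> str:
    """Combine the two per-image judgments into one of the four dataset labels."""
    realism = "Realistic" if realistic else "Unrealistic"
    anatomy = "Good Anatomy" if good_anatomy else "Bad Anatomy"
    return f"{realism} {anatomy}"


# Enumerate every combination of the two judgments.
for realistic in (True, False):
    for good_anatomy in (True, False):
        print(compose_label(realistic, good_anatomy))
```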
Compatible Images For Dataset
The default data collator is used, and the images are mainly from SD 1.5. The testing pipeline handled 3 images without issues, but it has not been verified whether images and sizes from other models will break training. Compatible models with default image sizes include:
- Stable Diffusion 1.5
- OpenDalle v1.1
- Flux 1
- DALL-E 3 on Copilot
Dataset Stats
Number Images Per Label
=======================
Realistic Bad Anatomy: 6 (4.48%)
Realistic Good Anatomy: 15 (11.19%)
Unrealistic Bad Anatomy: 81 (60.45%)
Unrealistic Good Anatomy: 32 (23.88%)
Total Number of Images: 134
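The percentages above follow directly from the counts, and recomputing them makes the class imbalance easy to see. A short sketch; the variable names are illustrative, and the "majority share" is only a reference point for the full dataset, since the composition of the evaluation split is not given in this card:

```python
# Label counts copied from the statistics above.
counts = {
    "Realistic Bad Anatomy": 6,
    "Realistic Good Anatomy": 15,
    "Unrealistic Bad Anatomy": 81,
    "Unrealistic Good Anatomy": 32,
}

total = sum(counts.values())
for label, n in counts.items():
    print(f"{label}: {n} ({n / total:.2%})")
print(f"Total Number of Images: {total}")

# Share of the most frequent label: the accuracy a classifier would reach on
# the full dataset by always predicting that one label.
majority_label = max(counts, key=counts.get)
majority_share = counts[majority_label] / total
print(f"Majority label: {majority_label} ({majority_share:.2%})")
```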
Evaluation
Results
***** train metrics *****
epoch = 3.0
total_flos = 20135801GF
train_loss = 0.8453
train_runtime = 0:00:42.83
train_samples_per_second = 6.514
train_steps_per_second = 0.841
***** eval metrics *****
epoch = 3.0
eval_accuracy = 0.6341
eval_f1 = 0.513
eval_loss = 0.8219
eval_precision = 0.464
eval_recall = 0.6341
eval_runtime = 0:00:06.95
eval_samples_per_second = 5.893
eval_steps_per_second = 0.862
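Note that eval_recall equals eval_accuracy in the metrics above. That is consistent with support-weighted averaging across the four classes, although the averaging mode is not stated in this card. The sketch below uses made-up toy labels, not the model's predictions, to show why weighted recall always matches accuracy; the helper names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        score += (n / total) * (hits / n)  # weight * recall; n cancels, leaving hits / total
    return score


# Toy labels for illustration only.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "a", "a"]

print(accuracy(y_true, y_pred), weighted_recall(y_true, y_pred))
```

Because each class weight cancels its recall denominator, the weighted sum reduces to correct predictions divided by total samples, which is accuracy. Precision and F1 do not simplify this way, so they can differ from accuracy as they do above.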
Summary
The initial dataset and fine-tuning resulted in 63.41% accuracy and a 51.3% F1 score. This is low but expected for a small amateur dataset. Future improvements include adding more variety in characters, poses, clothing styles, lighting, camera styles, and model generations.
Model Examination
You can view example pipeline inferences and their results at the bottom of the Initial Finetune notebook. Press Ctrl+F and search for "Test Model With Custom Inputs" to find them quickly.
Model Card Contact
If you have any questions, feel free to contact me: