bad-anatomy-realism-classifier Open-source Image Classification Model - Detecting Differences in Anatomical Abnormalities and Realism of AI Images

Bad Anatomy Realism Classifier

Developed by angusleung100

A Vision Transformer fine-tuned image classification model for detecting anatomical anomalies and realism discrepancies in AI-generated images

Image Classification

Transformers

#AI Image Authenticity Detection #Anatomical Anomaly Recognition #ViT Fine-tuning Model

Downloads 17.19k

Release Time: 8/4/2024

Model Overview

This model is specifically designed to identify anatomical issues (e.g., deformed hands) in AI-generated images and distinguish between real photos and highly realistic AI-generated images.

Model Features

Anatomical Anomaly Detection

Capable of identifying common anatomical issues in AI-generated images, such as extra fingers or limb deformities

Realism Assessment

Can differentiate between real photographs and highly realistic AI-generated images based on features like lighting and skin texture

Lightweight Fine-tuning

Efficiently fine-tuned from a pre-trained ViT model, suitable for small-scale datasets

Model Capabilities

AI-generated Image Detection

Anatomical Anomaly Recognition

Image Realism Assessment

Image Classification

Use Cases

Content Moderation

AI-generated Content Identification

Automatically detect AI-generated images on social media platforms

63.41% accuracy

Image Generation Quality Control

AI Image Generation Feedback

Provide quality feedback for image generation systems to trigger regeneration

🚀 Bad-Anatomy-Realism-Classifier

A fine-tuned Vision Transformer model that classifies AI-generated pictures by anatomy quality and realism. It currently serves as a support model for a YouTube series.

🚀 Quick Start

You are welcome to build upon this model.

✨ Features

Detecting Bad Anatomy in Realistic AI-Generated Images: Not all image generation models produce correct anatomy. Some generate "bad hands" with more than 5 fingers. This model aims to detect such anatomy issues in AI-generated images.
Determining True Realism Versus AI Realism: AI-generated images often fall short of realism, especially in skin rendering and generation style. Compared to ordinary social media photos, high-definition upscaled AI-generated images can often be identified by features such as shiny skin or very bright lighting.

📚 Documentation

Model Detail

This model was fine-tuned from the google/vit-base-patch16-224-in21k Vision Transformer (ViT) checkpoint.
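As a rough sketch of how such a fine-tune could be set up, the snippet below builds a label mapping and loads the base checkpoint with a fresh four-way classification head. The label names are taken from the Dataset Stats section; their order, and the helper name `load_base_model`, are assumptions for illustration and may not match the author's notebook.

```python
# Label names come from the Dataset Stats section; the ORDER is an assumption.
LABELS = [
    "Realistic Bad Anatomy",
    "Realistic Good Anatomy",
    "Unrealistic Bad Anatomy",
    "Unrealistic Good Anatomy",
]

id2label = {i: name for i, name in enumerate(LABELS)}
label2id = {name: i for i, name in id2label.items()}


def load_base_model(checkpoint: str = "google/vit-base-patch16-224-in21k"):
    """Hypothetical helper: load the base ViT with a new 4-class head.

    Requires the transformers and torch packages and downloads the checkpoint.
    """
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModelForImageClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABELS),
        id2label=id2label,
        label2id=label2id,
    )
    return processor, model
```

Passing `id2label` and `label2id` means the saved model reports human-readable label names at inference time instead of generic `LABEL_0`-style names.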

Uses

Detect whether an image is real or a well-generated AI image.
Detect bad anatomy in AI-generated images to trigger regeneration.
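The regeneration use case can be sketched as a small gate around the classifier output. This is a minimal illustration, assuming the predicted label strings follow the four names in the Dataset Stats section (for example "Unrealistic Bad Anatomy"); the helper names and the `min_score` threshold are hypothetical and untuned.

```python
from typing import Dict, List


def parse_label(label: str) -> Dict[str, bool]:
    """Split a label such as 'Unrealistic Bad Anatomy' into two flags.

    Assumption: labels follow the '<Realistic|Unrealistic> <Good|Bad> Anatomy'
    pattern from the dataset stats; check the model's id2label to confirm.
    """
    words = label.lower().split()
    return {
        "realistic": words[0] == "realistic",
        "good_anatomy": "good" in words,
    }


def should_regenerate(predictions: List[dict], min_score: float = 0.5) -> bool:
    """Return True when the top prediction is a bad-anatomy label.

    `predictions` has the shape returned by a Transformers
    image-classification pipeline: [{"label": str, "score": float}, ...].
    `min_score` is an illustrative threshold, not a tuned value.
    """
    top = max(predictions, key=lambda p: p["score"])
    flags = parse_label(top["label"])
    return (not flags["good_anatomy"]) and top["score"] >= min_score


def classify(image_path: str, model_id: str) -> List[dict]:
    """Run the classifier on one image (requires transformers, torch, Pillow)."""
    from transformers import pipeline  # imported lazily; heavy dependency

    classifier = pipeline("image-classification", model=model_id)
    return classifier(image_path)
```

A call might look like `should_regenerate(classify("sample.png", "angusleung100/bad-anatomy-realism-classifier"))`, where the repository id is an assumption inferred from the developer and model names on this page.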

Out-of-Scope Use

Racism
Illegal activities

Bias, Risks, and Limitations

The initial model was trained on images generated by Stable Diffusion v1.5 using the Beautiful Realistic Asians v6 checkpoint by pleasebankai. The dataset consists of only 134 images, of which only 6 are labeled Realistic Bad Anatomy (see Dataset Stats). (More dataset details will be added to the model card in future documentation updates.)

Recommendations

It is recommended to expand the dataset and continue training with a greater variety of characters to improve the model's performance on images that deviate from the training set characteristics.

Training and Testing Data

Dataset Image Label Criteria

Property	Details
Bad / Good Anatomy	Any deformed body parts or extra limbs on the character; the background should not matter much, as it can be adjusted in post-processing.
Realistic vs. Unrealistic	Determined by first-glance reaction, lighting, skin and hair appearance, and photography style. It's based on a "gut feeling" to replicate human judgment.

Compatible Images For Dataset

The default data collator is used, and the images are mainly from Stable Diffusion 1.5. The testing pipeline had no issues with 3 test images, but it is uncertain whether images and sizes from other models will break training. Models whose default image sizes are compatible include:

Stable Diffusion 1.5
OpenDalle v1.1
Flux 1
DALL-E 3 on Copilot

Dataset Stats

Number Images Per Label
=======================
Realistic Bad Anatomy: 6 (4.48%)
Realistic Good Anatomy: 15 (11.19%)
Unrealistic Bad Anatomy: 81 (60.45%)
Unrealistic Good Anatomy: 32 (23.88%)

Total Number of Images:  134
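
The percentages above follow directly from the counts. The short sketch below recomputes them and also derives a majority-class baseline, which is a useful yardstick for reading the accuracy figure. The baseline is computed over the full dataset; the actual evaluation split may be distributed differently, so treat it as a rough reference only.

```python
counts = {
    "Realistic Bad Anatomy": 6,
    "Realistic Good Anatomy": 15,
    "Unrealistic Bad Anatomy": 81,
    "Unrealistic Good Anatomy": 32,
}

total = sum(counts.values())

# Share of each label as a percentage, rounded to two decimals.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the largest class,
# measured on the full dataset (an assumption; the eval split may differ).
majority_label = max(counts, key=counts.get)
majority_baseline = counts[majority_label] / total

for label, share in shares.items():
    print(f"{label}: {counts[label]} ({share}%)")
print(f"Majority baseline ({majority_label}): {majority_baseline:.2%}")
```

With over 60% of the images in one class, an accuracy only slightly above that share suggests the model has learned little beyond the class prior.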

Evaluation

Results

***** train metrics *****
  epoch                    =        3.0
  total_flos               = 20135801GF
  train_loss               =     0.8453
  train_runtime            = 0:00:42.83
  train_samples_per_second =      6.514
  train_steps_per_second   =      0.841

***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.6341
  eval_f1                 =      0.513
  eval_loss               =     0.8219
  eval_precision          =      0.464
  eval_recall             =     0.6341
  eval_runtime            = 0:00:06.95
  eval_samples_per_second =      5.893
  eval_steps_per_second   =      0.862

Summary

The initial dataset and fine-tuning resulted in 63.41% accuracy and a 51.3% F1 score. This is low but expected for a small amateur dataset. Future improvements include adding more variety in characters, poses, clothing styles, lighting, camera styles, and model generations.
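
In the eval metrics, eval_recall is identical to eval_accuracy (0.6341). That is what support-weighted averaging produces, since weighted recall reduces to correct predictions divided by total samples. The sketch below illustrates this on a made-up confusion matrix. It assumes the reported precision, recall, and F1 are weighted averages, which the card does not state.

```python
from typing import List, Tuple


def weighted_metrics(confusion: List[List[int]]) -> Tuple[float, float, float, float]:
    """Return (accuracy, precision, recall, f1) with support-weighted averaging.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    precision = recall = f1 = 0.0
    for i in range(n_classes):
        support = sum(confusion[i])                   # true samples of class i
        predicted = sum(row[i] for row in confusion)  # samples predicted as class i
        tp = confusion[i][i]
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return correct / total, precision, recall, f1


# Toy 3-class confusion matrix, invented for illustration only.
toy = [
    [8, 1, 1],
    [2, 5, 3],
    [0, 4, 6],
]
accuracy, precision, recall, f1 = weighted_metrics(toy)
print(f"accuracy={accuracy:.4f} recall={recall:.4f}")
```

As a side note, 0.6341 is consistent with 26 correct predictions out of 41 evaluation images (5.893 samples per second over 6.95 seconds is roughly 41), although the card does not state the split size.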

Model Examination

You can view example pipeline inferences and their results in the Initial Finetune notebook. The examples are at the bottom of the notebook. Press Ctrl+F and search for "Test Model With Custom Inputs" to find them quickly.

Model Card Contact

If you have any questions, feel free to contact me:
