AI Detection Model
An AI detection model for image classification, capable of distinguishing between real and AI-generated images.
Quick Start
This README provides detailed information about the AI Image Detect Distilled model, including its architecture, training process, data sources, performance, and future directions.
Features
- Multi-model Distillation: Combines the features learned by three separate models into a small ViT model for efficient detection.
- Diverse Data Sources: Draws on multiple datasets so that real and AI-generated images are closely matched in content.
- Good Performance: Achieves high accuracy on both validation and real-world datasets, outperforming other popular models.
Documentation
Model Architecture and Training
Three separate models were initially trained:
- Midjourney vs. Real Images
- Stable Diffusion vs. Real Images
- Stable Diffusion Fine-tunes vs. Real Images
The data preparation process was as follows:
- Used Google's Open Images Dataset for real images
- Described real images using BLIP (Bootstrapping Language-Image Pre-training)
- Generated Stable Diffusion images using BLIP descriptions
- Found similar Midjourney images based on BLIP descriptions
This approach ensured that real and AI-generated images were as similar as possible, differing only in their origin.
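The source does not say how similar Midjourney images were found, so the following is only a minimal sketch of caption-based matching. It assumes each Midjourney image comes with a text prompt and uses Jaccard token overlap as a stand-in similarity measure. The function names, sample captions, and IDs are all hypothetical.

```python
import re


def tokenize(caption: str) -> set:
    """Lowercase a caption and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", caption.lower()))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_caption(query: str, candidates: dict) -> tuple:
    """Return (candidate id, score) for the prompt that best matches the query."""
    query_tokens = tokenize(query)
    scored = (
        (cid, jaccard(query_tokens, tokenize(text)))
        for cid, text in candidates.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Hypothetical data: a BLIP-style caption for one real photo,
# and a few Midjourney prompts keyed by made-up image IDs.
real_caption = "a brown dog running on the beach"
midjourney_prompts = {
    "mj_001": "portrait of an old sailor, dramatic lighting",
    "mj_002": "a dog running on a sandy beach at sunset",
    "mj_003": "futuristic city skyline at night",
}

best_id, best_score = closest_caption(real_caption, midjourney_prompts)
print(best_id, best_score)
```

In practice an embedding-based similarity would likely match paraphrased captions better than token overlap; the overlap measure is used here only because it needs no external dependencies.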
The three models were then distilled into a small ViT model with 11.8 million parameters, combining their learned features for more efficient detection.
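The exact distillation recipe is not described here. One common way to distill several teachers into a single student, shown as a sketch under that assumption, is to average the teachers' temperature-softened probabilities and train the student to match them with a KL-divergence loss. The class order, logits, and temperature below are illustrative only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def average_teachers(teacher_probs):
    """Average per-class probabilities across teachers (an assumed combination rule)."""
    n = len(teacher_probs)
    num_classes = len(teacher_probs[0])
    return [sum(p[c] for p in teacher_probs) / n for c in range(num_classes)]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions."""
    return sum(t * math.log((t + eps) / (p + eps)) for t, p in zip(target, predicted))


def distillation_loss(student_logits, teacher_logits_list, temperature=2.0):
    """Soft-target loss: the student matches the averaged teacher distribution."""
    teacher_probs = [softmax(l, temperature) for l in teacher_logits_list]
    soft_target = average_teachers(teacher_probs)
    student_probs = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return (temperature ** 2) * kl_divergence(soft_target, student_probs)


# Hypothetical logits for one image, class order [real, ai_generated].
teachers = [[-1.0, 2.0], [0.5, 1.5], [-0.5, 1.0]]
loss_far = distillation_loss([2.0, -1.0], teachers)   # student disagrees with teachers
loss_near = distillation_loss([-0.5, 1.5], teachers)  # student roughly agrees
print(loss_far, loss_near)
```

A real training loop would typically combine this soft-target term with a standard cross-entropy term on the ground-truth labels.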
Data Sources
- Google's Open Images Dataset: link
- Ivan Sivkov's Midjourney Dataset: link
- TANREI(NAMA)'s Stable Diffusion Prompts Dataset: link
Performance
- Validation Set: 94% accuracy. Held out from the training data to assess generalization.
- Custom Real-World Set: 84% accuracy. Composed of self-captured and online-sourced images, it is designed to be more representative of images found on the internet.
- Comparative Analysis: Outperformed other popular AI detection models by 5 percentage points on both sets. Those models achieved 89% and 79% on the validation and real-world sets, respectively.
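Because the comparison is stated in percentage points, a small sketch of how accuracy and a percentage-point gap are computed may help. The labels and predictions here are hypothetical toy data, not the evaluation sets described above.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def percentage_point_gap(acc_a, acc_b):
    """Absolute difference between two accuracies, in percentage points."""
    return (acc_a - acc_b) * 100


# Hypothetical toy data: 1 = AI-generated, 0 = real.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
model_a = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # 9 of 10 correct
model_b = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]  # 7 of 10 correct

acc_a = accuracy(model_a, labels)
acc_b = accuracy(model_b, labels)
gap = percentage_point_gap(acc_a, acc_b)
print(f"A: {acc_a:.0%}, B: {acc_b:.0%}, gap: {gap:.0f} percentage points")
```

A percentage-point gap is an absolute difference between two rates, which is not the same as a relative improvement expressed in percent.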
Key Insights
- Strong generalization on validation data (94% accuracy).
- Good adaptability to diverse, real-world images (84% accuracy).
- Consistent outperformance of other popular models.
- A 10-point accuracy drop from the validation set to the real-world set indicates room for improvement.
- Comprehensive training on multiple AI generation techniques contributes to model versatility.
- Because real and generated images are matched in content, the model focuses on subtle differences in how images are generated rather than on content disparities.
Future Directions
- Expand the dataset with more diverse, real-world examples to bridge the performance gap.
- Improve generalization to internet-sourced images.
- Conduct error analysis on misclassified samples to identify patterns.
- Integrate new AI image generation techniques as they emerge.
- Consider fine-tuning for specific domains where detection accuracy is critical.
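As one possible starting point for the error analysis mentioned above, misclassified samples can be grouped by an attribute such as image source to see where errors concentrate. The record fields (source, label, predicted) and the sample records below are hypothetical.

```python
from collections import Counter


def error_rates_by_group(samples, key):
    """Return {group: fraction misclassified} for the given grouping key."""
    totals = Counter()
    errors = Counter()
    for sample in samples:
        group = sample[key]
        totals[group] += 1
        if sample["predicted"] != sample["label"]:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical evaluation records.
samples = [
    {"source": "midjourney", "label": "ai", "predicted": "ai"},
    {"source": "midjourney", "label": "ai", "predicted": "real"},
    {"source": "stable_diffusion", "label": "ai", "predicted": "ai"},
    {"source": "stable_diffusion", "label": "ai", "predicted": "ai"},
    {"source": "camera", "label": "real", "predicted": "real"},
    {"source": "web", "label": "real", "predicted": "ai"},
    {"source": "web", "label": "real", "predicted": "real"},
    {"source": "web", "label": "real", "predicted": "ai"},
]

rates = error_rates_by_group(samples, "source")
worst_group = max(rates, key=rates.get)
for group, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{group}: {rate:.0%} misclassified")
```

The same function can group by any other recorded attribute, such as image resolution bucket or compression level, if those are logged during evaluation.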
Technical Details
The model architecture involves distilling three separate models into a small ViT model with 11.8 million parameters. The data preparation process carefully aligns real and AI-generated images to ensure similarity in appearance.
License
This project is licensed under the MIT license.