CLIP ViT-Base-Patch32 Stanford Cars
Developed by tanganke
An image classification model based on the CLIP Vision Transformer architecture and fine-tuned on the Stanford Cars dataset
Downloads: 4,143
Release Date: 4/28/2024
Model Overview
This model is OpenAI's CLIP visual encoder (ViT-B/32), fine-tuned on the Stanford Cars dataset and designed specifically for automotive image classification tasks.
Model Features
Domain-specific Fine-tuning
Fine-tuned on the Stanford Cars dataset, improving automotive classification accuracy over the zero-shot CLIP baseline
Efficient Visual Encoding
Based on the ViT architecture; images are processed as sequences of 32×32-pixel patches
Modular Design
Can be used standalone as a visual encoder or integrated into the full CLIP model (see the loading sketch below)
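A minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub under the repo id tanganke/clip-vit-base-patch32_stanford-cars and loads with the standard transformers CLIP classes; both the repo id and the fallback to the base openai/clip-vit-base-patch32 image processor are assumptions here, not details confirmed by this card:

```python
from PIL import Image
from transformers import CLIPVisionModel, CLIPImageProcessor

# Assumed repo id; adjust if the checkpoint lives under a different name.
model_id = "tanganke/clip-vit-base-patch32_stanford-cars"

# Vision tower only; no text encoder is loaded.
encoder = CLIPVisionModel.from_pretrained(model_id)
# Preprocessing config taken from the base CLIP release, in case the
# fine-tuned repo does not ship its own processor files.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("car.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")
outputs = encoder(**inputs)
features = outputs.pooler_output  # (1, 768) pooled visual features for ViT-B/32
```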
Model Capabilities
Automotive Image Classification
Visual Feature Extraction
Fine-grained Image Recognition
Use Cases
Automotive Industry
Vehicle Model Identification
Identify the make and model of cars in images (see the classification sketch after this group)
Reported accuracy: 78.19%
Used Car Evaluation
Automatically identify vehicle attributes from images
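Because this checkpoint contains only the vision encoder, one common way to obtain class predictions is to pair it with the original CLIP text encoder as a zero-shot-style head over the Stanford Cars class names. The sketch below follows that pattern under stated assumptions: the repo id, the two example class names, and the image path are placeholders, and whether this exact pairing reproduces the 78.19% figure is not confirmed here.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor, CLIPVisionModel

base_id = "openai/clip-vit-base-patch32"
finetuned_id = "tanganke/clip-vit-base-patch32_stanford-cars"  # assumed repo id

clip = CLIPModel.from_pretrained(base_id)
# Swap in the fine-tuned vision tower while keeping the original text tower.
clip.vision_model = CLIPVisionModel.from_pretrained(finetuned_id).vision_model
clip.eval()
processor = CLIPProcessor.from_pretrained(base_id)

# Placeholder labels; the real dataset has 196 fine-grained classes.
class_names = ["Audi A4 Sedan 2012", "BMW 3 Series Sedan 2012"]
prompts = [f"a photo of a {name}" for name in class_names]

image = Image.open("car.jpg").convert("RGB")  # placeholder image path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = clip(**inputs).logits_per_image  # (1, num_classes)
print(class_names[logits.argmax(-1).item()])
```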
Retail
Automotive E-commerce Search
Find visually similar vehicles by image (see the similarity-search sketch below)
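For image-based search, a straightforward approach is to embed catalog photos with the fine-tuned encoder and rank them by cosine similarity to a query embedding. A minimal sketch, with the repo id and all file paths as placeholder assumptions:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPVisionModel, CLIPImageProcessor

model_id = "tanganke/clip-vit-base-patch32_stanford-cars"  # assumed repo id
encoder = CLIPVisionModel.from_pretrained(model_id).eval()
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    """Return unit-normalized pooled features for a batch of image files."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = encoder(**inputs).pooler_output
    return F.normalize(feats, dim=-1)  # unit norm so dot product = cosine sim

catalog_paths = ["listing1.jpg", "listing2.jpg", "listing3.jpg"]  # placeholders
catalog = embed(catalog_paths)
query = embed(["query_car.jpg"])  # placeholder query image

scores = query @ catalog.T  # cosine similarities, shape (1, N)
for idx in scores.squeeze(0).argsort(descending=True):
    print(catalog_paths[idx], float(scores[0, idx]))
```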