vit_base_patch32_clip_224.laion400m_e32
Developed by timm
Vision Transformer model trained on the LAION-400M dataset, compatible with both the OpenCLIP and timm frameworks
Downloads: 5,957
Release date: 10/23/2024
Model Overview
This is a vision-language model based on the Vision Transformer architecture, used primarily for zero-shot image classification. It was trained on the LAION-400M dataset and can be loaded through both the OpenCLIP and timm frameworks.
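As a sketch of the dual-framework support described above, the snippet below loads the checkpoint under both libraries. The names `ViT-B-32` with pretrained tag `laion400m_e32` (OpenCLIP) and `vit_base_patch32_clip_224.laion400m_e32` (timm) follow the usual naming conventions for this checkpoint; verify them against your installed versions. Each load is guarded so the snippet degrades gracefully when a library is absent.

```python
# Dual-framework loading sketch. Model names are assumptions based on
# standard OpenCLIP/timm naming for this checkpoint; check your install.
import importlib.util

OPENCLIP_ARCH, OPENCLIP_TAG = "ViT-B-32", "laion400m_e32"
TIMM_NAME = "vit_base_patch32_clip_224.laion400m_e32"

if importlib.util.find_spec("open_clip") is not None:
    import open_clip
    # Full vision-language model: image encoder + text encoder.
    # Downloads the LAION-400M weights on first use.
    model, _, preprocess = open_clip.create_model_and_transforms(
        OPENCLIP_ARCH, pretrained=OPENCLIP_TAG)
    tokenizer = open_clip.get_tokenizer(OPENCLIP_ARCH)

if importlib.util.find_spec("timm") is not None:
    import timm
    # Image tower only, with the classifier head removed (num_classes=0)
    # to expose pooled visual features; set pretrained=True to fetch
    # the LAION-400M weights.
    vision_model = timm.create_model(TIMM_NAME, pretrained=False,
                                     num_classes=0)
```

OpenCLIP is the natural choice when you need the text encoder (zero-shot classification, image-text matching); timm suffices when the image tower is used purely as a visual feature extractor.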
Model Features
Dual-framework compatibility
Works with both the OpenCLIP and timm frameworks, giving flexibility in how the model is integrated and deployed
Zero-shot learning
Can be directly applied to new image classification tasks without fine-tuning
Large-scale pre-training
Pre-trained on the large-scale LAION-400M dataset (roughly 400 million image-text pairs), giving it strong visual representation capabilities
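The zero-shot mechanism behind these features is simple: an image is scored against a set of text prompts by cosine similarity between their embeddings, followed by a softmax over the candidate labels. A minimal pure-PyTorch sketch, with random tensors standing in for the encoder outputs (real usage feeds `model.encode_image` / `model.encode_text` results from OpenCLIP):

```python
import torch

def zero_shot_probs(image_features: torch.Tensor,
                    text_features: torch.Tensor,
                    scale: float = 100.0) -> torch.Tensor:
    """CLIP-style zero-shot scoring: cosine similarity + softmax over labels.

    The scale of 100.0 mirrors the typical learned logit scale; it is an
    illustrative assumption here.
    """
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    return (scale * image_features @ text_features.T).softmax(dim=-1)

# Shape demo with random embeddings in place of encoder outputs.
img = torch.randn(2, 512)   # 2 images
txt = torch.randn(3, 512)   # 3 candidate label prompts
probs = zero_shot_probs(img, txt)
print(probs.shape)  # torch.Size([2, 3]); each row sums to 1
```

Because the label set lives entirely in the text prompts, swapping in new categories requires no retraining, which is what makes the model usable out of the box on unseen classification tasks.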
Model Capabilities
Image classification
Zero-shot learning
Visual feature extraction
Use Cases
Image understanding
Zero-shot image classification
Classify images into new categories without any category-specific training data
Image retrieval
Image search based on visual similarity
Multimodal applications
Image-text matching
Determine whether an image matches a text description
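The image-text matching use case reduces to thresholding the cosine similarity between the image embedding and the text embedding. A minimal sketch, with dummy vectors in place of encoder outputs and a made-up threshold of 0.25 (a real threshold should be tuned on your data):

```python
import torch

def matches(image_feature: torch.Tensor,
            text_feature: torch.Tensor,
            threshold: float = 0.25):
    """Decide image-text match by thresholding cosine similarity.

    The 0.25 threshold is an illustrative assumption, not a value
    from the model card; calibrate it on a validation set.
    """
    sim = torch.nn.functional.cosine_similarity(
        image_feature, text_feature, dim=-1)
    return sim, sim > threshold

# Dummy embeddings standing in for encode_image / encode_text outputs:
a = torch.tensor([[1.0, 0.0, 0.0]])
sim_same, ok_same = matches(a, a)   # identical vectors -> similarity 1.0
b = torch.tensor([[0.0, 1.0, 0.0]])
sim_diff, ok_diff = matches(a, b)   # orthogonal vectors -> similarity 0.0
print(ok_same.item(), ok_diff.item())  # True False
```

The same similarity score, ranked instead of thresholded, also drives the image-retrieval use case above.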