vit_base_patch32_clip_224.metaclip_2pt5b
Developed by timm
A Vision Transformer (ViT) model trained on the MetaCLIP-2.5B dataset, compatible with both the open_clip and timm frameworks
Downloads: 5,571
Release date: 10/23/2024
Model Overview
This Vision Transformer model is designed primarily for zero-shot image classification and can be used under both the open_clip and timm frameworks.
Model Features
Dual-framework compatibility
Works with both the open_clip and timm frameworks, offering flexible usage options
Large-scale pre-training
Trained on the large-scale MetaCLIP-2.5B dataset, yielding strong visual representations
Fast inference
Uses a 32x32 patch size and the QuickGELU activation function, balancing accuracy and speed
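The speed benefit of the larger patch size comes from the token count: a 224x224 image split into 32x32 patches yields far fewer tokens than the more common 16x16 split, and Transformer attention cost grows quadratically with token count. A minimal sketch of that arithmetic (the function name is illustrative, not part of any library):

```python
# Why a 32x32 patch size is faster: fewer patches means fewer tokens
# for the Transformer's quadratically-scaling self-attention.
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    assert image_size % patch_size == 0, "image must divide evenly into patches"
    return (image_size // patch_size) ** 2

print(num_patches(224, 32))  # 49 tokens for this model
print(num_patches(224, 16))  # 196 tokens for a patch-16 variant
```

Roughly 4x fewer tokens per image, at the cost of a coarser spatial grid.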
Model Capabilities
Zero-shot image classification
Image feature extraction
Cross-modal representation learning
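All three capabilities rest on the same mechanism: the model maps images and text prompts into a shared embedding space, and classification reduces to comparing cosine similarities. The sketch below shows only that scoring step with hand-written stand-in vectors; in real use the embeddings would come from the model's image and text encoders (via open_clip or timm), and the `zero_shot_probs` helper is hypothetical:

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs, scale=100.0):
    """Softmax over temperature-scaled cosine similarities between one
    image embedding and one text embedding per candidate class."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = scale * (txt @ img)          # cosine similarity per class prompt
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

# Illustrative stand-ins for encoder outputs:
image = np.array([1.0, 0.0, 0.2])
prompts = np.array([[0.9, 0.1, 0.1],     # e.g. "a photo of a dog"
                    [0.0, 1.0, 0.0]])    # e.g. "a photo of a cat"
probs = zero_shot_probs(image, prompts)
print(int(probs.argmax()))  # 0 -> the first prompt matches best
```

Because the class set is defined entirely by the text prompts, new categories need no retraining, only new prompt strings.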
Use Cases
Computer vision
Zero-shot image classification
Classify images into new categories without category-specific training data
Image retrieval
Retrieve relevant images based on text queries
Multimodal applications
Image-text matching
Determine whether images and text descriptions match
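Retrieval and matching use the same shared embedding space: embed the text query, embed each candidate image, and rank by cosine similarity. A hedged sketch under the assumption that the vectors below stand in for real encoder outputs (`retrieve` is an illustrative helper, not a library function):

```python
import numpy as np

def retrieve(text_emb, image_embs, top_k=2):
    """Rank gallery images by cosine similarity to a text query;
    return the indices of the top_k best matches and their scores."""
    t = text_emb / np.linalg.norm(text_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ t                  # cosine similarity per image
    order = np.argsort(-sims)        # best match first
    return order[:top_k], sims[order[:top_k]]

# Illustrative stand-ins for embeddings of a query and three images:
text = np.array([0.0, 1.0])
gallery = np.array([[1.0, 0.0],
                    [0.6, 0.8],
                    [0.0, 1.0]])
idx, scores = retrieve(text, gallery)
print(idx.tolist())  # [2, 1]
```

For yes/no image-text matching, the same similarity score is simply compared against a chosen threshold instead of ranked.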
© 2025 AIbase