Vit Base Patch16 Clip 224.metaclip 2pt5b
Developed by timm
A vision model trained on the MetaCLIP-2.5B dataset that can be loaded with either the OpenCLIP or the timm framework
Downloads 889
Release Time: 10/23/2024
Model Overview
This model is based on the Vision Transformer (ViT) architecture and is primarily used for zero-shot image classification. It is compatible with both the OpenCLIP and timm frameworks.
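As a rough illustration of the dual-framework use described above, the sketch below loads the model once through OpenCLIP and once through timm. The Hub repository name is an assumption inferred from the model title and developer shown on this page, not something the page states explicitly. The helper names `load_with_open_clip` and `load_with_timm` are hypothetical. Both helpers need the respective library installed and download weights on first use.

```python
# Assumption: the Hugging Face Hub repository name is inferred from the
# page title ("Vit Base Patch16 Clip 224.metaclip 2pt5b") and developer (timm).
HUB_REPO = "timm/vit_base_patch16_clip_224.metaclip_2pt5b"
OPEN_CLIP_ID = f"hf-hub:{HUB_REPO}"       # OpenCLIP-style hub reference
TIMM_ID = HUB_REPO.split("/", 1)[1]       # timm-style model name


def load_with_open_clip():
    """Load the full image + text model, its preprocessing, and its tokenizer."""
    import open_clip  # imported lazily so this file can be read without it

    model, preprocess = open_clip.create_model_from_pretrained(OPEN_CLIP_ID)
    tokenizer = open_clip.get_tokenizer(OPEN_CLIP_ID)
    return model.eval(), preprocess, tokenizer


def load_with_timm():
    """Load only the image encoder; num_classes=0 returns pooled features."""
    import timm  # imported lazily so this file can be read without it

    model = timm.create_model(TIMM_ID, pretrained=True, num_classes=0)
    return model.eval()
```

The OpenCLIP path is the one to use when text embeddings are needed (zero-shot classification, image-text matching). The timm path is a reasonable fit when only image features are required.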
Model Features
Dual-framework compatibility
Supports both the OpenCLIP and timm frameworks, so it can be used from either library
QuickGELU activation
Uses the QuickGELU activation function, x * sigmoid(1.702 * x), a cheaper approximation of GELU that may speed up training and inference
Large-scale pre-training
Trained on the large-scale MetaCLIP-2.5B dataset, with strong generalization capabilities
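To make the QuickGELU feature listed above concrete, here is a minimal scalar sketch that compares it with exact GELU. It uses only the standard library. Real frameworks apply the same formula element-wise to tensors, so this is an illustration of the formula and not the model's actual implementation.

```python
import math


def quick_gelu(x: float) -> float:
    """QuickGELU: x * sigmoid(1.702 * x), a sigmoid-based approximation of GELU."""
    return x / (1.0 + math.exp(-1.702 * x))


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    # Print both activations side by side for a few moderate inputs.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  quick_gelu={quick_gelu(x):+.4f}  gelu={gelu(x):+.4f}")
```

The two functions stay close but are not identical, which is why weights trained with QuickGELU are normally run with QuickGELU at inference time as well.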
Model Capabilities
Zero-shot image classification
Image feature extraction
Cross-modal understanding
Use Cases
Computer vision
Image classification
Classify images of new categories without fine-tuning
Visual search
Search for relevant images based on text descriptions
Multimodal applications
Image-text matching
Score how well an image matches a text description
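The use cases above all reduce to comparing an image embedding with text embeddings. The sketch below shows that scoring step in plain Python: cosine similarity followed by a softmax over candidate labels. The 3-dimensional embeddings and the label prompts are made-up toy values standing in for real encoder outputs, and the `logit_scale` of 100.0 is an assumed typical CLIP-style value, not a figure taken from this page.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Return one probability per text prompt for a single image embedding."""
    img = l2_normalize(image_emb)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(t)))
        for t in text_embs
    ]
    # logit_scale is an assumed temperature; it sharpens the distribution.
    return softmax([logit_scale * s for s in sims])


# Toy data (hypothetical): one prompt per candidate class.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
```

Visual search uses the same similarity in the other direction: embed one text query, then rank many image embeddings by their cosine similarity to it.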