vit_huge_patch14_clip_224.metaclip_2pt5b
Developed by timm
A CLIP-style vision-language model trained on the MetaCLIP-2.5B dataset, usable in both the OpenCLIP and timm frameworks and supporting zero-shot image classification
Downloads: 3,173
Release Time: 10/23/2024
Model Overview
This model is a Vision Transformer (ViT-H/14) with CLIP-style image and text towers, compatible with both the OpenCLIP and timm frameworks. It is primarily designed for zero-shot image classification and cross-modal image-text understanding.
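A minimal loading sketch showing the dual-framework usage, assuming the weights are published on the Hugging Face Hub as timm/vit_huge_patch14_clip_224.metaclip_2pt5b (the hub identifier is inferred from the model name and should be verified against the model page):

import open_clip
import timm

# OpenCLIP: loads the full image + text model for zero-shot use.
# The hf-hub identifier below is an assumption based on the model name.
clip_model, preprocess = open_clip.create_model_from_pretrained(
    'hf-hub:timm/vit_huge_patch14_clip_224.metaclip_2pt5b'
)
tokenizer = open_clip.get_tokenizer('hf-hub:timm/vit_huge_patch14_clip_224.metaclip_2pt5b')

# timm: loads the image tower only; num_classes=0 removes the classification head
# so the model outputs pooled image embeddings.
vision_model = timm.create_model(
    'vit_huge_patch14_clip_224.metaclip_2pt5b', pretrained=True, num_classes=0
)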
Model Features
Dual-framework compatibility
Supports both OpenCLIP and timm frameworks, offering flexible usage options
Large-scale pre-training
Trained on the MetaCLIP-2.5B dataset of roughly 2.5 billion image-text pairs, providing robust vision-language understanding
Zero-shot learning
Supports zero-shot image classification tasks without task-specific fine-tuning
Efficient architecture
Uses the Vision Transformer Huge (ViT-H/14) architecture with the QuickGELU activation function, following the original CLIP training configuration
Model Capabilities
Zero-shot image classification
Cross-modal understanding
Image feature extraction
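A minimal image feature extraction sketch with timm, assuming the timm model name vit_huge_patch14_clip_224.metaclip_2pt5b; the image path example.jpg is a placeholder:

import timm
import torch
from PIL import Image

# Image tower only; num_classes=0 returns pooled embeddings instead of logits.
model = timm.create_model(
    'vit_huge_patch14_clip_224.metaclip_2pt5b', pretrained=True, num_classes=0
).eval()

# Preprocessing that matches the pretrained config (224x224 input, CLIP mean/std).
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open('example.jpg').convert('RGB')  # placeholder image path
with torch.no_grad():
    features = model(transform(img).unsqueeze(0))  # pooled image embedding, shape (1, 1280) for ViT-H/14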
Use Cases
Computer vision
Image classification
Classifies unseen image categories without additional training
Achieves high classification accuracy in zero-shot settings (see the OpenCLIP sketch at the end of this section)
Cross-modal retrieval
Enables cross-modal search between images and text
Content understanding
Automatic labeling
Generates descriptive labels for images
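A zero-shot classification sketch with OpenCLIP, assuming the same hf-hub identifier as above; the candidate labels and image path are illustrative only:

import torch
import open_clip
from PIL import Image

MODEL_ID = 'hf-hub:timm/vit_huge_patch14_clip_224.metaclip_2pt5b'  # assumed hub identifier

model, preprocess = open_clip.create_model_from_pretrained(MODEL_ID)
tokenizer = open_clip.get_tokenizer(MODEL_ID)
model.eval()

labels = ['a photo of a cat', 'a photo of a dog', 'a photo of a car']  # illustrative prompts
image = preprocess(Image.open('example.jpg')).unsqueeze(0)  # placeholder image path
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Cosine similarity scaled by 100, softmax over the candidate labels.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs.squeeze(0).tolist())))

The label with the highest probability is the predicted class; the same normalized image and text embeddings can be reused for cross-modal retrieval or automatic labeling by ranking candidates by cosine similarity.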