Vit Base Patch16 Clip 224.metaclip 2pt5b

Developed by timm
A vision model trained on the MetaCLIP-2.5B dataset that can be loaded with either OpenCLIP or timm
Downloads 889
Release Time: 10/23/2024

Model Overview

This is a Vision Transformer (base size, 16×16 patches, 224×224 input) trained CLIP-style and used primarily for zero-shot image classification. It can be loaded with either OpenCLIP or timm.

Model Features

Dual-framework compatibility
Supports both OpenCLIP and timm frameworks, providing more flexible usage options
QuickGELU activation
Uses the QuickGELU activation function, x * sigmoid(1.702 * x), a sigmoid-based approximation of GELU that is cheaper to compute and may speed up training and inference
Large-scale pre-training
Trained on the large-scale MetaCLIP-2.5B dataset, with strong generalization capabilities
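
To make the QuickGELU feature above concrete, here is a minimal pure-Python sketch that compares it with the exact GELU. The function names are illustrative only and are not taken from OpenCLIP or timm; the formula x * sigmoid(1.702 * x) is the commonly used definition of QuickGELU.

```python
import math


def quick_gelu(x: float) -> float:
    """QuickGELU: x * sigmoid(1.702 * x), written as x / (1 + exp(-1.702 * x)).

    Scalar sketch only; not hardened against overflow for very large negative x.
    """
    return x / (1.0 + math.exp(-1.702 * x))


def gelu(x: float) -> float:
    """Exact GELU using the Gaussian CDF, shown for comparison."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    # Print both activations side by side over a few sample inputs.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  quick_gelu={quick_gelu(x):+.4f}  gelu={gelu(x):+.4f}")
```

The two curves stay close over typical activation ranges, but they are not identical, so weights trained with QuickGELU should be run with QuickGELU rather than standard GELU.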

Model Capabilities

Zero-shot image classification
Image feature extraction
Cross-modal understanding

Use Cases

Computer vision
Image classification
Classify images of new categories without fine-tuning
Visual search
Search for relevant images based on text descriptions
Multimodal applications
Image-text matching
Score how well an image matches a text description
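
The use cases above all reduce to comparing an image embedding with one or more text embeddings. The sketch below shows that scoring step in pure Python, under stated assumptions: the toy three-dimensional vectors stand in for embeddings that would really come from the model's image and text encoders, `zero_shot_probs` and `l2_normalize` are hypothetical helper names, and the logit scale of 100 is an assumed value rather than one read from this checkpoint.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and several texts.

    logit_scale=100.0 is an assumption for illustration, not a checkpoint value.
    """
    img = l2_normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(t)))
        for t in text_embs
    ]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy embeddings (hypothetical): real ones come from the encoders.
    labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
    text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    image_emb = [0.9, 0.1, 0.0]

    probs = zero_shot_probs(image_emb, text_embs)
    best = max(range(len(labels)), key=lambda i: probs[i])
    print(labels[best])
```

For visual search the same similarity is computed between one text embedding and many image embeddings, and the images are ranked by score instead of being passed through a softmax.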