ViT Base Patch16 CLIP 224.metaclip 400m

Developed by timm
A vision model trained on the MetaCLIP-400M dataset that can be loaded with either OpenCLIP or timm
Downloads 1,206
Release Time: 10/23/2024

Model Overview

This is a Vision Transformer (ViT) model intended for zero-shot image classification. Going by its name, it is the Base variant with 16x16 patches and a 224x224 input resolution. It was trained on the MetaCLIP-400M dataset and can be used from both the OpenCLIP and timm frameworks.
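The dual-framework loading described above might look like the following minimal sketch. The model name and hub ID are assumptions reconstructed from the page title, not values stated on this page, so check them against the model hub before relying on them. The helper names `load_with_open_clip` and `load_with_timm` are hypothetical; the library calls inside them come from the public OpenCLIP and timm APIs.

```python
# Identifiers are ASSUMED from the page title; verify them before use.
TIMM_MODEL_NAME = "vit_base_patch16_clip_224.metaclip_400m"
HF_HUB_ID = f"hf-hub:timm/{TIMM_MODEL_NAME}"


def load_with_open_clip():
    """Load the image+text model, its preprocessing transform, and a tokenizer."""
    import open_clip  # imported lazily so this file loads without the package

    model, preprocess = open_clip.create_model_from_pretrained(HF_HUB_ID)
    tokenizer = open_clip.get_tokenizer(HF_HUB_ID)
    return model.eval(), preprocess, tokenizer


def load_with_timm():
    """Load only the image tower as a feature extractor (no classifier head)."""
    import timm

    # num_classes=0 removes the classification head so the model returns features.
    model = timm.create_model(TIMM_MODEL_NAME, pretrained=True, num_classes=0)
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)
    return model.eval(), transform
```

Under these assumptions, the OpenCLIP path would be the one to use for zero-shot classification, since it also provides the text encoder, while the timm path suits image feature extraction alone.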

Model Features

Dual Framework Compatibility: Works with both OpenCLIP and timm, so the same weights can be used from either library.
Zero-shot Learning Capability: Can classify images into categories it was never explicitly trained on, using text descriptions of the candidate classes.
QuickGELU Activation: Uses QuickGELU, a sigmoid-based approximation of GELU computed as x * sigmoid(1.702 * x), which may give faster training and inference.
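To make the QuickGELU feature concrete, here is a small standard-library sketch comparing the exact GELU with the sigmoid-based approximation. It is an illustration of the formula only, not code taken from either framework, and it makes no claim about actual speed.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF of x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def quick_gelu(x: float) -> float:
    """QuickGELU: x * sigmoid(1.702 * x), a cheaper approximation of GELU."""
    return x / (1.0 + math.exp(-1.702 * x))


# Print both activations side by side on a few sample inputs.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"{x:+.1f}  gelu={gelu(x):+.4f}  quick_gelu={quick_gelu(x):+.4f}")
```

The two curves are close but not identical, so weights trained with QuickGELU should generally be run with QuickGELU rather than the exact GELU.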

Model Capabilities

Zero-shot Image Classification
Image Feature Extraction
Cross-modal Representation Learning
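Zero-shot classification with a CLIP-style model typically comes down to comparing one image embedding against a text embedding per candidate label. The sketch below shows that scoring step with made-up three-dimensional vectors standing in for real model outputs. The function names and the `logit_scale` value of 100 are illustrative assumptions, not values read from this checkpoint.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Return one probability per label from scaled cosine similarities."""
    img = l2_normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_embs]
    return softmax([logit_scale * s for s in sims])


# Toy embeddings (hypothetical); a real pipeline would get these from the model.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_emb = [0.8, 0.3, 0.1]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best, [round(p, 4) for p in probs])
```

Because the labels enter only as text embeddings, adding a new category means adding one more prompt rather than retraining.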

Use Cases

Computer Vision
Open-domain Image Classification: Classify images of arbitrary categories without specific training
Image Retrieval: Semantic similarity-based image search

Multimodal Applications
Image-Text Matching: Evaluate the matching degree between images and text descriptions
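Image retrieval and image-text matching both rest on ranking embeddings by similarity. As a rough sketch, the snippet below ranks a small gallery against a query embedding using cosine similarity. The file names, vectors, and the `rank_gallery` helper are hypothetical placeholders; in practice the embeddings would come from the model's image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query_emb, gallery, top_k=2):
    """Return the top_k (name, score) pairs, best match first."""
    scored = [(name, cosine(query_emb, emb)) for name, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy gallery of precomputed image embeddings (made-up values).
gallery = {
    "beach.jpg": [0.9, 0.1, 0.2],
    "forest.jpg": [0.1, 0.9, 0.1],
    "harbor.jpg": [0.7, 0.2, 0.6],
}
query_emb = [1.0, 0.0, 0.3]  # stands in for a text or image query embedding

results = rank_gallery(query_emb, gallery)
for name, score in results:
    print(f"{name}: {score:.4f}")
```

The same score, computed for a single image and a single caption, can serve as the matching degree for image-text matching.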