vit_base_patch32_clip_224.metaclip_400m

Developed by timm
A vision-language model trained on the MetaCLIP-400M dataset, supporting zero-shot image classification tasks
Downloads 2,406
Release Date: 10/23/2024

Model Overview

This is a dual-purpose vision-language model that can be used in both OpenCLIP and timm frameworks, primarily for zero-shot image classification tasks.

Model Features

Dual Framework Support
Compatible with both OpenCLIP and timm frameworks, offering flexible usage options
Zero-shot Learning Capability
Can perform image classification tasks without task-specific training
Fast Inference
Built on the compact ViT-B/32 architecture (32×32 image patches), enabling efficient inference

Model Capabilities

Zero-shot Image Classification
Image Feature Extraction
Cross-modal Understanding
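The capabilities above all reduce to one mechanism: embed the image and each candidate text prompt, then rank prompts by cosine similarity. A minimal sketch with made-up toy vectors (a real pipeline would obtain embeddings from the model's image and text towers):

```python
# Toy illustration of CLIP-style zero-shot scoring: cosine similarity
# between an image embedding and several text-prompt embeddings,
# turned into probabilities with a scaled softmax.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def softmax(scores, scale=100.0):
    # CLIP models scale similarities by a learned logit scale (~100)
    exps = [math.exp(scale * s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

image_emb = [0.8, 0.1, 0.2]  # toy image embedding (not real model output)
text_embs = {
    "a photo of a cat": [0.7, 0.2, 0.1],
    "a photo of a dog": [0.1, 0.9, 0.3],
}

scores = {label: cosine(image_emb, emb) for label, emb in text_embs.items()}
probs = softmax(list(scores.values()))
best = max(scores, key=scores.get)
print(best)  # the prompt whose embedding best matches the image
```

The same scoring loop serves both zero-shot classification (prompts are class names) and image-text matching (prompts are free-form captions).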

Use Cases

Computer Vision
General Image Classification
Classify images into arbitrary categories described in natural language, without task-specific training
Performs well in various image classification tasks
Content Moderation
Identify inappropriate content in images
Multimodal Applications
Image-Text Matching
Score how well images match given text descriptions