vit_huge_patch14_clip_224.metaclip_altogether
CLIP model based on ViT-Huge architecture, supporting zero-shot image classification tasks
Downloads: 171
Release Time: 12/23/2024
Model Overview
This is a CLIP vision-language model based on the ViT-Huge architecture, compatible with both the OpenCLIP and timm frameworks. It was trained on the MetaCLIP dataset and supports zero-shot image classification.
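The zero-shot mechanism behind CLIP-style models can be sketched as follows: image and class-prompt embeddings are L2-normalized, compared by cosine similarity, scaled, and passed through a softmax. The embeddings below are random placeholders standing in for real model outputs (a minimal sketch; actual inference would embed images and prompts with OpenCLIP or timm, and the 1024-dim size and scale value are illustrative assumptions):

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs, scale=100.0):
    """CLIP-style zero-shot scoring: L2-normalize both sides,
    take cosine similarities, apply the logit scale, and softmax."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = scale * txt @ img          # cosine similarity per class
    logits -= logits.max()              # numerical stability for exp()
    exp = np.exp(logits)
    return exp / exp.sum()

# Random placeholders for real embeddings (hypothetical 1024-dim).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=1024)
text_embs = rng.normal(size=(3, 1024))  # e.g. prompts "a photo of a {cat, dog, car}"
probs = zero_shot_probs(image_emb, text_embs)
```

Because no class-specific weights are learned, swapping in a different set of text prompts changes the label space with no retraining, which is what makes the classification "zero-shot".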
Model Features
Dual-framework compatibility
Supports both OpenCLIP and timm frameworks
Zero-shot capability
Classifies images against arbitrary label sets without task-specific fine-tuning
Large-scale pre-training
Trained on the MetaCLIP dataset, giving it broad coverage of visual concepts
Model Capabilities
Zero-shot image classification
Image-text matching
Cross-modal understanding
Use Cases
Content understanding
Automatic image tagging
Generates descriptive labels for unlabeled images
Can recognize thousands of common objects and scenes
Visual search
Text-based image retrieval
Finds relevant images using natural language queries
Achieves cross-modal retrieval without task-specific training
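Text-based retrieval uses the same shared embedding space: embed the query text once, then rank gallery images by cosine similarity. A minimal sketch with placeholder embeddings (hypothetical; real use would embed the query and gallery with the CLIP model, and the 1024-dim size is an assumption):

```python
import numpy as np

def retrieve(query_emb, gallery_embs, top_k=5):
    """Rank gallery image embeddings by cosine similarity to a query."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q
    order = np.argsort(-sims)[:top_k]   # indices of best matches first
    return order, sims[order]

# Placeholder gallery; the query is constructed near image 42 so that
# the similarity ranking has a known best match to illustrate the idea.
rng = np.random.default_rng(1)
gallery = rng.normal(size=(100, 1024))
query = gallery[42] + 0.1 * rng.normal(size=1024)
idx, scores = retrieve(query, gallery, top_k=3)
```

In practice the gallery embeddings are precomputed once and stored, so each natural-language query costs only one text-encoder pass plus a similarity search.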