CvT-21-384-22k
CvT-21 is a vision model that combines convolutional and Transformer architectures. It was pretrained on ImageNet-22k and fine-tuned on ImageNet-1k at 384x384 resolution.
Downloads 134
Release Time: 4/4/2022
Model Overview
This model builds on the Vision Transformer by introducing convolutional operations into the architecture, enabling efficient image classification at 384x384 resolution.
Model Features
Convolution-Transformer Hybrid
Adds convolutional operations to the standard Vision Transformer design, improving local feature extraction.
High-Resolution Processing
Accepts 384x384 input images, which suits classification tasks that depend on fine visual detail.
Large-Scale Pretraining
Pretrained on the ImageNet-22k dataset before fine-tuning on ImageNet-1k, so the learned features come from a much larger label set than the final 1,000 classes.
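To make the interaction between convolution and the 384x384 input concrete, here is a minimal sketch that computes how many tokens each stage would see if tokens are produced by strided convolutions. The per-stage kernel, stride, and padding values are assumptions based on commonly published CvT settings; this card does not state them, and the function names are illustrative only.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial size after a convolution, using the standard floor formula."""
    return (size + 2 * padding - kernel) // stride + 1


# ASSUMPTION: (kernel, stride, padding) for each of three stages.
# These values are not given in this model card.
ASSUMED_STAGES = [(7, 4, 2), (3, 2, 1), (3, 2, 1)]


def token_grid_sizes(input_size=384, stages=ASSUMED_STAGES):
    """Return the side length of the token grid after each stage."""
    sizes = []
    size = input_size
    for kernel, stride, padding in stages:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


for stage, side in enumerate(token_grid_sizes(384), start=1):
    print(f"stage {stage}: {side}x{side} grid, {side * side} tokens")
```

Under these assumed settings the token grid shrinks at every stage, so later Transformer blocks attend over far fewer tokens than the first one. That is one way a convolutional design can keep a 384x384 input affordable.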
Model Capabilities
Image Classification
Visual Feature Extraction
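As a rough illustration of the image classification capability, the sketch below loads the model through the Hugging Face `transformers` library. The checkpoint identifier `microsoft/cvt-21-384-22k` is an assumption, since this card does not give a repository name. The helper `classify_image` is hypothetical, and running it requires `torch`, `Pillow`, and `transformers`, plus network access to download the weights.

```python
def classify_image(image_path, model_name="microsoft/cvt-21-384-22k"):
    """Return the predicted label string for a single image file.

    ASSUMPTION: model_name points at a checkpoint for this model;
    the identifier is not stated in the model card itself.
    """
    # Local imports keep this module importable without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, CvtForImageClassification

    # The processor applies the resizing and normalization stored with the checkpoint.
    processor = AutoImageProcessor.from_pretrained(model_name)
    model = CvtForImageClassification.from_pretrained(model_name)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        logits = model(**inputs).logits

    predicted_index = int(logits.argmax(dim=-1).item())
    return model.config.id2label[predicted_index]
```

Calling `classify_image("photo.jpg")` with a local image path of your own would return one class name from the checkpoint's label map.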
Use Cases
Computer Vision
Object Recognition
Identifies object categories in images (e.g., animals, everyday objects).
Classifies images into the 1,000 categories of ImageNet-1k.
Scene Classification
Classifies complex scenes (e.g., natural landscapes, architecture).
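For both use cases above, a classifier's raw scores are usually turned into ranked predictions. The following is a minimal sketch of that step using a softmax and a top-k selection. The labels and logit values are made-up placeholders, not outputs of this model, and the function names are illustrative.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Placeholder values for illustration only; a real model would emit 1,000 logits.
example_labels = ["tabby cat", "golden retriever", "castle", "seashore"]
example_logits = [2.0, 0.5, -1.0, 0.1]

for label, prob in top_k(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

Reporting the top few candidates with their probabilities, rather than a single label, makes it easier to spot ambiguous images such as scenes that contain several objects.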