CvT-21-384-22k

Developed by Microsoft
CvT-21 is a vision model that combines convolutional and Transformer architectures. It is pretrained on ImageNet-22k and fine-tuned on ImageNet-1k.
Downloads 134
Release Time: 4/4/2022

Model Overview

This model improves on vision Transformers by adding convolutional operations, and it performs image classification at 384x384 input resolution.

Model Features

Convolution-Transformer Hybrid
Adds convolutional operations to the standard vision Transformer, which improves local feature extraction.
High-Resolution Processing
Accepts 384x384 input images, which suits classification tasks that depend on fine visual detail.
Large-Scale Pretraining
Pretrained on the ImageNet-22k dataset before fine-tuning on ImageNet-1k, which gives it strong general-purpose feature representations.
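
To make the hybrid design and the 384x384 input more concrete, the sketch below estimates how many tokens each stage would see if the image is downsampled by strided convolutional embeddings. The per-stage kernel, stride, and padding values are an assumption taken from the settings commonly reported for CvT, not something stated on this page, and the helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride, padding):
    """Length of one spatial axis after a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for each of three stages.
# Assumed values: commonly reported CvT settings, not confirmed by this page.
STAGES = [(7, 4, 2), (3, 2, 1), (3, 2, 1)]


def token_grids(input_size=384, stages=STAGES):
    """Return the side length of the square token grid after each stage."""
    sizes = []
    size = input_size
    for kernel, stride, padding in stages:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


grids = token_grids(384)
print(grids)                   # [96, 48, 24]
print([g * g for g in grids])  # [9216, 2304, 576]
```

Under these assumptions, each stage halves or quarters the grid, so the later Transformer blocks attend over far fewer tokens than the raw pixel grid would suggest.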

Model Capabilities

Image Classification
Visual Feature Extraction

Use Cases

Computer Vision
Object Recognition
Identify object categories in images (e.g., animals, daily objects)
Assigns each image to one of the 1,000 ImageNet-1k categories
Scene Classification
Classify complex scenes (e.g., natural landscapes, architecture)
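
The classification use cases above can be sketched with the Hugging Face `transformers` library. This is a minimal example, assuming the checkpoint is published under the identifier `microsoft/cvt-21-384-22k` and that `torch`, `Pillow`, and `transformers` are installed; the function names `top_k` and `classify` are hypothetical helpers, not part of any library.

```python
import math


def top_k(logits, labels, k=5):
    """Apply softmax to raw logits and return the k most probable (label, prob) pairs."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: exps[i], reverse=True)
    return [(labels[i], exps[i] / total) for i in ranked[:k]]


def classify(image_path, model_id="microsoft/cvt-21-384-22k", k=5):
    """Classify one image file; model_id is an assumed Hugging Face identifier."""
    # Heavy imports stay local so top_k can be used without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, CvtForImageClassification

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = CvtForImageClassification.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    # The processor resizes and normalizes the image to the model's input size.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # id2label maps class index to a human-readable ImageNet-1k label.
    return top_k(logits, model.config.id2label, k)
```

Calling `classify` with the path to a local image returns a list of label and probability pairs, most probable first. The weights are downloaded on the first call.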