CvT-13-384

Developed by Microsoft
CvT-13 is a vision transformer model pre-trained on the ImageNet-1k dataset. It improves on standard vision transformers by introducing convolutional operations into the architecture.
Downloads 27
Release Time: 4/4/2022

Model Overview

This model combines the advantages of convolutional neural networks and transformers, performing image classification tasks at 384x384 resolution and supporting recognition of 1000 ImageNet categories.
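
As a rough illustration of the classification workflow described above, the sketch below loads the model through the Hugging Face transformers library and returns the top predictions for one image. The Hub identifier microsoft/cvt-13-384, the function name classify, and the helper names softmax and top_k are assumptions made for this example and do not come from this page. It also assumes that the processor's saved configuration resizes inputs to 384x384.

```python
import math

# Assumed Hugging Face Hub identifier; not stated on this page.
MODEL_ID = "microsoft/cvt-13-384"


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=5):
    """Return the k most probable (class_index, probability) pairs."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def classify(image_path, k=5):
    """Hypothetical helper: classify one image file with the assumed checkpoint."""
    # Imported here so the pure-Python helpers above work without these packages.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, CvtForImageClassification

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = CvtForImageClassification.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")  # resize and normalize
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()  # one score per class

    probs = softmax(logits)
    return [(model.config.id2label[i], p) for i, p in top_k(probs, k)]
```

Calling classify("photo.jpg") with a local image path (the file name is a placeholder) would download the weights on first use and return a list of (label, probability) pairs.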

Model Features

Convolution-Transformer Hybrid Architecture
Combines the local feature extraction capability of CNNs with the global modeling ability of Transformers
High-Resolution Processing
Supports image input at 384x384 resolution
ImageNet Pretrained
Pre-trained on the ImageNet-1k dataset, supporting recognition of 1000 object categories
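
To make the hybrid design more concrete: in CvT, each stage starts with a strided convolution that turns its input into a coarser grid of tokens, which the transformer blocks then process. The sketch below computes the grid sizes for a 384x384 input. The kernel sizes, strides, paddings, and channel widths are assumptions based on the commonly cited CvT-13 configuration and are not stated on this page. The names STAGES, conv_out, and token_grids are illustrative only.

```python
# Assumed CvT-13 convolutional token embedding settings, one entry per stage.
STAGES = [
    {"kernel": 7, "stride": 4, "padding": 2, "dim": 64},
    {"kernel": 3, "stride": 2, "padding": 1, "dim": 192},
    {"kernel": 3, "stride": 2, "padding": 1, "dim": 384},
]


def conv_out(size, kernel, stride, padding):
    """Output side length of a strided convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def token_grids(image_size=384):
    """Return (grid_side, token_count, channel_dim) for each stage."""
    grids = []
    size = image_size
    for stage in STAGES:
        size = conv_out(size, stage["kernel"], stage["stride"], stage["padding"])
        grids.append((size, size * size, stage["dim"]))
    return grids


for side, tokens, dim in token_grids(384):
    print(f"{side}x{side} grid, {tokens} tokens, {dim} channels")
```

Under these assumptions the token count shrinks by a factor of four at each later stage while the channel width grows, which is the CNN-style pyramid that the hybrid architecture borrows.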

Model Capabilities

Image Classification
Object Recognition
Visual Feature Extraction

Use Cases

Computer Vision
General Object Recognition
Recognize common object categories in images
Classifies images into the 1000 ImageNet-1k categories
Visual Content Analysis
Analyze image content and extract semantic information