CvT-13-384
CvT-13 is a Convolutional vision Transformer (CvT) pre-trained on the ImageNet-1k dataset. It improves on the performance of standard vision transformers by adding convolutional operations to the architecture.
Downloads 27
Release Time: 4/4/2022
Model Overview
The model combines the strengths of convolutional neural networks and transformers. It classifies images at 384x384 resolution into the 1,000 ImageNet-1k categories.
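As a rough illustration of the classification workflow described above, the following sketch loads a checkpoint through the Hugging Face transformers library and returns the most likely labels for one image. The checkpoint name microsoft/cvt-13-384 and the helper names top_k and classify_image are assumptions made for this example; they are not stated on this page. The sketch also assumes that torch, Pillow, and transformers are installed.

```python
import math


def top_k(logits, id2label, k=5):
    """Return the k most likely (label, probability) pairs from raw logits."""
    # Numerically stable softmax: subtract the largest logit first.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


def classify_image(path, checkpoint="microsoft/cvt-13-384", k=5):
    """Classify one image file; the default checkpoint name is an assumption."""
    # Heavy dependencies are imported lazily so top_k stays usable without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, CvtForImageClassification

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = CvtForImageClassification.from_pretrained(checkpoint)
    model.eval()

    image = Image.open(path).convert("RGB")
    # The processor resizes and normalizes the image for the model.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]  # one score per class
    return top_k(logits.tolist(), model.config.id2label, k)
```

Calling classify_image("photo.jpg") with a local image path (a hypothetical file name) should return a list of (label, probability) pairs. The first call downloads the checkpoint weights.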
Model Features
Convolution-Transformer Hybrid Architecture
Combines the local feature extraction capability of CNNs with the global modeling ability of Transformers
High-Resolution Processing
Supports image input at 384x384 resolution
ImageNet Pretrained
Pre-trained on the ImageNet-1k dataset, supporting recognition of 1000 object categories
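To make the link between the convolutional design and the 384x384 input size concrete, the sketch below computes how many tokens each stage would see. The per-stage kernel, stride, and padding values are assumptions taken from the published CvT-13 configuration (7/4/2, then 3/2/1 twice); this page does not list them. The function names are hypothetical.

```python
def conv_output_size(size, kernel, stride, padding):
    """Standard output-size formula for a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed convolutional token-embedding settings for the three CvT-13 stages.
STAGES = [
    {"kernel": 7, "stride": 4, "padding": 2},
    {"kernel": 3, "stride": 2, "padding": 1},
    {"kernel": 3, "stride": 2, "padding": 1},
]


def token_grids(input_size=384, stages=STAGES):
    """Return (grid side, token count) for each stage of a square input."""
    grids = []
    size = input_size
    for stage in stages:
        size = conv_output_size(
            size, stage["kernel"], stage["stride"], stage["padding"]
        )
        grids.append((size, size * size))
    return grids


print(token_grids(384))  # [(96, 9216), (48, 2304), (24, 576)]
```

Under these assumptions each stage halves or quarters the spatial grid, so the later transformer blocks attend over far fewer tokens than the first stage does.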
Model Capabilities
Image Classification
Object Recognition
Visual Feature Extraction
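For the feature-extraction capability listed above, one possible approach is to take the backbone output without the classification head, average it over the spatial grid, and compare the resulting vectors. This is a sketch only: the checkpoint name microsoft/cvt-13-384 and the helper names are assumptions, and mean pooling is just one of several reasonable ways to obtain a single vector per image.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def extract_features(path, checkpoint="microsoft/cvt-13-384"):
    """Return one pooled feature vector for an image (checkpoint is assumed)."""
    # Imported lazily so cosine_similarity works without these packages.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, CvtModel

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = CvtModel.from_pretrained(checkpoint)  # backbone only, no classifier
    model.eval()

    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        # Final-stage feature map, shape (batch, channels, height, width).
        feature_map = model(**inputs).last_hidden_state
    # Average over the spatial dimensions to get one vector per image.
    return feature_map.mean(dim=(2, 3))[0].tolist()
```

Two images can then be compared with cosine_similarity(extract_features("a.jpg"), extract_features("b.jpg")), where the file names are placeholders.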
Use Cases
Computer Vision
General Object Recognition
Recognize common object categories in images
Classifies images into the 1,000 ImageNet-1k categories
Visual Content Analysis
Analyze image content and extract semantic information