
CvT-13 384 22k

Developed by Microsoft
CvT-13 (Convolutional vision Transformer) is a vision model that combines convolutions with a Transformer architecture. This checkpoint was pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k, and is suited to image classification.
Downloads: 508
Release date: 4/4/2022

Model Overview

CvT improves on the vision Transformer by introducing convolutional operations. This checkpoint classifies 384x384 input images into the 1,000 ImageNet-1k categories.
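A minimal inference sketch, assuming the checkpoint is available on the Hugging Face Hub as `microsoft/cvt-13-384-22k` and is loaded with the `transformers` classes `AutoImageProcessor` and `CvtForImageClassification`. The helper names `top_k_labels` and `classify` are hypothetical. The heavy dependencies (`torch`, `transformers`, Pillow) are imported inside `classify`, so the softmax helper can be used without them.

```python
import math
from typing import Dict, List, Sequence, Tuple

MODEL_ID = "microsoft/cvt-13-384-22k"  # assumed Hugging Face Hub checkpoint name


def top_k_labels(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 5
) -> List[Tuple[str, float]]:
    """Turn raw logits into the k most probable (label, probability) pairs."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by probability (highest first) and keep the top k.
    best = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in best]


def classify(image_path: str, k: int = 5) -> List[Tuple[str, float]]:
    """Classify one image file with CvT-13 (requires torch, transformers, Pillow)."""
    # Imported lazily so that top_k_labels works without these packages installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, CvtForImageClassification

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = CvtForImageClassification.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    # The processor applies the checkpoint's own resize and normalization settings.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]  # one score per ImageNet-1k class
    return top_k_labels(logits.tolist(), model.config.id2label, k)
```

Usage, with a hypothetical local file: `for label, prob in classify("cat.jpg"): print(f"{label}: {prob:.3f}")`. The model is reloaded on every call for simplicity; when classifying many images, load the processor and model once and reuse them.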

Model Features

Combination of Convolution and Transformer
Adds convolutional operations to a vision Transformer, namely convolutional token embeddings and convolutional projections inside attention, to improve local feature extraction.
High-resolution processing
Accepts 384x384 input images, higher than the common 224x224, which can help with fine-grained image classification.
Large-scale pre-training
Pre-trained on the large ImageNet-22k dataset before fine-tuning on ImageNet-1k, which gives the model strong general-purpose visual representations.
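To make the convolutional token embedding and the 384x384 input size concrete, the sketch below computes the token grid produced at each of CvT-13's three stages. Each stage turns its input map into tokens with an overlapping strided convolution, so the grid size follows the standard convolution output formula. The per-stage (kernel, stride, padding) values are an assumption taken from our reading of the CvT paper and the Hugging Face `CvtConfig` defaults; `conv_out_side` and `stage_grids` are hypothetical names.

```python
from typing import List, Tuple

# Assumed CvT-13 token-embedding settings per stage: (kernel, stride, padding).
CVT13_STAGES: List[Tuple[int, int, int]] = [(7, 4, 2), (3, 2, 1), (3, 2, 1)]


def conv_out_side(side: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a square 2D convolution (standard formula)."""
    return (side + 2 * padding - kernel) // stride + 1


def stage_grids(resolution: int) -> List[Tuple[int, int]]:
    """(grid side, token count) after each stage's convolutional token embedding."""
    grids = []
    side = resolution
    for kernel, stride, padding in CVT13_STAGES:
        # Windows overlap because kernel > stride; the grid shrinks by about `stride`.
        side = conv_out_side(side, kernel, stride, padding)
        grids.append((side, side * side))
    return grids


for res in (224, 384):
    print(res, stage_grids(res))
# 224 [(56, 3136), (28, 784), (14, 196)]
# 384 [(96, 9216), (48, 2304), (24, 576)]
```

Under these assumptions, a 384x384 input yields about 2.9 times as many tokens at every stage as a 224x224 input (9216 versus 3136 in the first stage), which is where the extra detail, and the extra compute, of the higher resolution comes from.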

Model Capabilities

Image classification
Visual feature extraction
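For feature extraction, a minimal sketch could load only the backbone and average its final feature map into one vector per image, then compare images with cosine similarity. It assumes the `transformers` class `CvtModel` and that its `last_hidden_state` is a (batch, channels, height, width) map; spatial mean pooling is one simple choice made here for illustration, and the built-in classifier may pool differently. `extract_features` and `cosine_similarity` are hypothetical helper names.

```python
import math
from typing import List, Sequence

MODEL_ID = "microsoft/cvt-13-384-22k"  # assumed Hugging Face Hub checkpoint name


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def extract_features(image_path: str) -> List[float]:
    """Return one pooled feature vector for an image (needs torch, transformers, Pillow)."""
    # Imported lazily so cosine_similarity works without these packages installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, CvtModel

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = CvtModel.from_pretrained(MODEL_ID)  # backbone only, no classification head
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        # Assumed shape: (batch, channels, height, width).
        feature_map = model(**inputs).last_hidden_state
    # Average over the spatial grid to get one vector per image.
    return feature_map.mean(dim=(2, 3))[0].tolist()
```

Usage, with hypothetical file names: `cosine_similarity(extract_features("a.jpg"), extract_features("b.jpg"))` gives a value near 1.0 for visually similar images.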

Use Cases

Computer vision
Object recognition
Identifies object categories in images (e.g., animals, everyday objects)
Classifies images into the 1,000 ImageNet-1k categories
Scene understanding
Analyzes the scene content of images (e.g., natural landscapes, buildings)