CvT-13

Developed by Microsoft
CvT-13 is a hybrid architecture model combining convolutional neural networks and vision transformers, pre-trained on the ImageNet-1k dataset, suitable for image classification tasks.
Downloads 21.80k
Release Time: 4/4/2022

Model Overview

CvT-13 adds convolutional operations to the vision transformer, which strengthens local feature extraction while keeping the transformer's ability to model global context. It is used primarily for image classification.

Model Features

Convolution-Transformer Hybrid Architecture
Combines the local feature extraction of CNNs with the global modeling of transformers
Efficient Image Processing
Pre-trained on ImageNet-1k, supports image classification at 224x224 resolution
Lightweight Design
Uses fewer parameters and less computation than comparable pure transformer models (the exact parameter count is not stated here)
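
To make the hybrid design more concrete, the sketch below traces how a 224x224 input could shrink into progressively smaller token grids as it passes through convolutional token embeddings. The stage settings (kernel, stride, padding, embedding width, depth) are assumptions taken from the published CvT paper, not from this page, and the names Stage, CVT13_STAGES, conv_out, and token_grid are hypothetical helpers for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    kernel: int     # convolutional token-embedding kernel size
    stride: int     # embedding stride (controls downsampling)
    padding: int    # zero padding on each side
    embed_dim: int  # token width produced by the stage
    depth: int      # number of transformer blocks in the stage


# Assumed CvT-13 settings (from the CvT paper, not from this page).
CVT13_STAGES = (
    Stage(kernel=7, stride=4, padding=2, embed_dim=64, depth=1),
    Stage(kernel=3, stride=2, padding=1, embed_dim=192, depth=2),
    Stage(kernel=3, stride=2, padding=1, embed_dim=384, depth=10),
)


def conv_out(size, kernel, stride, padding):
    """Standard output-size formula for a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def token_grid(image_size=224, stages=CVT13_STAGES):
    """Return (grid side, token count, embed_dim) after each stage."""
    rows = []
    size = image_size
    for stage in stages:
        size = conv_out(size, stage.kernel, stage.stride, stage.padding)
        rows.append((size, size * size, stage.embed_dim))
    return rows


for side, tokens, dim in token_grid():
    print(f"{side}x{side} grid, {tokens} tokens, width {dim}")
```

Under these assumptions the token count drops at every stage while the token width grows, which is one plausible reason the design is cheaper than a transformer that keeps a fixed number of tokens throughout. The counts exclude any classification token.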

Model Capabilities

Image Classification
Visual Feature Extraction
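
As an illustration of the classification capability, here is a minimal sketch using the Hugging Face transformers library. It assumes the checkpoint is published on the Hugging Face Hub under the id microsoft/cvt-13 and that torch, transformers, and Pillow are installed; classify_image is a hypothetical helper name, not part of any library.

```python
MODEL_ID = "microsoft/cvt-13"  # assumed Hugging Face Hub id for this checkpoint


def classify_image(path, top_k=5):
    """Return the top_k (label, probability) pairs for the image at `path`."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, CvtForImageClassification

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = CvtForImageClassification.from_pretrained(MODEL_ID)
    model.eval()

    # The processor applies the preprocessing stored with the checkpoint.
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # one score per ImageNet-1k class

    probs = logits.softmax(dim=-1)[0]
    top = torch.topk(probs, k=top_k)
    return [
        (model.config.id2label[idx.item()], prob.item())
        for prob, idx in zip(top.values, top.indices)
    ]
```

Calling classify_image("photo.jpg") with a local image path (the file name is a placeholder) downloads the weights on first use and returns the most probable ImageNet-1k labels with their probabilities.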

Use Cases

Computer Vision
General Object Recognition
Classifies everyday objects in images
Recognizes the 1,000 categories of ImageNet-1k
Scene Understanding
Identifies the type of scene in an image (e.g., palaces or natural landscapes)