LeViT-128
LeViT-128 is an image classification model that pairs a Vision Transformer architecture with convolutional layers to achieve efficient inference.
Downloads 44
Release Time: 6/1/2022
Model Overview
The LeViT-128 model is pretrained on the ImageNet-1k dataset at 224x224 resolution and can classify images into 1,000 categories.
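As a rough illustration of the classification workflow described above, here is a minimal sketch. It assumes the model is loaded through the Hugging Face transformers library from a checkpoint named facebook/levit-128; neither the library nor the checkpoint name is stated on this page, so treat both as assumptions. The helper names top_k and classify are hypothetical.

```python
import math


def top_k(logits, labels, k=5):
    """Return the k highest-scoring (label, probability) pairs via softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [(labels[i], probs[i]) for i in ranked]


def classify(image_path, checkpoint="facebook/levit-128", k=5):
    """Classify one image file; the checkpoint name is an assumption."""
    # Heavy dependencies are imported here so top_k stays usable without them.
    import torch
    from PIL import Image
    from transformers import (
        AutoImageProcessor,
        LevitForImageClassificationWithTeacher,
    )

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = LevitForImageClassificationWithTeacher.from_pretrained(checkpoint)
    model.eval()

    # The processor resizes and normalizes the image to the model's input size.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits[0]

    labels = [model.config.id2label[i] for i in range(logits.shape[0])]
    return top_k(logits.tolist(), labels, k)
```

Calling classify("photo.jpg") on a local image would return up to five (label, probability) pairs drawn from the 1,000 ImageNet-1k classes. The first call downloads the weights, so it needs network access.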
Model Features
Efficient Inference
Runs inference faster than traditional Vision Transformers by using convolutional layers alongside attention.
Hybrid Architecture
Combines Transformer and convolutional networks, drawing on the strengths of both.
Model Capabilities
Image Classification
Visual Feature Extraction
Use Cases
Computer Vision
Object Recognition
Identify object categories in images
Classifies images into the 1,000 ImageNet-1k categories
Visual Content Analysis
Analyze image content and extract features
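The feature-extraction use case above could look something like the following sketch. It again assumes the Hugging Face transformers library and a checkpoint named facebook/levit-128, neither of which is confirmed on this page. The helpers embed and cosine_similarity are hypothetical names introduced for illustration. The pooled output of the backbone is used as the image embedding, which is one reasonable choice rather than a prescribed method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed(image_path, checkpoint="facebook/levit-128"):
    """Return a pooled feature vector for one image (checkpoint is assumed)."""
    # Heavy dependencies are imported here so cosine_similarity works alone.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, LevitModel

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = LevitModel.from_pretrained(checkpoint)  # backbone without the head
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # pooler_output averages the final-stage token features per image.
    return outputs.pooler_output[0].tolist()
```

Two images can then be compared with cosine_similarity(embed("a.jpg"), embed("b.jpg")), where values closer to 1 suggest more similar visual content.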