LeViT-384
LeViT-384 is a vision Transformer pre-trained on the ImageNet-1k dataset. It borrows design elements from convolutional networks to achieve faster inference.
Downloads 37
Release Time: 6/1/2022
Model Overview
LeViT is a hybrid vision model that combines convolutional and Transformer components and is designed specifically for image classification. It is optimized for inference speed while maintaining high accuracy.
Model Features
Efficient Inference
Incorporates convolutional components to run inference faster than traditional vision Transformers
High Accuracy
Pre-trained on the ImageNet-1k dataset for image classification across its 1,000 categories
Teacher-Student Training
Uses a teacher-student training approach (knowledge distillation) to improve model accuracy
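Teacher-student training is commonly realised as knowledge distillation: the student carries two classification heads, one supervised by the ground-truth label and one by the teacher network's prediction. The block below is a minimal sketch of that idea in plain Python, not the training code behind this checkpoint. The function names, the equal weighting `alpha=0.5`, and the averaging of the two heads at inference are assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target):
    """Cross-entropy of one sample against an integer class index."""
    return -log_softmax(logits)[target]


def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label, alpha=0.5):
    """Blend two losses: class head vs. true label, distillation head vs. teacher.

    alpha is a hypothetical weighting parameter (assumption: 0.5, equal weight).
    """
    # The teacher's "hard" decision is simply its highest-scoring class.
    teacher_label = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    loss_cls = cross_entropy(cls_logits, label)
    loss_dist = cross_entropy(dist_logits, teacher_label)
    return (1 - alpha) * loss_cls + alpha * loss_dist


def combine_heads(cls_logits, dist_logits):
    """At inference, average the two heads' logits (assumed combination rule)."""
    return [(c + d) / 2 for c, d in zip(cls_logits, dist_logits)]
```

The teacher is only needed during training. At inference the student runs alone, and `combine_heads` turns its two outputs into a single score per class.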
Model Capabilities
Image Classification
Visual Feature Extraction
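As an illustration of the image classification capability, the sketch below runs a single image through the model and returns the most probable labels. It assumes the `transformers`, `torch`, and `Pillow` packages are installed and that the weights are available on the Hugging Face Hub as `facebook/levit-384`; this page does not name a checkpoint identifier, so treat that string as an assumption. The helper names `top_k` and `classify` are hypothetical.

```python
import math


def top_k(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs from raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shift by max for stability
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return [(id2label[i], exps[i] / total) for i in ranked]


def classify(image_path, checkpoint="facebook/levit-384", k=5):
    """Classify one image file; the checkpoint id is an assumption (see above)."""
    # Imported lazily so top_k stays usable without these packages installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, LevitForImageClassificationWithTeacher

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = LevitForImageClassificationWithTeacher.from_pretrained(checkpoint)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        # logits has shape (1, num_classes); take the single batch entry.
        logits = model(**inputs).logits[0].tolist()
    return top_k(logits, model.config.id2label, k)
```

Calling `classify("photo.jpg")` on a local image (the file name is a placeholder) downloads the weights on first use and returns up to five `(label, probability)` pairs drawn from the 1,000 ImageNet categories.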
Use Cases
Computer Vision
Object Recognition
Identifies objects in images and classifies them into 1000 ImageNet categories
Recognizes common objects such as animals and everyday items
Scene Understanding
Analyzes the content of image scenes
Can identify scene types such as buildings and natural landscapes