LeViT-256
LeViT-256 is an efficient image classification model that combines convolutional and Transformer layers. It is designed for fast inference and pretrained on the ImageNet-1k dataset.
Downloads 37
Release Time: 6/1/2022
Model Overview
LeViT is a vision model that combines the advantages of convolutional neural networks (CNNs) and Transformers. It targets image classification tasks where inference speed matters.
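To make the hybrid design more concrete, the sketch below computes how a convolutional stem shrinks an image before the attention stages see it. It assumes the stem described in the LeViT paper (four 3x3 convolutions with stride 2 and padding 1) and a 224x224 input. Neither detail is stated on this page, and the function names are illustrative only.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_token_count(image_size: int = 224, num_convs: int = 4) -> int:
    """Number of tokens handed to the attention stages after the conv stem.

    Assumed stem: `num_convs` stride-2 3x3 convolutions (not stated on this page).
    """
    size = image_size
    for _ in range(num_convs):
        size = conv_output_size(size)
    return size * size


print(stem_token_count())  # 196 (224 -> 112 -> 56 -> 28 -> 14, then 14 * 14)
```

Under these assumptions the attention layers operate on 196 tokens rather than on all 50,176 pixels, which is one reason a convolutional front end can keep inference cheap.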
Model Features
Efficient Inference
Runs inference faster than pure Transformer models by combining the strengths of CNNs and Transformers.
Hybrid Architecture
Combines convolutional layers, which extract local features, with Transformer attention, which captures global context.
Teacher-Student Training
Trained with knowledge distillation: a teacher model guides the training process, which improves accuracy.
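As a rough illustration of teacher-student training, the sketch below computes a hard-label distillation loss for a single example, in the style of DeiT, which the LeViT paper follows. It assumes two student heads (one supervised by the true label, one by the teacher's predicted class) and an equal weighting of 0.5. The function names and the weighting are assumptions for illustration, not the authors' training code.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of one example, computed with a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def hard_distillation_loss(
    cls_logits: Sequence[float],
    dist_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    alpha: float = 0.5,  # assumed weighting between the two terms
) -> float:
    """Mix the true-label loss with a loss against the teacher's prediction."""
    # The teacher's most likely class serves as a second, "hard" target.
    teacher_label = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    loss_true = cross_entropy(cls_logits, label)
    loss_teacher = cross_entropy(dist_logits, teacher_label)
    return (1 - alpha) * loss_true + alpha * loss_teacher
```

For example, `hard_distillation_loss([2.0, 0.1, 0.1], [0.1, 2.0, 0.1], [0.0, 3.0, 0.0], label=0)` rewards the first head for matching the true label 0 and the second head for matching the teacher's choice of class 1.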
Model Capabilities
Image Classification
Visual Feature Extraction
Use Cases
Computer Vision
Object Recognition
Identify the category of objects in images
Classifies images into the 1,000 categories of ImageNet-1k.
Scene Understanding
Analyze the content of image scenes
Can recognize scene-level categories in ImageNet-1k, such as palaces.
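The classification use cases above could be exercised with a sketch like the following. It assumes the model is loaded through the Hugging Face `transformers` library under the checkpoint name `facebook/levit-256`, and that `torch` and Pillow are installed. None of this is stated on this page, so treat the checkpoint name and the helper names `top_k` and `classify` as assumptions.

```python
import math


def top_k(logits, labels, k=5):
    """Return the k most likely (label, probability) pairs from raw logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # softmax numerators, shifted for stability
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(labels[i], exps[i] / total) for i in ranked[:k]]


def classify(image_path, checkpoint="facebook/levit-256", k=5):
    """Classify one image file. The checkpoint name is an assumption."""
    # Imported here so top_k stays usable without these packages installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModelForImageClassification.from_pretrained(checkpoint)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # id2label maps class indices to ImageNet-1k category names.
    return top_k(logits, model.config.id2label, k)
```

Calling `classify("photo.jpg")` with a hypothetical local file would download the weights on first use and return the five most likely ImageNet-1k labels with their probabilities.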