LeViT-128S
LeViT-128S is a vision Transformer pretrained on the ImageNet-1k dataset that borrows design elements from convolutional networks to speed up inference.
Downloads 3,198
Release Time: 6/1/2022
Model Overview
LeViT is a hybrid vision model that combines convolutional and Transformer components. It is designed for image classification and aims for fast inference while maintaining high accuracy.
Model Features
Hybrid Architecture Design
Combines the strengths of convolutional networks and Transformers to optimize computational efficiency while maintaining performance on vision tasks.
Efficient Inference
Designed for fast inference, with lower computational overhead compared to pure Transformer architectures.
ImageNet Pretraining
Pretrained on the ImageNet-1k dataset, ready for direct use in thousand-class image classification tasks.
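As a minimal sketch of direct use for thousand-class classification, the Python snippet below loads a checkpoint through the Hugging Face transformers library and returns the top predictions for one image. The checkpoint id "facebook/levit-128S" and the helper names top_k and classify are assumptions made for illustration, not something this page specifies. The sketch also assumes torch, Pillow, and transformers are installed.

```python
import math


def top_k(logits, labels, k=5):
    """Apply a softmax to raw logits and return the k most probable (label, prob) pairs."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [(labels[i], probs[i]) for i in ranked]


def classify(image_path, checkpoint="facebook/levit-128S", k=5):
    """Classify one image. The checkpoint id is an assumption, not taken from this page."""
    # Heavy dependencies are imported lazily so top_k stays usable without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModelForImageClassification.from_pretrained(checkpoint)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    # id2label maps class indices to ImageNet-1k class names.
    return top_k(logits, model.config.id2label, k)
```

Calling classify("photo.jpg") with a local image path (hypothetical here) would return a list of (label, probability) pairs, most probable first.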
Model Capabilities
Image Classification
Visual Feature Extraction
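For feature extraction, one possible approach is to load the backbone without its classification head and use the pooled output as an image embedding, then compare embeddings with cosine similarity. The sketch below assumes the Hugging Face transformers library, a checkpoint id of "facebook/levit-128S", and the hypothetical helper names embed and cosine_similarity. None of these come from this page.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed(image_path, checkpoint="facebook/levit-128S"):
    """Return a pooled feature vector for one image. The checkpoint id is an assumption."""
    # Heavy dependencies are imported lazily so cosine_similarity works without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)  # backbone only, no classifier head
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # pooler_output is the final hidden states averaged over the sequence dimension.
    return outputs.pooler_output[0].tolist()
```

Two images could then be compared with cosine_similarity(embed("a.jpg"), embed("b.jpg")), where both file names are placeholders.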
Use Cases
Computer Vision
General Object Recognition
Identify common objects in images (e.g., animals, everyday items)
Classifies images into the 1,000 categories of ImageNet-1k
Scene Understanding
Analyze image scene content (e.g., indoor/outdoor environments, building types)