MobileViT XX-Small

Developed by Apple
MobileViT is a lightweight, low-latency vision Transformer model that combines the strengths of CNNs and Transformers, making it suitable for mobile devices.
Downloads: 6,077
Release Time: 5/30/2022

Model Overview

This model is pre-trained on the ImageNet-1k dataset and can be used for image classification. It combines MobileNetV2-style inverted residual blocks with MobileViT blocks, which apply Transformers to unfolded feature-map patches to capture global context efficiently.
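In the MobileViT design, a MobileViT block unfolds a feature map into non-overlapping patches, runs a Transformer across the patches separately for each pixel position inside a patch, and then folds the result back into a map. The pure-Python sketch below shows only that reshaping on a single-channel map and leaves out the surrounding convolutions and the channel dimension. The helper names `unfold`, `fold` and `global_mix` are hypothetical. `global_mix` is a simple mean that stands in for the Transformer and does not implement attention.

```python
# Hypothetical helpers sketching the MobileViT block's patch reshaping on a
# single-channel H x W feature map stored as a list of rows.

def unfold(fmap, ph, pw):
    """Regroup an H x W map into ph*pw sequences, one per pixel offset inside a
    ph x pw patch; each sequence lists that pixel from every patch, in patch order."""
    h, w = len(fmap), len(fmap[0])
    assert h % ph == 0 and w % pw == 0, "H and W must be divisible by the patch size"
    origins = [(r, c) for r in range(0, h, ph) for c in range(0, w, pw)]
    return [[fmap[r + i][c + j] for r, c in origins]
            for i in range(ph) for j in range(pw)]


def fold(seqs, h, w, ph, pw):
    """Inverse of unfold: write every sequence element back to its original pixel."""
    origins = [(r, c) for r in range(0, h, ph) for c in range(0, w, pw)]
    offsets = [(i, j) for i in range(ph) for j in range(pw)]
    fmap = [[None] * w for _ in range(h)]
    for (i, j), seq in zip(offsets, seqs):
        for (r, c), value in zip(origins, seq):
            fmap[r + i][c + j] = value
    return fmap


def global_mix(seq):
    """Stand-in for the Transformer: replace each element with the sequence mean,
    so every output depends on all patches (real MobileViT uses self-attention)."""
    mean = sum(seq) / len(seq)
    return [mean] * len(seq)


fmap = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4 x 4 map holding 0..15
seqs = unfold(fmap, 2, 2)                                  # 4 sequences, each spanning 4 patches
mixed = fold([global_mix(s) for s in seqs], 4, 4, 2, 2)
```

Each output pixel of `mixed` depends on the same-offset pixel in every patch, which gives the block a global receptive field. The round trip `fold(unfold(x)) == x` keeps the spatial layout intact.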

Model Features

Lightweight Design
With only about 1.3M parameters (roughly 5 MB of weights at 32-bit floating-point precision), the model is suitable for mobile devices and resource-constrained environments.
Hybrid Architecture
Combines the local feature extraction capability of CNNs with the global modeling ability of Transformers.
No Positional Encoding Required
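Unlike standard ViT models, MobileViT does not require explicit positional embeddings. Its convolutions, together with unfold and fold operations that preserve the spatial order of pixels, already encode where features are located.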
Multi-scale Training
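Uses a multi-scale sampler during training. Each iteration draws a spatial resolution from a fixed set and scales the batch size inversely with image area, which helps the model generalize across input resolutions.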
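The multi-scale sampler can be sketched as follows. The MobileViT paper describes it this way: each iteration samples a resolution (H_t, W_t) from a sorted set, and the batch size for that iteration is b_t = H_n W_n b / (H_t W_t). Here (H_n, W_n) is the largest resolution and b is the batch size used at that resolution. The resolution list, base batch size and function names below are illustrative assumptions and do not come from this model card. The sketch also floors b_t to an integer.

```python
import random

# Illustrative settings (assumptions, not taken from this model card).
RESOLUTIONS = [(160, 160), (192, 192), (256, 256), (288, 288), (320, 320)]
BASE_BATCH = 32  # batch size b at the largest resolution (H_n, W_n)


def batch_size_for(res, resolutions=RESOLUTIONS, base_batch=BASE_BATCH):
    """b_t = H_n * W_n * b / (H_t * W_t), floored to an integer: smaller images
    get larger batches so each iteration covers roughly the same pixel count."""
    h_n, w_n = max(resolutions, key=lambda r: r[0] * r[1])
    h_t, w_t = res
    return (h_n * w_n * base_batch) // (h_t * w_t)


def sample_iteration(rng, resolutions=RESOLUTIONS, base_batch=BASE_BATCH):
    """Draw one training resolution at random and pair it with its batch size."""
    res = rng.choice(resolutions)
    return res, batch_size_for(res, resolutions, base_batch)


rng = random.Random(0)  # seeded so the sampled schedule is reproducible
schedule = [sample_iteration(rng) for _ in range(5)]
```

With these settings, a 160x160 iteration uses a batch of 128 images and a 320x320 iteration uses 32. Each iteration therefore processes a similar number of pixels, whichever resolution is drawn.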

Model Capabilities

Image Classification
Visual Feature Extraction

Use Cases

Computer Vision
General Image Classification
Classifies images into 1000 categories from ImageNet-1k.
Top-1 accuracy: 69.0%, Top-5 accuracy: 88.9%
Mobile Vision Applications
Suitable for real-time image recognition on mobile devices like smartphones.
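The Top-1 and Top-5 figures follow the standard definition. A prediction counts as correct if the true class is the highest-scoring class (Top-1), or if it is among the five highest-scoring classes (Top-5). Below is a minimal pure-Python sketch of that metric. The function name and the toy scores are hypothetical and are not MobileViT outputs. For ImageNet-1k, each score row would hold 1000 entries instead of 6.

```python
# Hypothetical helper computing top-k accuracy from per-class scores.

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example: 3 samples over 6 classes (made-up numbers).
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true class 1 ranked first
    [0.30, 0.25, 0.20, 0.10, 0.10, 0.05],  # true class 2 ranked third
    [0.05, 0.10, 0.15, 0.20, 0.22, 0.28],  # true class 0 ranked last
]
labels = [1, 2, 0]

top1 = top_k_accuracy(scores, labels, 1)
top5 = top_k_accuracy(scores, labels, 5)
```

Here only the first sample is a Top-1 hit, while the first two are Top-5 hits. This is why Top-5 accuracy is always at least as high as Top-1.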