MobileVLM 3B
Developed by mtgv
MobileVLM is a fast and powerful multi-modal vision-language model designed specifically for mobile devices, supporting efficient cross-modal interaction.
Downloads: 346
Release Time: 12/31/2023
Model Overview
MobileVLM is a multi-modal vision-language model (MMVLM) designed to run on mobile devices. It combines a set of mobile-oriented architecture designs and techniques: language models with 1.4 billion and 2.7 billion parameters trained from scratch, a multi-modal vision encoder pre-trained in the CLIP manner, and cross-modal interaction through an efficient projector.
Model Features
Optimized for mobile devices
Designed specifically for mobile devices, integrating architecture designs and techniques tailored to mobile hardware
Efficient inference
Achieves inference speeds of 21.5 and 65.3 tokens per second on a Qualcomm Snapdragon 888 CPU and an NVIDIA Jetson Orin GPU, respectively
Multi-modal interaction
Connects the vision and language modalities through a lightweight, efficient projector
Excellent performance
Performs comparably to some much larger models on several standard VLM benchmarks
Easy to deploy
Built on MobileLLaMA-2.7B-Chat, facilitating plug-and-play deployment (a loading sketch follows this list)
Model Capabilities
Vision-language understanding
Cross-modal interaction
Efficient inference on mobile devices
Image-text association
Use Cases
Mobile applications
Mobile vision question answering
Enables efficient image understanding and question answering on mobile devices
Inference speed of 21.5–65.3 tokens per second, depending on hardware (a latency sketch follows this section)
Intelligent assistant
Provides intelligent assistant functions with multi-modal interaction for mobile devices
Embedded devices
Edge computing
Enables vision-language processing on resource-constrained edge devices