MobileVLM 3B
Developed by mtgv
MobileVLM is a fast and powerful multi-modal vision-language model designed specifically for mobile devices, supporting efficient cross-modal interaction.
Downloads: 346
Release Time: 12/31/2023
Model Overview
MobileVLM is a multi-modal vision-language model (MMVLM) designed to run on mobile devices. It combines a set of mobile-oriented architecture designs and techniques: language models with 1.4 billion and 2.7 billion parameters trained from scratch, a multi-modal vision encoder pre-trained in the CLIP manner, and cross-modal interaction through an efficient projector.
Model Features
Optimized for mobile devices
Designed specifically for mobile devices, integrating architecture designs and techniques tailored to mobile hardware
Efficient inference
Achieves inference speeds of 21.5 and 65.3 tokens per second on a Qualcomm Snapdragon 888 CPU and an NVIDIA Jetson Orin GPU, respectively
Multi-modal interaction
Connects the vision and language modalities through a lightweight, efficient projector
Excellent performance
Performs comparably to some much larger models on several standard VLM benchmarks
Easy to deploy
Built on MobileLLaMA-2.7B-Chat, facilitating plug-and-play deployment (a loading sketch follows this list)
Model Capabilities
Vision-language understanding
Cross-modal interaction
Efficient inference on mobile devices
Image-text association
Use Cases
Mobile applications
Mobile vision question answering
Enables efficient image understanding and question answering on mobile devices
Inference speed of 21.5–65.3 tokens per second, depending on hardware (a latency sketch follows this section)
Intelligent assistant
Provides intelligent assistant functions with multi-modal interaction for mobile devices
Embedded devices
Edge computing
Enables vision-language processing on resource-constrained edge devices