
MobileVLM 1.7B

Developed by: mtgv
MobileVLM is a lightweight multi-modal vision-language model designed specifically for mobile devices, supporting efficient image understanding and text generation.
Downloads: 647
Release date: 12/31/2023

Model Overview

MobileVLM is a multi-modal vision-language model optimized for mobile devices. It pairs efficient visual and language processing, making it suitable for real-time interactive scenarios on-device.

Model Features

Optimized for mobile devices
Designed for on-device deployment, with efficient inference on both mobile CPUs and GPUs.
Multi-modal interaction
Connects the vision and language modalities by mapping visual features into the language model through an efficient, lightweight projector.
High-performance inference
Reaches inference speeds of 21.5 tokens per second on a Qualcomm Snapdragon 888 CPU and 65.3 tokens per second on an NVIDIA Jetson Orin GPU.
Lightweight architecture
Built on lightweight language models with 1.4 billion or 2.7 billion parameters, well suited to mobile deployment.
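The projector mentioned above maps features from the vision encoder into the language model's embedding space so image tokens can sit alongside text tokens. The toy sketch below shows the idea with a single linear map over plain Python lists; the dimensions and weights are illustrative assumptions, and MobileVLM's actual projector is a more elaborate lightweight module.

```python
# Toy sketch of a vision-to-language projector: a linear map that
# turns a vision-encoder feature vector into a vector the language
# model could consume as an input embedding. All dimensions and
# weights here are made up for illustration.

def linear(x, weight, bias):
    """Compute y = W @ x + b for one feature vector (lists of floats)."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weight, bias)
    ]

# Hypothetical tiny dimensions: 4-dim vision feature -> 3-dim LM embedding.
W = [[0.1, 0.0, 0.2, 0.0],
     [0.0, 0.3, 0.0, 0.1],
     [0.2, 0.2, 0.0, 0.0]]
b = [0.0, 0.1, -0.1]

vision_feature = [1.0, 2.0, 3.0, 4.0]
lm_token = linear(vision_feature, W, b)
print(lm_token)  # a 3-dim vector in the (toy) language-embedding space
```

In a real deployment this mapping runs once per image patch token, so keeping the projector small is what makes the cross-modal step cheap on mobile hardware.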

Model Capabilities

Image understanding
Text generation
Multi-modal interaction
Real-time inference on mobile devices
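The throughput figures quoted in the features section (21.5 tokens/s on a Snapdragon 888 CPU, 65.3 tokens/s on a Jetson Orin GPU) translate directly into response-latency budgets. The sketch below does that arithmetic for an assumed 64-token reply; it ignores prompt encoding and vision-encoder time, so it is a lower bound on end-to-end latency.

```python
# Convert the quoted decode throughput into rough generation times.
# RESPONSE_TOKENS is an assumed length for a short image caption.

def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to decode num_tokens at a steady tokens-per-second rate."""
    return num_tokens / tokens_per_second

RESPONSE_TOKENS = 64

cpu_s = generation_seconds(RESPONSE_TOKENS, 21.5)   # Snapdragon 888 CPU
gpu_s = generation_seconds(RESPONSE_TOKENS, 65.3)   # Jetson Orin GPU

print(f"Snapdragon 888 CPU: ~{cpu_s:.1f} s, Jetson Orin GPU: ~{gpu_s:.1f} s")
```

On these numbers a short caption takes roughly three seconds on the phone CPU and under a second on the Orin GPU, which is what makes the "real-time" claim plausible for interactive use.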

Use Cases

Mobile applications
Real-time image description
Generates image descriptions directly on the device, providing efficient, low-latency image understanding.
Multi-modal chat assistant
An interactive chat assistant that combines image and text input, producing intelligent responses to natural-language and visual queries.