MiniCPM-V

Developed by openbmb
MiniCPM-V is an efficient, lightweight multimodal model optimized for edge-device deployment; it supports bilingual (Chinese-English) interaction and outperforms models of similar scale.
Downloads: 19.74k
Release Time: 1/30/2024

Model Overview

An efficient multimodal model built on SigLip-400M and MiniCPM-2.4B, connected by a perceiver resampler, combining strong visual understanding with fluent language generation.
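The model is distributed on Hugging Face and loaded with trust_remote_code. Below is a minimal single-image chat sketch following the usage pattern published on the openbmb/MiniCPM-V model page; the chat() signature and return values have changed across MiniCPM-V releases, so treat the details as illustrative rather than definitive.

```python
# Minimal single-image chat sketch for MiniCPM-V. The chat() call follows
# the usage pattern from the openbmb/MiniCPM-V Hugging Face page; later
# releases may return only the answer string instead of a tuple.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = 'openbmb/MiniCPM-V'
model = AutoModel.from_pretrained(model_id, trust_remote_code=True,
                                  torch_dtype=torch.bfloat16)
model = model.to(device='cuda')  # bf16 needs a GPU that supports it
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open('example.jpg').convert('RGB')  # any local test image
msgs = [{'role': 'user', 'content': 'Describe this image.'}]

res, context, _ = model.chat(
    image=image,
    msgs=msgs,
    context=None,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7,
)
print(res)
```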

Model Features

Ultra Efficiency
Encodes each image into only 64 visual tokens, keeping memory consumption low and allowing the model to run smoothly on mainstream GPUs, personal computers, and even mobile devices (a back-of-envelope memory estimate follows this list).
Outstanding Performance
Surpasses models of similar scale in benchmarks like MMMU and MME, rivaling the 9.6B-parameter Qwen-VL-Chat in certain scenarios.
Bilingual Support
The first edge-deployable multimodal model to support bilingual (Chinese-English) interaction, built on techniques from an ICLR 2024 spotlight paper.
Multimodal Understanding
Supports single/multi-image and video understanding, including advanced features like real-time video analysis on iPad.
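To make the efficiency claim concrete, here is a back-of-envelope weight-memory estimate derived from the component sizes named above (MiniCPM-2.4B plus SigLip-400M, roughly 2.8B parameters). This counts weights only; the real runtime footprint also includes activations and the KV cache, which the 64-token visual input keeps short.

```python
# Rough weight-memory estimate for MiniCPM-V (assumption: ~2.8B params
# = MiniCPM-2.4B language model + SigLip-400M vision encoder; activations
# and KV cache are not counted here).
params = 2.4e9 + 0.4e9
for precision, bytes_per_param in [('fp16', 2.0), ('int8', 1.0), ('int4', 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f'{precision}: ~{gib:.1f} GiB of weights')
```

At fp16 the weights alone come to roughly 5.2 GiB, so quantization (int8/int4) is what brings the model within reach of phones and other edge devices.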

Model Capabilities

Image Content Understanding
Visual Question Answering
Multi-Image Association Analysis
Video Content Understanding
Bilingual (Chinese-English) Interaction (see the multi-turn sketch after this list)
Real-Time Video Processing
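Bilingual, multi-turn interaction reuses the same chat() call. The sketch below continues the single-image example from the overview, appending the previous answer and a follow-up question in Chinese; the context-passing convention again follows the published usage pattern and may vary by release.

```python
# Multi-turn, bilingual follow-up, continuing the earlier sketch
# (reuses model, tokenizer, image, msgs, res, and context from above).
msgs.append({'role': 'assistant', 'content': res})
msgs.append({'role': 'user', 'content': '图里有什么动物？'})  # follow-up in Chinese

res, context, _ = model.chat(
    image=image,
    msgs=msgs,
    context=context,  # pass the returned context back in for the next turn
    tokenizer=tokenizer,
    sampling=True,
)
print(res)
```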

Use Cases

Education
Flora and Fauna Recognition Teaching
Real-time recognition and explanation of flora and fauna characteristics via camera.
Demonstrations show accurate identification of mushroom species and snake features.
Smart Devices
Mobile Visual Assistant
Real-time image understanding and Q&A on mobile devices.
Already supports deployment on Android/HarmonyOS systems.
Content Analysis
Multi-Image Association Understanding
Analyzes content and logical relationships across multiple images.
Performs strongly on the MMBench benchmark.