MiniCPM-V 2.6 RK3588 1.1.4

Developed by c01zaut
MiniCPM-V 2.6 is a GPT-4V-level multimodal large language model that supports single-image, multi-image, and video understanding. This build is optimized for the RK3588 NPU.
Downloads 31
Release Time: 11/7/2024

Model Overview

MiniCPM-V 2.6 is an 8B-parameter multimodal model built on SigLIP-400M and Qwen2-7B. It supports single-image, multi-image, and video understanding, with leading OCR capabilities and multilingual support.

Model Features

Leading Performance
Achieves an average score of 65.2 on OpenCompass, surpassing proprietary models such as GPT-4V
Multi-image Understanding
Supports multi-image dialogue and reasoning, achieving state-of-the-art performance on benchmarks such as Mantis-Eval
Video Understanding
Accepts video input and produces dense captions covering spatial and temporal information, outperforming GPT-4V on Video-MME
Powerful OCR Capability
Processes images of up to 1.8 megapixels, surpassing GPT-4o and Gemini 1.5 Pro on OCRBench
Exceptional Efficiency
Encodes a 1.8-megapixel image into only 640 visual tokens, 75% fewer than most models
RK3588 Optimization
Optimized for the RK3588 NPU, supporting quantization methods such as w8a8
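
As a rough illustration of what w8a8 means (8-bit weights and 8-bit activations), the sketch below quantizes two small vectors to int8 with one symmetric scale each, multiplies them with integer accumulation, and rescales the result once at the end. It is a toy under stated assumptions: the names `quantize_int8` and `int8_dot` are hypothetical, and per-tensor symmetric scaling is just one common scheme, not necessarily what the RK3588 toolchain applies to this model.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to the range [-127, 127].

    Hypothetical helper for illustration only.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for all-zero or empty input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(w_q: List[int], w_scale: float,
             a_q: List[int], a_scale: float) -> float:
    """Dot product with integer accumulation, rescaled to float at the end."""
    acc = sum(w * a for w, a in zip(w_q, a_q))
    return acc * w_scale * a_scale


# Made-up example values, not taken from the model.
weights = [0.50, -1.27, 0.03, 0.80]
activations = [2.00, 0.50, -1.00, 0.25]

w_q, w_scale = quantize_int8(weights)
a_q, a_scale = quantize_int8(activations)

exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(w_q, w_scale, a_q, a_scale)

print(f"float result: {exact:.4f}")
print(f"w8a8 result:  {approx:.4f}")
```

Comparing the two printed values shows the small rounding error that int8 storage trades for lower memory use and integer arithmetic.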

Model Capabilities

Single-image Understanding
Multi-image Dialogue
Video Understanding
OCR Recognition
Multilingual Processing
Real-time Inference

Use Cases

Visual Question Answering
Image Content Description
Detailed description of input images
Generates accurate natural language descriptions
Multi-image Reasoning
Complex reasoning based on multiple images
Understands inter-image relationships and draws reasonable conclusions
Document Processing
Document OCR
Text extraction from scanned documents
High-precision text recognition and extraction
Video Analysis
Video Content Understanding
Analysis of spatiotemporal information in videos
Generates dense descriptions of videos
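
Video input is usually passed to an image-language model as a limited set of frames. This page does not say how frames are chosen for this model, so the following is only a sketch of one common approach: uniform sampling under a frame budget. The function `uniform_frame_indices` and its parameters are hypothetical names, not part of any released API.

```python
from typing import List


def uniform_frame_indices(total_frames: int, max_frames: int) -> List[int]:
    """Pick at most max_frames indices spread evenly across a video.

    Hypothetical helper: each index is taken from the middle of one of
    max_frames equal-length segments of the video.
    """
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        # Short clip: keep every frame.
        return list(range(total_frames))
    step = total_frames / max_frames
    return [int(step * i + step / 2) for i in range(max_frames)]


# Example: choose 4 frames from a 100-frame clip.
print(uniform_frame_indices(100, 4))
```

The selected frames would then be decoded and given to the model as an ordered list of images. A tighter frame budget lowers the visual token count at the cost of temporal detail.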