M

Minicpm Llama3 V 2 5

Developed by openbmb
MiniCPM-V 2.6 is a multimodal large model launched by OpenBMB, surpassing GPT-4V in single-image, multi-image, and video understanding tasks, and supports real-time video understanding on iPad.
Downloads 31.48k
Release Time : 5/19/2024

Model Overview

MiniCPM-V 2.6 is a multimodal large model built on SigLip-400M and Llama3-8B-Instruct, with a total of 8B parameters, demonstrating significant advantages in OCR, multilingual support, and deployment on edge devices.

Model Features

Top Performance
Achieved an average score of 65.1 in OpenCompass evaluation, surpassing commercial models like GPT-4V-1106 and Gemini Pro.
Powerful OCR Capability
Supports images with any aspect ratio (up to 1344x1344/1.8 million pixels) and scored 700+ in OCRBench evaluation.
Multilingual Support
Supports 30+ languages including Chinese, English, German, French, Spanish, Italian, Korean, and Japanese.
Efficient Deployment on Edge Devices
Achieves efficient operation on edge devices through quantization, CPU/NPU optimization, and compilation optimization.

Model Capabilities

Image Understanding
Text Generation
Multilingual Processing
OCR Recognition
Video Understanding
Complex Reasoning
Instruction Following

Use Cases

Document Processing
Full-text Extraction
Extract complete text content from images.
High-precision extraction results.
Table to Markdown
Convert tables in images to Markdown format.
Structured output.
Multilingual Applications
Multilingual Image Understanding
Supports image content understanding in 30+ languages.
Cross-language generalization capability.
Edge Device Applications
Real-time Video Understanding on Mobile Devices
Enables real-time video content analysis on devices like iPad.
Efficient operation.
Featured Recommended AI Models
ยฉ 2025AIbase