Imp V1.5 4B Phi3

Developed by MILVLG
Imp-v1.5-4B-Phi3 is a high-performance, lightweight multimodal large model with only 4 billion parameters, built on the Phi-3 language model and the SigLIP visual encoder.
Downloads 140
Release Time: 5/20/2024

Model Overview

Imp-v1.5-4B-Phi3 is designed to be a high-performance yet lightweight multimodal large model. It was trained on a million-scale mixed dataset and is suitable for a variety of vision-language tasks.

Model Features

Lightweight Design
With only 4 billion parameters, it is lighter than comparable models, which makes it suitable for resource-limited environments.
High-Performance Multimodal
Processes both text and visual information, and performs strongly on multimodal benchmarks such as VQAv2 and MME(P).
Efficient Visual Encoding
Utilizes the SigLIP visual encoder to effectively process image inputs.
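
To make the "lightweight" claim concrete, the rough sketch below estimates how much memory the weights of a 4-billion-parameter model occupy at common numeric precisions. The function name and the dtype table are illustrative assumptions, not part of the model's tooling. The estimate covers weights only and ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Bytes per parameter for common weight formats (standard dtype sizes).
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Hypothetical helper for illustration: excludes activations,
    KV cache, and any runtime overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    num_params = 4e9  # "4 billion parameters", as stated in the text
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gib(num_params, dtype):.2f} GiB")
```

Under these assumptions, half-precision weights need roughly 7.5 GiB, which is what lets a model of this size fit on a single consumer-grade GPU.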

Model Capabilities

Text Generation
Image Understanding
Visual Question Answering
Multimodal Reasoning

Use Cases

Education
Visual Question Answering
Answers a wide range of questions about image content.
Scored 81.5 on the VQAv2 dataset.
Research
Multimodal Benchmark Testing
Can be evaluated on benchmarks that measure the comprehensive capabilities of multimodal models.
Scored 1507.7 on the MME(P) benchmark.
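
For context on the VQAv2 score quoted above: VQA-style benchmarks score each prediction against several human-provided answers. The sketch below shows the commonly cited simplified form of that metric, where an answer counts as fully correct if at least three annotators gave it. The function name and the simple lowercase/strip normalization are assumptions made for illustration. The official evaluation applies more elaborate answer normalization and averaging, and this is not the evaluation code used to produce the reported number.

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA accuracy for one question: min(#matches / 3, 1).

    Hypothetical helper: uses only lowercase/strip normalization,
    which is an assumption and simpler than the official procedure.
    """
    pred = predicted.strip().lower()
    matches = sum(1 for answer in human_answers if answer.strip().lower() == pred)
    return min(matches / 3.0, 1.0)


if __name__ == "__main__":
    # Made-up annotations for a single question, for illustration only.
    answers = ["red"] * 2 + ["dark red"] * 8
    print(vqa_accuracy("Red", answers))
```

A dataset-level score is then the mean of these per-question values, usually reported as a percentage.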