L

Llava Phi 3 Mini 4k Instruct

Developed by MBZUAI
A vision-language model that combines the Phi-3-mini-3.8B large language model with LLaVA v1.5, providing advanced vision-language understanding capabilities.
Downloads 550
Release Time : 4/26/2024

Model Overview

This project combines the Phi-3-mini-3.8B large language model with LLaVA v1.5, fully leveraging the advantages of both models to provide users with more advanced vision-language understanding capabilities.

Model Features

Combining the Advantages of Phi-3 and LLaVA
By combining the Phi-3-mini-3.8B large language model with the visual capabilities of LLaVA v1.5, it provides more advanced vision-language understanding capabilities.
Efficient Training Strategy
Adopting a two-stage strategy of pre-training and fine-tuning, only training the key parts to keep the model efficient.
Merged Weights
The repository contains the merged weights for easy direct use.

Model Capabilities

Vision-Language Understanding
Multimodal Task Processing
Image Caption Generation
Visual Question Answering

Use Cases

Vision-Language Tasks
Image Caption Generation
Generate detailed text descriptions based on the input images.
Visual Question Answering
Answer natural language questions about the image content.
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
Š 2025AIbase