Llava Phi 3 Mini 4k Instruct
A vision-language model that combines the Phi-3-mini-3.8B large language model with LLaVA v1.5, providing advanced vision-language understanding capabilities.
Downloads 550
Release Time : 4/26/2024
Model Overview
This project combines the Phi-3-mini-3.8B large language model with LLaVA v1.5, fully leveraging the advantages of both models to provide users with more advanced vision-language understanding capabilities.
Model Features
Combining the Advantages of Phi-3 and LLaVA
By combining the Phi-3-mini-3.8B large language model with the visual capabilities of LLaVA v1.5, it provides more advanced vision-language understanding capabilities.
Efficient Training Strategy
Adopting a two-stage strategy of pre-training and fine-tuning, only training the key parts to keep the model efficient.
Merged Weights
The repository contains the merged weights for easy direct use.
Model Capabilities
Vision-Language Understanding
Multimodal Task Processing
Image Caption Generation
Visual Question Answering
Use Cases
Vision-Language Tasks
Image Caption Generation
Generate detailed text descriptions based on the input images.
Visual Question Answering
Answer natural language questions about the image content.
Featured Recommended AI Models
Š 2025AIbase