LLaVA-Phi-3-mini-4k-instruct Open-source Model - Offering Advanced Visual Language Understanding Capabilities

Llava Phi 3 Mini 4k Instruct

Developed by MBZUAI

A vision-language model that combines the Phi-3-mini-3.8B large language model with LLaVA v1.5, providing advanced vision-language understanding capabilities.

Image-to-Text

Transformers

Open Source License:MIT #Multimodal Visual Understanding #Phi-3 Fine-tuning #LLaVA Enhancement

Downloads 550

Release Time : 4/26/2024

Model Overview

This project combines the Phi-3-mini-3.8B large language model with LLaVA v1.5, fully leveraging the advantages of both models to provide users with more advanced vision-language understanding capabilities.

Model Features

Combining the Advantages of Phi-3 and LLaVA

By combining the Phi-3-mini-3.8B large language model with the visual capabilities of LLaVA v1.5, it provides more advanced vision-language understanding capabilities.

Efficient Training Strategy

Adopting a two-stage strategy of pre-training and fine-tuning, only training the key parts to keep the model efficient.

Merged Weights

The repository contains the merged weights for easy direct use.

Model Capabilities

Vision-Language Understanding

Multimodal Task Processing

Image Caption Generation

Visual Question Answering

Use Cases

Vision-Language Tasks

Image Caption Generation

Generate detailed text descriptions based on the input images.

Visual Question Answering

Answer natural language questions about the image content.

Property	Details
Base Large Language Model (LLM)	Phi-3-mini-4k-instruct
Base Large Multimodal Model (LMM)	LLaVA-v1.5

Property	Details
Pretraining Dataset	LCS-558K
Fine-tuning Dataset	LLaVA-Instruct-665K

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

Llava Phi 3 Mini 4k Instruct

Model Overview

Model Features

Model Capabilities

Use Cases

🚀 Phi-3-V: Extending the Visual Capabilities of LLaVA with Phi-3

🚀 Quick Start

✨ Features

🔧 Technical Details

Training Strategy

📦 Installation

📚 Documentation

Key Components

Training Data

📄 License

💡 Usage Tip