llava-phi-2-3b Open-Source Multimodal Chatbot - Supports Text and Image Input to Generate Natural Language Responses

Llava Phi 2 3b

Developed by marianna13

LLaVa-Phi-2-3B is an open-source multimodal chatbot model fine-tuned from Phi-2. It takes image and text inputs and generates natural language responses.

Image-Text-to-Text

Transformers

Language: English

License: MIT

Tags: #Multimodal Dialogue #Lightweight Vision-Language Model #Instruction Following Optimization

Downloads: 153

Release Date: 1/28/2024

Model Overview

This model is obtained by fine-tuning Phi-2 on multimodal instruction-following data. The resulting vision-language understanding makes it suitable for tasks such as image captioning and visual question answering.

Model Features

Multimodal Understanding

Capable of processing both image and text inputs, understanding visual content, and generating relevant responses.

Efficient Parameter Utilization

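With only 3B parameters, it matches or slightly exceeds the 7.3B LLaVA-1.5 on SQA (69.0 vs. 68.0) and POPE (86.0 vs. 85.3), while trailing it on GQA (51.2 vs. 62.0) and TextVQA (47.0 vs. 58.3).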

Instruction Following

Specially trained to follow user instructions, making it suitable for conversational interactions.

Model Capabilities

Image understanding

Visual question answering

Image caption generation

Multimodal dialogue

Instruction following

Use Cases

Education

Visual-assisted Learning

Helps students understand complex diagrams or image content.

Accessibility Technology

Image Description Service

Generates text descriptions of image content that a screen reader or text-to-speech system can read aloud for visually impaired users.

Content Moderation

Multimodal Content Analysis

Simultaneously analyzes image and text content for more comprehensive content moderation.

Benchmarks

| Model | Parameters | SQA | GQA | TextVQA | POPE |
| --- | --- | --- | --- | --- | --- |
| [LLaVA-1.5](https://huggingface.co/liuhaotian/llava-v1.5-7b) | 7.3B | 68.0 | 62.0 | 58.3 | 85.3 |
| [MC-LLaVA-3B](https://huggingface.co/visheratin/MC-LLaVA-3b) | 3B | - | 49.6 | 38.59 | - |
| LLaVA-Phi | 3B | 68.4 | - | 48.6 | 85.0 |
| moondream1 | 1.6B | - | 56.3 | 39.8 | - |
| llava-phi-2-3b | 3B | 69.0 | 51.2 | 47.0 | 86.0 |
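
To make the comparison with the larger LLaVA-1.5 model concrete, the score differences can be worked out directly from the benchmark figures. The following is a minimal sketch; the dictionary layout and the helper name `score_gaps` are illustrative assumptions, and only the numbers come from the table.

```python
# Scores copied from the benchmark table above.
# The dictionary layout and function name are illustrative, not part of the model card.
BENCHMARKS = {
    "LLaVA-1.5": {"SQA": 68.0, "GQA": 62.0, "TextVQA": 58.3, "POPE": 85.3},
    "llava-phi-2-3b": {"SQA": 69.0, "GQA": 51.2, "TextVQA": 47.0, "POPE": 86.0},
}


def score_gaps(model, baseline, table=BENCHMARKS):
    """Return (model score - baseline score) for each benchmark both models report."""
    gaps = {}
    for name, score in table[model].items():
        base = table[baseline].get(name)
        if base is not None:
            gaps[name] = round(score - base, 1)
    return gaps


gaps = score_gaps("llava-phi-2-3b", "LLaVA-1.5")
for name, gap in gaps.items():
    # A positive gap means the 3B model scores higher than the 7.3B baseline.
    print(f"{name}: {gap:+.1f}")
```

Under these figures the 3B model is slightly ahead on SQA and POPE and roughly 11 points behind on GQA and TextVQA.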

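Image Captioning (MS COCO)

| Model | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE_L | CIDEr | SPICE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llava-1.5-7b | 75.8 | 59.8 | 45 | 33.3 | 29.4 | 57.7 | 108.8 | 23.5 |
| llava-phi-2-3b | 67.7 | 50.5 | 35.7 | 24.2 | 27.0 | 52.4 | 85.0 | 20.7 |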
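
One way to read the captioning results is as the share of the 7B model's score that the 3B model retains on each metric. The sketch below computes that ratio; the helper name `retained_percent` and the data layout are hypothetical, and the numbers are taken from the captioning table.

```python
# Captioning scores copied from the MS COCO table above.
# The structure and the helper name are illustrative assumptions.
CAPTION_SCORES = {
    "llava-1.5-7b": {
        "BLEU_1": 75.8, "BLEU_2": 59.8, "BLEU_3": 45.0, "BLEU_4": 33.3,
        "METEOR": 29.4, "ROUGE_L": 57.7, "CIDEr": 108.8, "SPICE": 23.5,
    },
    "llava-phi-2-3b": {
        "BLEU_1": 67.7, "BLEU_2": 50.5, "BLEU_3": 35.7, "BLEU_4": 24.2,
        "METEOR": 27.0, "ROUGE_L": 52.4, "CIDEr": 85.0, "SPICE": 20.7,
    },
}


def retained_percent(small, large, table=CAPTION_SCORES):
    """Return the small model's score as a percentage of the large model's, per metric."""
    return {
        metric: 100.0 * table[small][metric] / table[large][metric]
        for metric in table[large]
    }


retained = retained_percent("llava-phi-2-3b", "llava-1.5-7b")
for metric, pct in retained.items():
    print(f"{metric}: {pct:.1f}% of the 7B score")
```

On these numbers the gap widens with n-gram length: the 3B model keeps about 89% of the BLEU_1 score but about 73% of BLEU_4.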
