L

Llava Phi 3 Mini Gguf

Developed by xtuner
LLaVA-Phi-3-mini is a fine-tuned LLaVA model based on Phi-3-mini-4k-instruct and CLIP-ViT-Large-patch14-336, specializing in image-to-text tasks.
Downloads 1,676
Release Time : 4/25/2024

Model Overview

This model combines the language capabilities of Phi-3-mini-4k-instruct with the visual encoding power of CLIP-ViT-Large-patch14-336 for image understanding and text generation tasks.

Model Features

Efficient Fine-tuning
Utilizes the XTuner toolkit for efficient fine-tuning, combining the strengths of Phi-3-mini and CLIP-ViT.
Multimodal Capability
Capable of processing both image and text inputs to generate relevant textual descriptions.
High Performance
Demonstrates excellent performance across multiple benchmarks such as MMBench, MMMU, and SEED-IMG.

Model Capabilities

Image Understanding
Text Generation
Multimodal Reasoning

Use Cases

Image Captioning
Automatic Image Annotation
Generates detailed textual descriptions for images, suitable for content management and retrieval.
Achieved 70.0 accuracy on the SEED-IMG test.
Visual Question Answering
Image Content Q&A
Answers complex questions about image content.
Achieved 69.2 accuracy on the MMBench test.
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase