LLaVA Open-Source Multimodal Chatbot - Freely Support Visual Dialogue, Easily Meet Diverse Needs

Liuhaotian Llava V1.5 13b GGUF

Developed by PsiPi

LLaVA is an open-source multimodal chatbot, based on the LLaMA/Vicuna architecture, fine-tuned with multimodal instruction-following data.

Image-to-Text #Multimodal Dialogue #Visual Question Answering #Instruction Following

Downloads 1,225

Release Time: 12/1/2023

Model Overview

LLaVA is a research-oriented large multimodal model primarily used in the fields of computer vision, natural language processing, and artificial intelligence research.

Model Features

Multimodal Capability

Capable of processing both image and text inputs for cross-modal understanding

Instruction Following

Specially trained to follow multimodal instructions

Open-source Model

Released under an open-source license for research and development use

End-to-End Inference

Supports dependency-free inference via llama.cpp

Model Capabilities

Image-text dialogue

Visual question answering

Image caption generation

Multimodal instruction following

Cross-modal understanding

Use Cases

Academic Research

Multimodal Model Research

Used to study the performance and capability boundaries of large multimodal models

Human-Computer Interaction Research

Explores multimodal human-computer interaction based on vision and language

Educational Applications

Visual-Assisted Learning

Helps students understand complex visual content

🚀 LLaVA Model Card

LLaVA is an open-source chatbot designed for research on large multimodal models and chatbots. It is aimed at researchers and hobbyists working in related fields.

🚀 Quick Start

This README provides detailed information about the LLaVA model, including its technical details, intended use, training and evaluation datasets, and license information.

✨ Features

Multimodal Capability: LLaVA is trained on multimodal instruction-following data, enabling it to handle both images and text.
Based on Transformer: It is an auto-regressive language model built on the transformer architecture.

📦 Installation

This repo contains GGUF files that let you run inference on llava-v1.5-13b with llama.cpp end-to-end, without any extra dependencies.

💻 Usage Examples

There are no specific code examples provided in the original README.
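As a hedged illustration that is not taken from the original README, the sketch below assembles a command line for the LLaVA example tool that ships with llama.cpp. The binary name `./llava-cli`, the model filename, and the image path are assumptions: the tool has been renamed across llama.cpp versions, and the exact GGUF filenames depend on which quantization you download. Only `mmproj-model-f16.gguf` is named in this README.

```python
import shlex


def build_llava_command(binary, model_path, mmproj_path, image_path, prompt):
    """Assemble an argument list for a llama.cpp LLaVA command-line tool.

    The flags used here (-m, --mmproj, --image, -p) follow the llama.cpp
    LLaVA example; check the help output of your build before relying on them.
    """
    return [
        binary,
        "-m", model_path,          # language-model weights (GGUF)
        "--mmproj", mmproj_path,   # multimodal projector weights (GGUF)
        "--image", image_path,     # input image
        "-p", prompt,              # text prompt
    ]


cmd = build_llava_command(
    binary="./llava-cli",                        # assumption: name varies by llama.cpp version
    model_path="models/llava-v1.5-13b.gguf",     # hypothetical filename
    mmproj_path="models/mmproj-model-f16.gguf",  # file named in this README
    image_path="photo.jpg",                      # hypothetical image
    prompt="Describe this image.",
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

To actually run the model, pass `cmd` to `subprocess.run(cmd, check=True)` once the binary and files exist at those paths. Building the argument list separately makes it easy to inspect the command before executing it.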

📚 Documentation

Model details

Property	Details
Model Type	LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.
Model Date	LLaVA-v1.5-13B was trained in September 2023.
Paper or resources for more information	[https://llava-vl.github.io/](https://llava-vl.github.io/)

Intended use

Primary intended uses: The primary use of LLaVA is research on large multimodal models and chatbots.
Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
158K GPT-generated multimodal instruction-following data.
450K academic-task-oriented VQA data mixture.
40K ShareGPT data.

Evaluation dataset

A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.

Notes

⚠️ Important Note: The mmproj-model-f16.gguf file structure is experimental and may change. Always use the latest code in llama.cpp.
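Because the file structure may change, a quick sanity check before loading can be useful. The sketch below is a minimal example, assuming the standard GGUF header layout (4-byte magic `GGUF` followed by a little-endian 32-bit version number); it does not validate anything specific to the mmproj tensors. The helper name `read_gguf_version` is hypothetical, and the demonstration uses a fabricated stand-in file rather than a real model.

```python
import struct
import tempfile
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def read_gguf_version(path):
    """Return the GGUF format version stored in the file header.

    Assumes a little-endian file; raises ValueError if the magic bytes
    are missing or the file is too short to contain a header.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])
    return version


# Demonstration on a fabricated 8-byte header, not a real model file.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "mmproj-model-f16.gguf"
    fake.write_bytes(GGUF_MAGIC + struct.pack("<I", 3))
    demo_version = read_gguf_version(fake)

print(demo_version)  # prints 3 for the fabricated header above
```

A passing check only confirms that the file is GGUF and reports its version. Whether a given llama.cpp build can load the projector still depends on the code version, as the note above says.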

🔧 Technical Details

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

📄 License

Where to send questions or comments about the model: [https://github.com/haotian-liu/LLaVA/issues](https://github.com/haotian-liu/LLaVA/issues)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64a22257d3149e05bc6d259f/QuoYvv46QmwgAS6d3LYxj.png)
