LLaVA Model Card
LLaVA is an open-source chatbot that combines image and text processing capabilities, making it a useful resource for research on multimodal models and chatbot development.
✨ Features
LLaVA is an auto-regressive language model based on the transformer architecture. It is fine-tuned on multimodal instruction-following data, enabling it to handle image-text-to-text tasks effectively.
💻 Usage Examples
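The snippet below is a minimal inference sketch, not an official example from this card. It assumes the Hugging Face `transformers` LLaVA-NeXT integration (`LlavaNextProcessor` / `LlavaNextForConditionalGeneration`) and the community-converted checkpoint `llava-hf/llava-v1.6-vicuna-13b-hf`; the weights described here can also be run with the original `haotian-liu/LLaVA` codebase, whose API differs. It requires `transformers`, `torch`, `accelerate`, and `pillow`.

```python
import torch
import requests
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

# Assumption: the HF-converted checkpoint id; adjust if you use different weights.
model_id = "llava-hf/llava-v1.6-vicuna-13b-hf"

processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Any RGB image works; this URL is only an example.
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Vicuna-based LLaVA checkpoints expect the "USER: <image>\n... ASSISTANT:" template.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```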
📚 Documentation
Model details
| Property | Details |
|----------|---------|
| Model Type | LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) |
| Model Date | LLaVA-v1.6-Vicuna-13B was trained in December 2023. |
| Paper or resources for more information | [https://llava-vl.github.io/](https://llava-vl.github.io/) |
Intended use
Primary intended uses
The primary use of LLaVA is research on large multimodal models and chatbots.
Primary intended users
The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
Training dataset
- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
- 158K GPT-generated multimodal instruction-following data (an illustrative record layout is sketched after this list).
- 500K academic-task-oriented VQA data mixture.
- 50K GPT-4V data mixture.
- 40K ShareGPT data.
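As a rough illustration of what one multimodal instruction-following record looks like, the sketch below mirrors the schema of the publicly released LLaVA instruction-tuning JSON files (`id` / `image` / `conversations` with alternating `human` and `gpt` turns). The field values are made-up assumptions, and individual mixtures in the list above may use different layouts.

```python
# Illustrative record, assuming the schema of the public LLaVA instruction data;
# the id, image path, and text below are hypothetical examples, not real samples.
example_record = {
    "id": "000000123456",                        # hypothetical sample id
    "image": "coco/train2017/000000123456.jpg",  # hypothetical image path
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is unusual about this image?"},
        {"from": "gpt", "value": "A man is ironing clothes on the back of a moving taxi."},
    ],
}
```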
Evaluation dataset
A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.
📄 License
Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
Where to send questions or comments about the model: [https://github.com/haotian-liu/LLaVA/issues](https://github.com/haotian-liu/LLaVA/issues)