LLaVA v1.6 Vicuna 7B
LLaVA is an open-source multimodal chatbot, built by fine-tuning a large language model on multimodal instruction-following data.
Downloads: 31.65k
Release Time: 1/31/2024
Model Overview
LLaVA is primarily used for academic research on large multimodal models and chatbots, supporting multimodal interactions between images and text.
Model Features
Multimodal Capability
Jointly understands images and text, generating text responses to complex multimodal instructions.
Open-source Model
Fully open source, enabling researchers to extend the model and build on it in academic work.
Large-scale Training Data
Trained on over 1.2M multimodal samples, including image-text pairs and instruction-following data.
Model Capabilities
Image understanding
Multimodal dialogue
Visual question answering (see the inference sketch after this list)
Instruction following
Text generation
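As a concrete illustration of these capabilities, below is a minimal visual question answering sketch. It assumes the community llava-hf/llava-v1.6-vicuna-7b-hf checkpoint on Hugging Face, a recent transformers release with LLaVA-NeXT support, a CUDA GPU, and a placeholder local image path (example.jpg):

```python
# Minimal VQA sketch. Assumptions: the llava-hf/llava-v1.6-vicuna-7b-hf
# checkpoint, a transformers release with LLaVA-NeXT support, a CUDA GPU.
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-vicuna-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, low_cpu_mem_usage=True
).to("cuda")

image = Image.open("example.jpg")  # placeholder path: any local image

# Build the prompt from a chat-style conversation; the processor's chat
# template renders the Vicuna-style USER:/ASSISTANT: format this
# checkpoint expects.
conversation = [
    {"role": "user",
     "content": [{"type": "image"},
                 {"type": "text", "text": "What is shown in this image?"}]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```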
Use Cases
Academic Research
Multimodal Model Research
Used to study the performance and capability boundaries of vision-language models.
Human-Computer Interaction Experiments
Serves as a foundational model for developing more intelligent chatbots.
Education
Visual-assisted Learning
Helps students learn complex concepts through interactive image-and-text dialogue (see the multi-turn sketch below).
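As referenced above, the same setup extends to multi-turn dialogue, which is the pattern a chatbot or tutoring application would use. This sketch reuses the processor, model, and image from the previous example; the diagram question and the assistant's first answer are illustrative placeholders:

```python
# Multi-turn dialogue sketch, reusing processor, model, and image from the
# VQA example above. The turn contents are illustrative placeholders.
conversation = [
    {"role": "user",
     "content": [{"type": "image"},
                 {"type": "text", "text": "What process does this diagram show?"}]},
    {"role": "assistant",
     "content": [{"type": "text", "text": "The diagram shows the water cycle."}]},
    {"role": "user",
     "content": [{"type": "text", "text": "Explain the evaporation step in simple terms."}]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# One <image> placeholder appears in the rendered prompt (first user turn),
# so a single image is passed alongside the full conversation text.
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```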