LLaVA v1.6 34B
LLaVA is an open-source multimodal chatbot created by fine-tuning a large language model on multimodal instruction-following data, supporting interaction with both images and text.
Downloads 9,033
Release Time: 1/31/2024
Model Overview
LLaVA is an autoregressive language model based on the Transformer architecture, fine-tuned on multimodal instruction-following data. It is intended primarily for academic research on large multimodal models and chatbots.
Model Features
Multimodal Support
Accepts both images and text as input, and can understand image content and generate text responses grounded in it.
Open-source
The model is fully open-source, facilitating research and customization.
Instruction Following
Fine-tuned with multimodal instruction-following data, enabling better understanding and execution of complex instructions.
Model Capabilities
Image caption generation
Multimodal dialogue
Visual question answering
Instruction following
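The capabilities above can be exercised through the Hugging Face transformers library. The following is a minimal sketch of visual question answering, assuming the community-hosted checkpoint llava-hf/llava-v1.6-34b-hf and the LlavaNext classes available in recent transformers releases; the checkpoint ID, class names, and prompt handling are not stated on this page and should be verified against your installed version.

```python
# Minimal visual question answering sketch with LLaVA v1.6 34B.
# Assumes the "llava-hf/llava-v1.6-34b-hf" checkpoint and the LlavaNext
# classes from a recent transformers release (not stated on this page).
import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-34b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision to reduce memory use
    device_map="auto",           # shard the 34B weights across available GPUs
)

# Load an example image and pose a question about it.
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Build a chat-style prompt; the processor's chat template inserts the
# image placeholder in the format the checkpoint expects.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

Because the 34B checkpoint is large, device_map="auto" lets the weights be spread across the available GPUs, and loading in half precision (or with quantization) keeps memory requirements manageable.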
Use Cases
Academic Research
Multimodal Model Research
Used to study the performance and capabilities of multimodal models.
Chatbot Development
Serves as a foundational model for developing multimodal chatbots.
Education
Visual Question Answering System
Powers visual question answering systems in educational settings, helping students understand image content.