llava-v1.6-mistral-7b Open-source Multimodal Chatbot - Free to Use for Diverse Information Interaction

Llava V1.6 Mistral 7b

Developed by liuhaotian

LLaVA is an open-source multimodal chatbot, trained by fine-tuning large language models on multimodal instruction-following data.

Text-to-Image

Transformers

Open Source License:Apache-2.0 #Multimodal Dialogue #Visual Question Answering #Instruction Following

Downloads 27.45k

Release Time : 1/31/2024

Model Overview

LLaVA is an autoregressive language model based on the transformer architecture, capable of processing both image and text inputs to generate text outputs.

Model Features

Multimodal Capability

Can process both image and text inputs to generate relevant text outputs.

Instruction Following

Specifically trained on multimodal instruction-following data, capable of understanding and executing complex instructions.

Open-Source Model

Fully open-source, allowing researchers and developers to freely use and modify it.

Model Capabilities

Image Understanding

Multimodal Dialogue

Visual Question Answering

Instruction Following

Text Generation

Use Cases

Research

Multimodal Model Research

Used to study the behavior and capabilities of large multimodal models.

Education

Visual-Assisted Learning

Helps students learn knowledge through interactive image and text methods.

🚀 LLaVA Model Card

LLaVA is an open - source chatbot that offers research value in large multimodal models and chatbots.

🚀 Quick Start

This README provides detailed information about the LLaVA model, including its details, license, intended use, training dataset, and evaluation dataset.

✨ Features

LLaVA is an auto - regressive language model based on the transformer architecture. It is trained by fine - tuning LLM on multimodal instruction - following data, enabling it to handle image - text - to - text tasks.

📚 Documentation

Model details

Property	Details
Model Type	LLaVA is an open - source chatbot trained by fine - tuning LLM on multimodal instruction - following data. It is an auto - regressive language model, based on the transformer architecture. Base LLM: [mistralai/Mistral - 7B - Instruct - v0.2](https://huggingface.co/mistralai/Mistral - 7B - Instruct - v0.2)
Model Date	LLaVA - v1.6 - Mistral - 7B was trained in December 2023.
Paper or resources for more information	[https://llava - vl.github.io/](https://llava - vl.github.io/)

License

[mistralai/Mistral - 7B - Instruct - v0.2](https://huggingface.co/mistralai/Mistral - 7B - Instruct - v0.2) license.

⚠️ Important Note

For questions or comments about the model, please visit [https://github.com/haotian - liu/LLaVA/issues](https://github.com/haotian - liu/LLaVA/issues).

Intended use

Primary intended uses: The primary use of LLaVA is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

558K filtered image - text pairs from LAION/CC/SBU, captioned by BLIP.
158K GPT - generated multimodal instruction - following data.
500K academic - task - oriented VQA data mixture.
50K GPT - 4V data mixture.
40K ShareGPT data.

Evaluation dataset

A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction - following LMMs.

📄 License

The model uses the [mistralai/Mistral - 7B - Instruct - v0.2](https://huggingface.co/mistralai/Mistral - 7B - Instruct - v0.2) license.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご