LLaVA v1.5 13B
LLaVA is an open-source multimodal chatbot, fine-tuned from LLaMA/Vicuna and augmented with visual capabilities, supporting interaction with both images and text.
Downloads: 98.17k
Release date: 10/5/2023
Model Overview
LLaVA is a multimodal model that combines visual and language understanding, processing image and text inputs to generate natural-language responses. It is primarily intended for research on large multimodal models and for chatbot applications.
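As a concrete illustration of this image-plus-text interface, below is a minimal inference sketch. It assumes the community llava-hf/llava-1.5-13b-hf checkpoint on the Hugging Face Hub and a recent transformers release; the sample image URL is illustrative, not from the source.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumed checkpoint id (llava-hf mirror of LLaVA v1.5 13B on the Hugging Face Hub).
model_id = "llava-hf/llava-1.5-13b-hf"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so the 13B model fits on a single GPU
    device_map="auto",
)

# LLaVA v1.5 uses a Vicuna-style prompt; <image> marks where image features are spliced in.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
image_url = "https://example.com/cat.jpg"  # illustrative placeholder URL
image = Image.open(requests.get(image_url, stream=True).raw)

# Move tensors to the model's device; cast floating-point inputs to fp16 to match the model.
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```

The decoded output contains the full prompt followed by the model's answer; skip_special_tokens removes the image placeholder tokens from the printed text.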
Model Features
Multimodal Understanding
Processes both image and text inputs, understands visual content, and generates relevant responses
Instruction Following
Capable of executing tasks by following complex multimodal instructions
Large-scale Training Data
Trained on over a million multimodal data points, covering caption generation, instruction following, and VQA tasks
Model Capabilities
Image content understanding
Visual question answering
Multimodal dialogue
Image caption generation
Cross-modal reasoning
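To illustrate the multimodal dialogue capability above, the sketch below continues the earlier snippet (reusing processor, model, and image) by appending the first answer and a follow-up question. The turn separators follow the Vicuna v1.5 chat format that LLaVA v1.5 is trained on, and the first answer shown is a hypothetical model output.

```python
# Continues from the snippet above (reuses processor, model, image, torch).
# first_answer is a hypothetical model output used for illustration.
first_answer = "A cat sitting on a windowsill."
followup_prompt = (
    "USER: <image>\nWhat animal is in the picture? "
    f"ASSISTANT: {first_answer}</s>"  # </s> closes the assistant turn in the Vicuna format
    "USER: What is it doing? ASSISTANT:"
)

inputs = processor(images=image, text=followup_prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```

Because the full conversation history (including the image placeholder) is re-sent each turn, the model can resolve the pronoun "it" against both the earlier text and the image.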
Use Cases
Academic Research
Multimodal Model Research
Used to explore joint visual-language representation learning
Achieves strong results across 12 benchmark evaluations
Educational Applications
Visual-assisted Learning
Explains complex concepts through image and text interactions