LLaVA v1.5 MLP2x 336px Pretrain (Vicuna-7B v1.5)
LLaVA is an open-source multimodal chatbot, fine-tuned from LLaMA/Vicuna and trained on GPT-generated multimodal instruction-following data.
Downloads: 173
Release date: 2023-10-05
Model Overview
LLaVA is an autoregressive language model based on the Transformer architecture, primarily used for research on large multimodal models and chatbots.
Model Features
Multimodal Capability
Combines visual and language understanding to process both image and text inputs.
Instruction Following
Capable of understanding and executing complex multimodal instructions.
Open-source
The model weights and training code are fully open-source and available for research and development.
Model Capabilities
Image understanding
Visual question answering
Multimodal dialogue
Instruction following
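The capabilities above can be exercised with a short sketch. Note that this particular checkpoint is the stage-1 pretrained MLP projector, normally consumed by the official LLaVA training scripts; for inference, the example below assumes the community-converted Hugging Face weights (`llava-hf/llava-1.5-7b-hf`) and the `transformers` LLaVA classes. The prompt helper reproduces the single-turn Vicuna-style template LLaVA 1.5 expects.

```python
def build_prompt(instruction: str) -> str:
    """Format a single-turn Vicuna-style prompt with the <image> placeholder."""
    return f"USER: <image>\n{instruction} ASSISTANT:"


def ask_llava(image_path: str, instruction: str) -> str:
    """Run one image + text query through a LLaVA 1.5 checkpoint.

    Assumption: the HF-converted weights `llava-hf/llava-1.5-7b-hf`,
    not this projector-only pretrain checkpoint.
    """
    # Heavy imports kept local so the prompt helper stays lightweight.
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model_id = "llava-hf/llava-1.5-7b-hf"
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(model_id)

    inputs = processor(
        images=Image.open(image_path),
        text=build_prompt(instruction),
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Usage would be along the lines of `ask_llava("photo.jpg", "What is shown in this image?")`, which returns the decoded answer text; a ~7B model download and a GPU are assumed.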
Use Cases
Research
Multimodal Model Research
Used for research at the intersection of computer vision and natural language processing.
Application Development
Intelligent Chatbot
Develop intelligent dialogue systems capable of understanding image content.