
LLaVA-7b-delta-v0

Developed by liuhaotian
LLaVA is an open-source chatbot built on LLaMA/Vicuna and fine-tuned on GPT-generated multimodal instruction-following data, supporting combined visual and language interaction.
Downloads: 131
Release Date: 4/30/2023

Model Overview

LLaVA is an open-source multimodal chatbot that combines visual and language processing capabilities, primarily used for academic research and multimodal interaction tasks.
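As the "delta" in the name indicates, this release ships delta weights rather than a directly usable checkpoint: the deltas must first be merged with the original LLaMA-7B weights (the LLaVA repository provides an apply_delta utility for this). Below is a minimal conceptual sketch of such a merge in plain PyTorch; the file paths are hypothetical placeholders, and the real utility additionally handles the token embeddings that were resized for image tokens.

```python
import torch

# Conceptual sketch only, not the official apply_delta script.
# All file paths are hypothetical placeholders.
base_sd = torch.load("llama-7b.bin", map_location="cpu")            # original LLaMA-7B weights
delta_sd = torch.load("llava-7b-delta-v0.bin", map_location="cpu")  # released delta weights

merged = {}
for name, delta_param in delta_sd.items():
    if name in base_sd and base_sd[name].shape == delta_param.shape:
        # Shared parameters were published as (fine-tuned - base), so
        # adding the base back recovers the fine-tuned values.
        merged[name] = base_sd[name] + delta_param
    else:
        # Parameters new to LLaVA (e.g. the vision projector) or with
        # changed shapes are stored directly in the delta.
        merged[name] = delta_param

torch.save(merged, "llava-7b-merged.bin")
```

Publishing deltas rather than full weights lets the fine-tune be distributed without redistributing the restricted LLaMA weights themselves.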

Model Features

Multimodal Capability
Combines visual and language processing capabilities, supporting image and text interactions (a sketch of the visual pathway follows this feature list).
Instruction Following
Fine-tuned with GPT-generated multimodal instruction-following data, capable of understanding and executing complex multimodal instructions.
Open Source
Licensed under Apache 2.0, allowing free use and modification.
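
To make the multimodal feature above concrete, here is a rough sketch of the visual pathway, assuming the CLIP ViT-L/14 encoder and the 4096-dimensional LLaMA-7B embedding space used by early LLaVA; the linear projector below is untrained and for illustration only.

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

# Visual pathway sketch: CLIP vision encoder + linear projection into
# the language model's embedding space (4096 = LLaMA-7B hidden size).
vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
projector = nn.Linear(vision_tower.config.hidden_size, 4096)  # untrained here

image = Image.open("example.jpg")  # hypothetical input image
pixels = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    feats = vision_tower(pixels).last_hidden_state  # (1, 257, 1024): CLS + 256 patch tokens
    image_tokens = projector(feats[:, 1:])          # drop CLS, project patches to LLM space

# image_tokens are spliced into the LLaMA input embeddings in place of an
# image placeholder token, after which generation proceeds as with text.
```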

Model Capabilities

Visual question answering
Image caption generation
Multimodal dialogue
Complex reasoning
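
For the dialogue capability, the image enters an otherwise ordinary chat prompt through a placeholder token that is later expanded into the projected vision tokens. A hedged sketch of the early Vicuna-style template follows; the exact system prompt and "###" separators are assumptions and vary between versions.

```python
# Hedged sketch of an early LLaVA / Vicuna-v0-style conversation prompt.
IMAGE_PLACEHOLDER = "<image>"

def build_prompt(question: str) -> str:
    system = ("A chat between a curious human and an artificial intelligence "
              "assistant. The assistant gives helpful, detailed, and polite "
              "answers to the human's questions.")
    human = f"###Human: {IMAGE_PLACEHOLDER}\n{question}"
    return f"{system}{human}###Assistant:"

print(build_prompt("What is unusual about this image?"))
```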

Use Cases

Academic Research
Multimodal Model Research
Used to study the performance of multimodal models combining vision and language.
Visual Question Answering System
Builds image-based question-answering systems that support complex reasoning and detailed descriptions.
Used together with GPT-4, it achieved a new state-of-the-art accuracy on the ScienceQA benchmark.
Education
Science Q&A Assistance
Used for answering scientific questions and knowledge transfer in educational settings.