LLaVA 13B Delta v0
LLaVA is an open-source chatbot fine-tuned on GPT-generated multimodal instruction-following data. Built on LLaMA/Vicuna, it is a Transformer-based autoregressive language model.
Downloads: 352
Release Time: 4/17/2023
Model Overview
LLaVA is a large multimodal model that combines vision and language processing capabilities, intended primarily for academic research on multimodal large models and chatbots. As the name indicates, this release ships delta weights: because of the LLaMA license, only the difference from the base model is distributed, and the deltas must be merged with the original LLaMA 13B weights before the model can be used.
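The official repository ships a script for this merge (python -m llava.model.apply_delta). The snippet below is a minimal sketch of the same per-tensor operation, assuming the llava package is installed so the delta checkpoint's architecture resolves; the local paths are placeholders.

```python
# Minimal sketch of the delta-weight merge: merged = base + delta, tensor by tensor.
# Assumptions: the official `llava` package is installed (it provides the LLaVA
# architecture declared by the delta checkpoint), and the paths are placeholders.
# The repo's own script, `python -m llava.model.apply_delta`, does the same job.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "/path/to/llama-13b", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained(
    "liuhaotian/LLaVA-13b-delta-v0", torch_dtype=torch.float16)

base_state = base.state_dict()
for name, param in delta.state_dict().items():
    if name not in base_state:
        # LLaVA-only tensors (e.g. the vision projector) have no base counterpart.
        continue
    if param.shape == base_state[name].shape:
        param.data += base_state[name]
    else:
        # The embedding and lm_head matrices grew to fit new special tokens;
        # only the rows present in the base model receive the addition.
        b = base_state[name]
        param.data[: b.shape[0], : b.shape[1]] += b

# The delta checkpoint now holds the merged weights; save it with its tokenizer.
delta.save_pretrained("/output/path/LLaVA-13B-v0")
AutoTokenizer.from_pretrained(
    "liuhaotian/LLaVA-13b-delta-v0").save_pretrained("/output/path/LLaVA-13B-v0")
```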
Model Features
Multimodal Capability
Combines vision and language processing to understand and generate text related to images.
Instruction Following
Fine-tuned with GPT-generated multimodal instruction-following data for better understanding and execution of complex instructions.
Open Source
Released under the Apache 2.0 license, which facilitates academic research and downstream development.
Model Capabilities
Multimodal instruction following (see the inference sketch after this list)
Visual reasoning
Scientific Q&A
Image caption generation
Complex reasoning
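As an illustration of multimodal instruction following, here is a minimal inference sketch using the Hugging Face transformers LLaVA integration. Note the assumptions: the v0 delta checkpoint predates this format, so the later llava-hf/llava-1.5-7b-hf checkpoint (which transformers loads directly) stands in, and the prompt and image URL are placeholders.

```python
# Hedged inference sketch via the transformers LLaVA integration. The v0 delta
# checkpoint is not in this format, so a later HF-format checkpoint stands in;
# the prompt and image URL are placeholders.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# The <image> token marks where the projected image features are spliced
# into the token sequence.
prompt = "USER: <image>\nDescribe this image and explain what is unusual about it. ASSISTANT:"
image = Image.open(
    requests.get("https://example.com/photo.jpg", stream=True).raw)

inputs = processor(text=prompt, images=image,
                   return_tensors="pt").to(model.device, torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```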
Use Cases
Academic Research
Multimodal Large Model Research
Used to study the performance and capabilities of multimodal large models.
Visual Reasoning
Used to evaluate the model's performance on visual reasoning tasks.
Used in synergy with GPT-4, the model achieved state-of-the-art accuracy on the ScienceQA dataset.
Education
Scientific Q&A
Used for scientific Q&A tasks in education.