LLaVA LLaMA 2 13B Chat Lightning Preview
LLaVA is an open-source multimodal chatbot built on the Transformer architecture, obtained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.
Downloads: 2,122
Release Time: 7/19/2023
Model Overview
LLaVA is intended primarily for research on large multimodal models and chatbots. It processes both images and text, supporting work in fields such as computer vision and natural language processing.
Model Features
Multimodal capabilities
Fine-tuned on GPT-generated multimodal instruction-following data, enabling it to process both images and text.
Transformer architecture
An autoregressive language model built on the Transformer architecture.
Open-source research support
Released as open source for researchers and enthusiasts in fields such as computer vision and natural language processing.
Model Capabilities
Image understanding
Text generation
Visual reasoning
Multimodal dialogue
Use Cases
Academic research
Multimodal model research
Used to study the multimodal interaction capabilities of images and text.
Visual reasoning tasks
Evaluated on the ScienceQA dataset; when combined with GPT-4, LLaVA achieved a new state-of-the-art result on this benchmark.
Application development
Intelligent chatbot
Develop a chatbot with image understanding and dialogue capabilities.
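As a starting point for such a chatbot, a minimal sketch is shown below. The single-turn prompt template (`USER: <image>\n... ASSISTANT:`) and the commented inference calls are assumptions for illustration only; the exact template and loading API should be verified against the model's documentation.

```python
# Hedged sketch of querying a LLaVA-style model. The prompt template and
# the transformers-based inference shown in comments are assumptions;
# check the actual model card before relying on them.

def build_llava_prompt(question: str) -> str:
    """Build a single-turn LLaVA-style chat prompt.

    The <image> placeholder marks where the processor inserts the
    vision encoder's image tokens.
    """
    return f"USER: <image>\n{question} ASSISTANT:"


if __name__ == "__main__":
    prompt = build_llava_prompt("What is unusual about this image?")
    print(prompt)
    # Actual inference (not run here) would pair this prompt with an
    # image via a multimodal processor, e.g. with Hugging Face
    # transformers (exact API support for this checkpoint is an
    # assumption):
    #
    #   from transformers import AutoProcessor, LlavaForConditionalGeneration
    #   processor = AutoProcessor.from_pretrained(model_id)
    #   model = LlavaForConditionalGeneration.from_pretrained(model_id)
    #   inputs = processor(text=prompt, images=image, return_tensors="pt")
    #   output = model.generate(**inputs, max_new_tokens=100)
```

The prompt builder is kept separate from inference so the dialogue format can be unit-tested without downloading the 13B checkpoint.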
© 2025 AIbase