LLaVA-Lightning-7B-delta-v1-1: Free, Open-Source Chatbot Supporting Multimodal Conversation

Llava Lightning 7B Delta V1 1

Developed by liuhaotian

LLaVA is an open-source chatbot fine-tuned with GPT-generated multimodal instruction-following data based on LLaMA/Vicuna

Image-Text-to-Text

Transformers

Open Source License: Apache-2.0 #Multimodal Instruction Following #Vision-Language Integration #For Academic Research

Downloads 699

Release Time: 5/3/2023

Model Overview

A multimodal large model combining vision and language understanding, primarily used for multimodal interaction and instruction-following tasks in academic research

Model Features

Multimodal Fusion

Combines visual and language understanding capabilities to process joint inputs of images and text

Instruction Following

Fine-tuned with GPT-generated instruction data, capable of following complex multimodal instructions

Lightweight Training

The Lightning version is optimized for training efficiency, so it trains faster than the original LLaVA version

Model Capabilities

Image understanding

Visual question answering

Multimodal dialogue

Image caption generation

Complex visual reasoning

Use Cases

Academic Research

Multimodal Interaction Research

Used to explore interaction methods combining vision and language models

Visual Reasoning Benchmark Testing

Evaluates multimodal understanding capabilities on datasets like ScienceQA

Combined with GPT-4, sets a new state of the art on ScienceQA, as reported by the authors

🚀 LLaVA Model Card

LLaVA is an open-source chatbot that combines language and visual understanding, offering new possibilities for multimodal research.

🚀 Quick Start

⚠️ Important Note

This "delta model" cannot be used directly. Users have to apply it on top of the original LLaMA weights to get the actual LLaVA weights. See https://github.com/haotian-liu/LLaVA#llava-weights for instructions.
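
As a rough illustration of what "applying a delta" means, the sketch below adds each delta parameter to the base parameter of the same name. It is a conceptual toy, not the repository's conversion script: the function name `apply_delta`, the parameter names, and the use of flat Python lists in place of real tensors are all assumptions made for illustration. Parameters that exist only in the delta are assumed to be new components and are copied unchanged. Follow the linked instructions for the real conversion.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """Return merged weights where merged[name] = base[name] + delta[name]."""
    merged: Weights = {}
    for name, delta_values in delta.items():
        if name not in base:
            # Assumption: parameters absent from the base are new components
            # shipped in full inside the delta, so they are copied as-is.
            merged[name] = list(delta_values)
            continue
        base_values = base[name]
        if len(base_values) != len(delta_values):
            # This sketch does not handle resized parameters.
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise addition recovers the fine-tuned value.
        merged[name] = [b + d for b, d in zip(base_values, delta_values)]
    return merged


# Toy usage with made-up parameter names and values.
base_weights = {"layer.weight": [1.0, 2.0]}
delta_weights = {"layer.weight": [0.5, -1.0], "projector.weight": [0.25]}
merged_weights = apply_delta(base_weights, delta_weights)
```

The delta alone is meaningless because it stores only differences; the usable model exists only after the sum with the base weights.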

✨ Features

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

📚 Documentation

Model Details

Property	Details
Model Type	LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.
Model Date	LLaVA-Lightning was trained in May 2023.
Paper or Resources	https://llava-vl.github.io/
License	Apache License 2.0
Query Channel	https://github.com/haotian-liu/LLaVA/issues

Intended Use

Primary intended uses: The primary use of LLaVA is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training Dataset

558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP. 80K GPT-generated multimodal instruction-following samples.

Evaluation Dataset

A preliminary evaluation of model quality uses a set of 90 visual reasoning questions built from 30 unique images randomly sampled from COCO val 2014. Each image is associated with three types of questions: conversational, detailed description, and complex reasoning. We use GPT-4 to judge the model outputs. We also evaluate our model on the ScienceQA dataset, where our synergy with GPT-4 sets a new state of the art. See https://llava-vl.github.io/ for more details.
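
The size of the preliminary evaluation set follows from 30 images times 3 question types. A minimal sketch of that construction is shown below; the function name `build_eval_set`, the integer image IDs, and the fixed seed are hypothetical placeholders, and the actual question text and GPT-4 judging step are not modeled here.

```python
import random

# The three question types named in the evaluation description.
QUESTION_TYPES = ("conversational", "detailed description", "complex reasoning")


def build_eval_set(image_ids, num_images=30, seed=0):
    """Sample unique images and pair each one with every question type."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    sampled = rng.sample(list(image_ids), num_images)  # sampling without replacement
    return [(image_id, qtype) for image_id in sampled for qtype in QUESTION_TYPES]


# Placeholder pool of image IDs standing in for a validation split.
eval_set = build_eval_set(range(1000))
print(len(eval_set))  # 30 images x 3 question types
```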

📄 License

Apache License 2.0
