
LLaVA 13B Delta v0

Developed by liuhaotian
LLaVA is an open-source chatbot fine-tuned on GPT-generated multimodal instruction-following data, built on LLaMA/Vicuna; it is a Transformer-based autoregressive language model.
Downloads: 352
Release date: April 17, 2023

Model Overview

LLaVA is a multimodal large model that combines vision and language processing, intended primarily for academic research on multimodal large models and chatbots. As the "Delta v0" name indicates, this release is distributed as delta weights: the published parameters must be added to the original LLaMA-13B weights to reconstruct the usable model.
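A minimal sketch of that merge in Python, assuming both checkpoints load through Hugging Face transformers; the official LLaVA repository ships an equivalent apply_delta script that uses its own model class, and all paths below are placeholders:

```python
# Sketch: reconstruct LLaVA-13B v0 by adding the delta to base LLaMA-13B.
# Paths are placeholders; the repository's apply_delta script is the
# supported route and uses LLaVA's own model class instead of AutoModel.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-13b", torch_dtype=torch.float16)
target = AutoModelForCausalLM.from_pretrained(
    "liuhaotian/llava-13b-delta-v0", torch_dtype=torch.float16)

base_state = base.state_dict()
for name, param in target.state_dict().items():
    if name not in base_state:
        continue  # tensors unique to the delta, e.g. the vision projector
    b = base_state[name]
    if param.shape == b.shape:
        param.data += b  # target = base + delta, element-wise
    else:
        # The delta's embedding/lm_head rows were extended with new image
        # tokens; add the base weights into the overlapping rows only.
        param.data[: b.shape[0], : b.shape[1]] += b

target.save_pretrained("path/to/llava-13b-v0")
AutoTokenizer.from_pretrained("liuhaotian/llava-13b-delta-v0").save_pretrained(
    "path/to/llava-13b-v0")
```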

Model Features

Multimodal Capability
Combines vision and language processing to understand images and generate text about them; a sketch of the vision-to-language projection follows this list.
Instruction Following
Fine-tuned on GPT-generated multimodal instruction-following data, improving its ability to understand and carry out complex instructions.
Open Source
Released under the Apache 2.0 license, facilitating academic research and secondary development.
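The multimodal pathway is architecturally simple: a CLIP ViT-L/14 vision encoder produces patch features, and a single linear projection maps them into the LLaMA word-embedding space so they can be consumed as ordinary token embeddings. A minimal PyTorch sketch of that projection step, with illustrative shapes (1024 is CLIP ViT-L/14's hidden size, 5120 is LLaMA-13B's):

```python
import torch
import torch.nn as nn

class VisualProjector(nn.Module):
    """Linear layer mapping CLIP patch features into the LLM embedding
    space, as in LLaVA v0 (a single nn.Linear; later versions use an MLP)."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 5120):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim) from CLIP
        return self.proj(patch_features)  # (batch, num_patches, llm_dim)

# Illustrative shapes: 256 patches from a 224x224 image at 14x14 patch size.
projector = VisualProjector()
image_features = torch.randn(1, 256, 1024)   # stand-in for CLIP output
visual_tokens = projector(image_features)    # (1, 256, 5120)
text_embeds = torch.randn(1, 32, 5120)       # stand-in for embedded prompt
# Visual tokens are concatenated with the text embeddings and fed to the
# language model as a single sequence.
llm_input = torch.cat([visual_tokens, text_embeds], dim=1)  # (1, 288, 5120)
```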

Model Capabilities

Multimodal instruction following
Visual reasoning
Scientific Q&A
Image caption generation (see the prompt sketch after this list)
Complex reasoning
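For caption-style requests, the text side of the prompt follows the conversation template used by early LLaVA releases, with an image placeholder marking where the projected visual tokens are spliced in. A hypothetical sketch of prompt construction (the exact system message and "###" separators are assumptions drawn from the Vicuna v0 template and may differ between releases):

```python
# Hypothetical prompt builder for a LLaVA v0-style caption request.
# The "###"-separated roles follow the Vicuna v0 conversation template;
# "<image>" marks where the image's projected visual tokens are inserted.
SYSTEM = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's "
    "questions."
)

def build_prompt(question: str) -> str:
    return f"{SYSTEM}###Human: <image>\n{question}###Assistant:"

print(build_prompt("Describe this image in one sentence."))
```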

Use Cases

Academic Research
Multimodal Large Model Research
Used to study the performance and capabilities of multimodal large models.
Visual Reasoning
Used to evaluate the model's performance on visual reasoning tasks.
Combined with GPT-4, this model achieved state-of-the-art accuracy on the ScienceQA benchmark.
Education
Scientific Q&A
Used for scientific Q&A tasks in education.