LLaVA 13B Delta v0
LLaVA is an open-source chatbot fine-tuned on GPT-generated multimodal instruction-following data. Built on LLaMA/Vicuna, it is a Transformer-based autoregressive language model.
Downloads: 352
Release Time: 4/17/2023
Model Overview
LLaVA is a large multimodal model that combines vision and language processing capabilities, intended primarily for academic research on multimodal large models and chatbots. As the name indicates, this release ships delta weights: because of the LLaMA license, only the difference from the base model is distributed, and the deltas must be merged with the original LLaMA 13B weights before the model can be used.
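The official repository ships a script for this merge (python -m llava.model.apply_delta). The snippet below is a minimal sketch of the same per-tensor operation, assuming the llava package is installed so the delta checkpoint's architecture resolves; the local paths are placeholders.

```python
# Minimal sketch of the delta-weight merge: merged = base + delta, tensor by tensor.
# Assumptions: the official `llava` package is installed (it provides the LLaVA
# architecture declared by the delta checkpoint), and the paths are placeholders.
# The repo's own script, `python -m llava.model.apply_delta`, does the same job.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "/path/to/llama-13b", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained(
    "liuhaotian/LLaVA-13b-delta-v0", torch_dtype=torch.float16)

base_state = base.state_dict()
for name, param in delta.state_dict().items():
    if name not in base_state:
        # LLaVA-only tensors (e.g. the vision projector) have no base counterpart.
        continue
    if param.shape == base_state[name].shape:
        param.data += base_state[name]
    else:
        # The embedding and lm_head matrices grew to fit new special tokens;
        # only the rows present in the base model receive the addition.
        b = base_state[name]
        param.data[: b.shape[0], : b.shape[1]] += b

# The delta checkpoint now holds the merged weights; save it with its tokenizer.
delta.save_pretrained("/output/path/LLaVA-13B-v0")
AutoTokenizer.from_pretrained(
    "liuhaotian/LLaVA-13b-delta-v0").save_pretrained("/output/path/LLaVA-13B-v0")
```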
Model Features
Multimodal Capability
Combines vision and language processing to understand and generate text related to images.
Instruction Following
Fine-tuned with GPT-generated multimodal instruction-following data for better understanding and execution of complex instructions.
Open Source
Released under the Apache 2.0 license, which facilitates academic research and downstream development.
Model Capabilities
Multimodal instruction following (see the inference sketch after this list)
Visual reasoning
Scientific Q&A
Image caption generation
Complex reasoning
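As an illustration of multimodal instruction following, here is a minimal inference sketch using the Hugging Face transformers LLaVA integration. Note the assumptions: the v0 delta checkpoint predates this format, so the later llava-hf/llava-1.5-7b-hf checkpoint (which transformers loads directly) stands in, and the prompt and image URL are placeholders.

```python
# Hedged inference sketch via the transformers LLaVA integration. The v0 delta
# checkpoint is not in this format, so a later HF-format checkpoint stands in;
# the prompt and image URL are placeholders.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# The <image> token marks where the projected image features are spliced
# into the token sequence.
prompt = "USER: <image>\nDescribe this image and explain what is unusual about it. ASSISTANT:"
image = Image.open(
    requests.get("https://example.com/photo.jpg", stream=True).raw)

inputs = processor(text=prompt, images=image,
                   return_tensors="pt").to(model.device, torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```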
Use Cases
Academic Research
Multimodal Large Model Research
Used to study the performance and capabilities of multimodal large models.
Visual Reasoning
Used to evaluate the model's performance on visual reasoning tasks.
Used in synergy with GPT-4, the model achieved state-of-the-art accuracy on the ScienceQA dataset.
Education
Scientific Q&A
Used for scientific Q&A tasks in education.