LLaVA v1.5 MLP2x 336px Pretrain (Vicuna-7B v1.5)
LLaVA is an open-source multimodal chatbot, fine-tuned from LLaMA/Vicuna and trained on GPT-generated multimodal instruction-following data.
Downloads: 173
Release date: 2023-10-05
Model Overview
LLaVA is an autoregressive language model based on the Transformer architecture, primarily used for research on large multimodal models and chatbots.
Model Features
Multimodal Capability
Combines visual and language understanding to process both image and text inputs.
Instruction Following
Capable of understanding and executing complex multimodal instructions.
Open-source
The model weights and training code are fully open-source and available for research and development.
Model Capabilities
Image understanding
Visual question answering
Multimodal dialogue
Instruction following
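The capabilities above can be exercised with a short sketch. Note that this particular checkpoint is the stage-1 pretrained MLP projector, normally consumed by the official LLaVA training scripts; for inference, the example below assumes the community-converted Hugging Face weights (`llava-hf/llava-1.5-7b-hf`) and the `transformers` LLaVA classes. The prompt helper reproduces the single-turn Vicuna-style template LLaVA 1.5 expects.

```python
def build_prompt(instruction: str) -> str:
    """Format a single-turn Vicuna-style prompt with the <image> placeholder."""
    return f"USER: <image>\n{instruction} ASSISTANT:"


def ask_llava(image_path: str, instruction: str) -> str:
    """Run one image + text query through a LLaVA 1.5 checkpoint.

    Assumption: the HF-converted weights `llava-hf/llava-1.5-7b-hf`,
    not this projector-only pretrain checkpoint.
    """
    # Heavy imports kept local so the prompt helper stays lightweight.
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model_id = "llava-hf/llava-1.5-7b-hf"
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(model_id)

    inputs = processor(
        images=Image.open(image_path),
        text=build_prompt(instruction),
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Usage would be along the lines of `ask_llava("photo.jpg", "What is shown in this image?")`, which returns the decoded answer text; a ~7B model download and a GPU are assumed.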
Use Cases
Research
Multimodal Model Research
Used for research at the intersection of computer vision and natural language processing.
Application Development
Intelligent Chatbot
Develop intelligent dialogue systems capable of understanding image content.