llava-v1.5-mlp2x-336px-pretrain-vicuna-13b-v1.5

Developed by liuhaotian
LLaVA is an open-source multimodal chatbot, trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.
Downloads 66
Release Time: 10/5/2023

Model Overview

LLaVA is an autoregressive language model based on the Transformer architecture, primarily used for research on large multimodal models and chatbots.

Model Features

Multimodal Capability
Combines visual and language understanding to process both image and text inputs
Instruction Following
Fine-tuned to understand and execute complex multimodal instructions
Open-source and Extensible
Built on open-source models, facilitating research and extension

Model Capabilities

Image understanding
Visual question answering
Image caption generation
Multimodal dialogue
Instruction following
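As a concrete illustration of these capabilities, below is a minimal visual question answering sketch. Note that, judging by its name, the checkpoint on this card is a stage-1 pretraining artifact (a two-layer MLP projector, mlp2x, connecting a 336px CLIP vision encoder to Vicuna-13B) rather than a chat-ready model; the sketch therefore assumes a fully fine-tuned LLaVA-1.5 checkpoint served through the Hugging Face Transformers LLaVA integration. The model id llava-hf/llava-1.5-13b-hf and the image path example.jpg are assumptions for illustration, not part of this card.

    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    # Assumed chat-ready LLaVA-1.5 checkpoint (not the pretrain projector on this card).
    model_id = "llava-hf/llava-1.5-13b-hf"
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    # LLaVA-1.5 uses the Vicuna v1 conversation template with an <image> placeholder.
    prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"
    image = Image.open("example.jpg")  # hypothetical example image

    inputs = processor(images=image, text=prompt, return_tensors="pt").to(
        model.device, torch.float16
    )
    output_ids = model.generate(**inputs, max_new_tokens=100)
    print(processor.decode(output_ids[0], skip_special_tokens=True))

The pretraining checkpoint itself is normally consumed by the upstream LLaVA repository's stage-2 instruction-tuning scripts (as the pretrained mm projector, e.g. via the --pretrain_mm_mlp_adapter argument) rather than loaded directly for inference.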

Use Cases

Research
Multimodal Model Research
Used to explore the capabilities and limitations of vision-language models
Human-Computer Interaction Research
Studying vision-grounded dialogue systems
Application Development
Intelligent Assistant
Develop smart conversational assistants capable of understanding image content
Educational Tools
Create educational applications that can explain image content