LLaVA Model Card
LLaVA is an open-source chatbot built for research on large multimodal models and chatbots. It is aimed at researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
Quick Start
This README provides detailed information about the LLaVA model, including its technical details, intended use, training and evaluation datasets, and license information.
⨠Features
- Multimodal Capability: LLaVA is trained on multimodal instruction - following data, enabling it to handle both images and text.
- Based on Transformer: It is an auto - regressive language model built on the transformer architecture.
Installation
This repo contains GGUF files for running inference on llava-v1.5-13b with llama.cpp end-to-end, without any extra dependencies.
Usage Examples
There are no specific code examples provided in the original README.
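As a starting point, the sketch below assembles a llama.cpp command line for a single image-plus-prompt query. It is an illustration under assumptions and does not come from the original README: the binary name `./llava-cli` has changed across llama.cpp versions, and `ggml-model-q4_k.gguf`, `example.jpg`, and the helper `build_llava_command` are hypothetical names. Only `mmproj-model-f16.gguf` is named in this card. Check the flags against the llama.cpp build you actually use.

```python
import shlex


def build_llava_command(
    model: str,
    mmproj: str,
    image: str,
    prompt: str,
    binary: str = "./llava-cli",  # assumption: the binary name varies by llama.cpp version
) -> list[str]:
    """Assemble the argument list for one image + prompt query."""
    return [
        binary,
        "-m", model,          # language model weights (GGUF)
        "--mmproj", mmproj,   # multimodal projector file (GGUF)
        "--image", image,     # input image
        "-p", prompt,         # text prompt
    ]


if __name__ == "__main__":
    cmd = build_llava_command(
        model="ggml-model-q4_k.gguf",    # hypothetical file name
        mmproj="mmproj-model-f16.gguf",  # file named in this card
        image="example.jpg",             # hypothetical image path
        prompt="Describe this image.",
    )
    # Print a shell-quoted command line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when the prompt contains spaces or quotes.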
Documentation
Model details
| Property | Details |
|---|---|
| Model Type | LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture. |
| Model Date | LLaVA-v1.5-13B was trained in September 2023. |
| Paper or resources for more information | [https://llava-vl.github.io/](https://llava-vl.github.io/) |
Intended use
Primary intended uses: The primary use of LLaVA is research on large multimodal models and chatbots.
Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
Training dataset
- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
- 158K GPT-generated multimodal instruction-following data.
- 450K academic-task-oriented VQA data mixture.
- 40K ShareGPT data.
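To get a feel for the relative weight of each source, the sketch below tallies the figures from the list above. The sizes are the rounded numbers from this card, and the helper `composition` is a hypothetical name. The card does not say whether all four sets are used in the same training stage, so the total is only a sum of the listed figures.

```python
# Sizes in thousands of samples, copied from the list above (rounded figures).
TRAINING_SOURCES_K = {
    "LAION/CC/SBU image-text pairs (BLIP captions)": 558,
    "GPT-generated multimodal instruction-following data": 158,
    "Academic-task-oriented VQA mixture": 450,
    "ShareGPT": 40,
}


def composition(sources_k: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the summed size and each source's share of that sum."""
    total = sum(sources_k.values())
    shares = {name: size / total for name, size in sources_k.items()}
    return total, shares


if __name__ == "__main__":
    total_k, shares = composition(TRAINING_SOURCES_K)
    print(f"total: ~{total_k}K samples")
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
```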
Evaluation dataset
A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.
Notes
Important Note
The mmproj-model-f16.gguf file structure is experimental and may change. Always use the latest code in llama.cpp.
Technical Details
LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.
License
Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
Where to send questions or comments about the model: [https://github.com/haotian-liu/LLaVA/issues](https://github.com/haotian-liu/LLaVA/issues)
