Open-source Multimodal Chatbot LLaVA: Free-to-Deploy Versions with Various Size-Quality Trade-offs

LLaVA v1.6 Vicuna 13B GGUF

Developed by cjpais

LLaVA is an open-source multimodal chatbot based on the Transformer architecture. This release offers several quantized versions that trade off model size against output quality.

Image-to-Text  Open Source  License: Apache-2.0  #Multimodal Dialogue #Image Understanding #Low-resource Deployment

Downloads 630

Release Time: 2/17/2024

Model Overview

LLaVA is an open-source chatbot trained by fine-tuning LLMs on multimodal instruction-following data, supporting image-to-text and text-to-text tasks.

Model Features

Multimodal Capability

Combines visual and language understanding to handle interactive tasks involving images and text.
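To make "combining visual and language understanding" concrete: LLaVA-style models are commonly described as passing image patch features from a vision encoder through a small projector (the "projector" file mentioned for this release) so they land in the LLM's token-embedding space, where they are consumed alongside text tokens. The sketch below assumes a two-layer MLP with GELU as the projector, uses toy dimensions and random weights, and all function names are hypothetical; it is not the actual LLaVA or llama.cpp code.

```python
import math
import random

def gelu(x: float) -> float:
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def linear(rows, weight, bias):
    # rows: n vectors of length d_in; weight: d_in x d_out; bias: length d_out
    d_out = len(bias)
    return [
        [sum(row[i] * weight[i][j] for i in range(len(row))) + bias[j] for j in range(d_out)]
        for row in rows
    ]

def random_matrix(n_rows, n_cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(n_cols)] for _ in range(n_rows)]

def project(patch_features, w1, b1, w2, b2):
    # assumed projector shape: Linear -> GELU -> Linear
    hidden = [[gelu(v) for v in row] for row in linear(patch_features, w1, b1)]
    return linear(hidden, w2, b2)

# Toy sizes only; the real vision features and LLM embeddings are far wider.
VISION_DIM, LLM_DIM, NUM_PATCHES = 8, 16, 4
rng = random.Random(0)
w1, b1 = random_matrix(VISION_DIM, LLM_DIM, rng), [0.0] * LLM_DIM
w2, b2 = random_matrix(LLM_DIM, LLM_DIM, rng), [0.0] * LLM_DIM

patches = random_matrix(NUM_PATCHES, VISION_DIM, rng)   # stand-in for vision-encoder output
image_tokens = project(patches, w1, b1, w2, b2)         # now LLM_DIM wide, like text embeddings
text_tokens = random_matrix(3, LLM_DIM, rng)            # stand-in for embedded prompt tokens
sequence = image_tokens + text_tokens                   # image tokens prepended here for simplicity
print(len(sequence), len(sequence[0]))  # 7 16
```

In the real model the image tokens are spliced in where an image placeholder appears in the prompt; simply prepending them keeps the sketch short.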

Quantization Options

Offers multiple quantized versions, from 3-bit to 8-bit, so users can trade model size against inference quality.
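The size-quality trade-off can be illustrated with a simplified block quantizer: weights are split into fixed-size blocks, each block stores one scale, and each weight is rounded to a small signed integer. This is a minimal sketch of plain symmetric block quantization, loosely in the spirit of Q8_0; the K-quant formats in the table (Q3_K, Q4_K, and so on) use more elaborate super-block layouts that this code does not reproduce. The block size of 32 and the 16-bit scale are assumptions for illustration.

```python
import random

def quantize_blockwise(weights, bits, block_size=32):
    """Symmetric block quantization: one float scale per block, signed ints per weight."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8-bit, 7 for 4-bit, 3 for 3-bit
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        amax = max(abs(w) for w in block)
        scale = amax / qmax if amax > 0 else 1.0
        # round to the nearest level and clamp into the representable range
        q = [max(-qmax, min(qmax, round(w / scale))) for w in block]
        blocks.append((scale, q))
    return blocks

def dequantize_blockwise(blocks):
    # rebuild approximate weights from each block's scale and integer codes
    return [scale * v for scale, q in blocks for v in q]

def bits_per_weight(bits, block_size=32, scale_bits=16):
    # payload bits plus the amortized cost of one (assumed fp16) scale per block
    return bits + scale_bits / block_size

rng = random.Random(42)
weights = [rng.gauss(0.0, 1.0) for _ in range(4096)]
for bits in (3, 4, 5, 6, 8):
    restored = dequantize_blockwise(quantize_blockwise(weights, bits))
    mse = sum((a - b) ** 2 for a, b in zip(weights, restored)) / len(weights)
    print(f"{bits}-bit: {bits_per_weight(bits):.2f} bits/weight, MSE {mse:.6f}")
```

Fewer bits shrink storage roughly in proportion, while the reconstruction error grows, which is the same tension the "quality loss" column in the table describes.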

Instruction Following

Fine-tuned on a large set of instruction-following data to better understand and carry out complex instructions.

Model Capabilities

Image Understanding

Multimodal Dialogue

Visual Question Answering

Instruction Following

Use Cases

Research

Multimodal Model Research

Used for research at the intersection of computer vision and natural language processing.

Application Development

Intelligent Chatbot

Develop dialogue systems capable of understanding image content.

🚀 GGUF Quantized LLaVA 1.6 Vicuna 13B

This project provides GGUF quantized versions of LLaVA 1.6 Vicuna 13B, with the quants and the projector updated from PR #5267.

📚 Documentation

Updated Quantized Models

The following table lists the available quantized models and their details:

Name	Quant method	Bits	Size	Use case
llava-v1.6-vicuna-13b.Q3_K_XS.gguf	Q3_K_XS	3	5.31 GB	very small, high quality loss
llava-v1.6-vicuna-13b.Q3_K_M.gguf	Q3_K_M	3	6.34 GB	very small, high quality loss
llava-v1.6-vicuna-13b.Q4_K_M.gguf	Q4_K_M	4	7.87 GB	medium, balanced quality - recommended
llava-v1.6-vicuna-13b.Q5_K_S.gguf	Q5_K_S	5	8.97 GB	large, low quality loss - recommended
llava-v1.6-vicuna-13b.Q5_K_M.gguf	Q5_K_M	5	9.23 GB	large, very low quality loss - recommended
llava-v1.6-vicuna-13b.Q6_K.gguf	Q6_K	6	10.7 GB	very large, extremely low quality loss
llava-v1.6-vicuna-13b.Q8_0.gguf	Q8_0	8	13.8 GB	very large, extremely low quality loss - not recommended
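One practical way to read the table is to pick the largest file that fits a given memory budget. The sketch below copies the sizes from the table; the 2 GB headroom (for the projector file, context, and runtime overhead), the choice to skip Q8_0 because it is marked "not recommended", the ~13 billion parameter count, and the assumption that 1 GB = 10^9 bytes are all illustrative assumptions, and `pick_quant` and `effective_bits_per_weight` are hypothetical helpers.

```python
# (file name, quant method, size in GB) copied from the table above
QUANTS = [
    ("llava-v1.6-vicuna-13b.Q3_K_XS.gguf", "Q3_K_XS", 5.31),
    ("llava-v1.6-vicuna-13b.Q3_K_M.gguf", "Q3_K_M", 6.34),
    ("llava-v1.6-vicuna-13b.Q4_K_M.gguf", "Q4_K_M", 7.87),
    ("llava-v1.6-vicuna-13b.Q5_K_S.gguf", "Q5_K_S", 8.97),
    ("llava-v1.6-vicuna-13b.Q5_K_M.gguf", "Q5_K_M", 9.23),
    ("llava-v1.6-vicuna-13b.Q6_K.gguf", "Q6_K", 10.7),
    ("llava-v1.6-vicuna-13b.Q8_0.gguf", "Q8_0", 13.8),
]

ASSUMED_PARAMS = 13e9  # assumption: roughly 13 billion weights in the language model

def effective_bits_per_weight(size_gb, n_params=ASSUMED_PARAMS):
    # assumes 1 GB = 10**9 bytes; ignores metadata stored in the file
    return size_gb * 1e9 * 8 / n_params

def pick_quant(budget_gb, headroom_gb=2.0, skip=("Q8_0",)):
    """Return the largest quant that fits in budget_gb minus headroom, or None."""
    usable = budget_gb - headroom_gb
    fitting = [q for q in QUANTS if q[2] <= usable and q[1] not in skip]
    return max(fitting, key=lambda q: q[2]) if fitting else None

for budget in (8.0, 12.0, 16.0):
    choice = pick_quant(budget)
    print(budget, choice[1] if choice else "nothing fits")

for _, method, size in QUANTS:
    print(method, round(effective_bits_per_weight(size), 2))
```

The effective bits-per-weight figures land somewhat above the nominal bit counts because each block also stores scale data.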

Original LLaVA Model Card

Model details

Property	Details
Model Type	LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: lmsys/vicuna-13b-v1.5
Model Date	LLaVA-v1.6-Vicuna-13B was trained in December 2023.
Paper or resources for more information	https://llava-vl.github.io/
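"Auto-regressive" means the model produces its answer one token at a time, each step conditioned on everything generated so far, and each new token is fed back as input. The loop below is a minimal greedy-decoding sketch; the bigram score table is a hypothetical stand-in for the transformer (which, in LLaVA, would also condition on the projected image tokens), and the `</s>` end-of-sequence string is illustrative. Real deployments often sample with temperature or other strategies instead of always taking the top token.

```python
from typing import Callable, Dict, List

def generate(
    next_token_logits: Callable[[List[str]], Dict[str, float]],
    prompt: List[str],
    max_new_tokens: int,
    eos: str = "</s>",
) -> List[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)   # score candidates given the whole prefix
        token = max(logits, key=logits.get)  # greedy: take the highest-scoring token
        tokens.append(token)                 # feed it back for the next step
        if token == eos:
            break
    return tokens

# Hypothetical toy "model": scores depend only on the last token (a bigram table).
BIGRAMS = {
    "a": {"cat": 2.0, "dog": 1.0},
    "cat": {"on": 1.5, "</s>": 0.5},
    "on": {"a": 0.2, "mat": 1.0},
    "mat": {"</s>": 3.0},
}

def toy_logits(tokens: List[str]) -> Dict[str, float]:
    return BIGRAMS.get(tokens[-1], {"</s>": 1.0})

print(generate(toy_logits, ["a"], max_new_tokens=10))  # ['a', 'cat', 'on', 'mat', '</s>']
```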

License

⚠️ Important Note

If you have questions or comments about the model, please send them to https://github.com/haotian-liu/LLaVA/issues

Intended use

Property	Details
Primary intended uses	The primary use of LLaVA is research on large multimodal models and chatbots.
Primary intended users	The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
158K GPT-generated multimodal instruction-following data.
500K academic-task-oriented VQA data mixture.
50K GPT-4V data mixture.
40K ShareGPT data.

Evaluation dataset

A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.

📄 License

This project is licensed under the Apache 2.0 license.
