Open-source multimodal chatbot llava-v1.6-vicuna-7b-gguf - Free deployment, multiple quantization options available

Llava V1.6 Vicuna 7b Gguf

Developed by cjpais

LLaVA is an open-source multimodal chatbot trained by fine-tuning an LLM on multimodal instruction-following data. This release is the GGUF-quantized version and offers multiple quantization options.

Text-to-Image Open Source License: Apache-2.0 #Multimodal Dialogue #Image-Text Generation #Low-Resource Deployment

Downloads 493

Release Time: 2/17/2024

Model Overview

LLaVA is an autoregressive language model based on the Transformer architecture. It accepts both image and text inputs and generates text outputs. It is primarily used for research on large multimodal models and chatbots.

Model Features

Multimodal Capability

Can process both image and text inputs to generate relevant text outputs

Multiple Quantization Options

Offers various quantized versions from 3-bit to 8-bit to meet different hardware and performance needs

Open Source

Licensed under Apache-2.0, allowing free use and modification

Model Capabilities

Image Understanding

Text Generation

Multimodal Dialogue

Visual Question Answering

Use Cases

Research

Multimodal Model Research

Used for research at the intersection of computer vision and natural language processing

Application Development

Intelligent Chatbot

Develop dialogue systems capable of understanding image content

🚀 GGUF Quantized LLaVA 1.6 Vicuna 7B

This project provides GGUF quantized versions of the LLaVA 1.6 Vicuna 7B model, updated with new quants and projector.

🚀 Quick Start

The GGUF quantized LLaVA 1.6 Vicuna 7B model has been updated with the quants and projector from PR #5267.

✨ Features

Offers multiple quantization methods for different use cases.
Based on the LLaVA open-source chatbot, fine-tuned on multimodal instruction-following data.

📦 Provided files

Name	Quant method	Bits	Size	Use case
llava-v1.6-vicuna-7b.Q3_K_XS.gguf	Q3_K_XS	3	2.77 GB	very small, high quality loss
llava-v1.6-vicuna-7b.Q3_K_M.gguf	Q3_K_M	3	3.3 GB	very small, high quality loss
llava-v1.6-vicuna-7b.Q4_K_M.gguf	Q4_K_M	4	4.08 GB	medium, balanced quality - recommended
llava-v1.6-vicuna-7b.Q5_K_S.gguf	Q5_K_S	5	4.65 GB	large, low quality loss - recommended
llava-v1.6-vicuna-7b.Q5_K_M.gguf	Q5_K_M	5	4.78 GB	large, very low quality loss - recommended
llava-v1.6-vicuna-7b.Q6_K.gguf	Q6_K	6	5.53 GB	very large, extremely low quality loss
llava-v1.6-vicuna-7b.Q8_0.gguf	Q8_0	8	7.16 GB	very large, extremely low quality loss - not recommended
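
As a rough aid for choosing among the files above, the following Python sketch picks the largest listed quantization that fits a given memory budget. The file sizes and names come from the table. The `pick_quant` and `filename` helpers and the 1 GB `overhead_gb` default are hypothetical: the overhead is only a placeholder for the projector, context cache, and runtime buffers, not a measured figure.

```python
from typing import NamedTuple, Optional


class Quant(NamedTuple):
    name: str       # quant method, as listed in the table
    bits: int       # nominal bits per weight
    size_gb: float  # file size in GB, as listed in the table


# Values copied from the "Provided files" table.
QUANTS = [
    Quant("Q3_K_XS", 3, 2.77),
    Quant("Q3_K_M", 3, 3.30),
    Quant("Q4_K_M", 4, 4.08),
    Quant("Q5_K_S", 5, 4.65),
    Quant("Q5_K_M", 5, 4.78),
    Quant("Q6_K", 6, 5.53),
    Quant("Q8_0", 8, 7.16),
]


def pick_quant(budget_gb: float, overhead_gb: float = 1.0) -> Optional[Quant]:
    """Return the largest listed file that fits in the budget, or None.

    overhead_gb is an assumed allowance (hypothetical, not measured) for
    everything besides the model weights themselves.
    """
    usable = budget_gb - overhead_gb
    fitting = [q for q in QUANTS if q.size_gb <= usable]
    return max(fitting, key=lambda q: q.size_gb) if fitting else None


def filename(q: Quant) -> str:
    """Build the file name using the naming pattern shown in the table."""
    return f"llava-v1.6-vicuna-7b.{q.name}.gguf"


if __name__ == "__main__":
    for budget in (3.0, 6.0, 9.0):
        choice = pick_quant(budget)
        print(budget, filename(choice) if choice else "no listed file fits")
```

Choosing by size alone ignores the quality notes in the "Use case" column, so treat the result as a starting point rather than a recommendation.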

📚 Documentation

ORIGINAL LLaVA Model Card

Model details

Property	Details
Model Type	LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: lmsys/vicuna-7b-v1.5
Model Date	LLaVA-v1.6-Vicuna-7B was trained in December 2023.
Paper or resources for more information	https://llava-vl.github.io/

License

Where to send questions or comments about the model: https://github.com/haotian-liu/LLaVA/issues

Intended use

Primary intended uses: The primary use of LLaVA is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
158K GPT-generated multimodal instruction-following data.
500K academic-task-oriented VQA data mixture.
50K GPT-4V data mixture.
40K ShareGPT data.
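
To put the listed counts in proportion, the short Python sketch below sums them and prints each source's share. It treats the figures as nominal sample counts in thousands and simply adds them up; that is an assumption made for illustration and says nothing about how the data was staged or sampled during training. The `mixture_shares` helper is hypothetical.

```python
# Nominal sample counts in thousands, copied from the list above.
MIXTURE_K = {
    "LAION/CC/SBU image-text pairs (BLIP captions)": 558,
    "GPT-generated instruction-following": 158,
    "academic-task-oriented VQA": 500,
    "GPT-4V": 50,
    "ShareGPT": 40,
}


def mixture_shares(counts):
    """Return (total, {source: fraction of total}) for a dict of counts."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


total_k, shares = mixture_shares(MIXTURE_K)

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
    print(f"total: {total_k}K samples")
```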

Evaluation dataset

A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.

📄 License

The project is licensed under Apache 2.0.
