Open-source Multimodal Chatbot Model LLaVA 1.6 34B - Supports Image-Text to Text Generation Tasks

LLaVA v1.6 34B GGUF

Developed by cjpais

LLaVA 1.6 34B is an open-source multimodal chatbot model developed by fine-tuning a large language model on multimodal instruction-following data. It supports image-to-text and text-to-text generation tasks.

Image-to-Text | Open Source License: Apache-2.0 | #Multimodal Dialogue #Visual Question Answering #34B Large Model

Downloads 1,965

Release Time: 2/1/2024

Model Overview

LLaVA is an autoregressive language model based on the Transformer architecture, primarily used for academic research on large multimodal models and chatbots.

Model Features

Multimodal Support

Capable of processing both image and text inputs to generate text outputs

Multiple Quantization Versions

Offers various quantization versions from 3-bit to 8-bit to meet different hardware requirements

High-Quality Fine-Tuning

Fine-tuned on extensive multimodal instruction-following data

Model Capabilities

Image Understanding

Multimodal Dialogue

Visual Question Answering

Image Caption Generation

Use Cases

Academic Research

Multimodal Model Research

Used for research at the intersection of computer vision and natural language processing

Application Development

Intelligent Chatbot

Develop dialogue systems capable of understanding image content

🚀 GGUF Quantized LLaVA 1.6 34B

This project offers quantized versions of LLaVA 1.6 34B, with updated quants and projector from PR #5267, facilitating research on large multimodal models and chatbots.

🚀 Quick Start

This README provides detailed information about the GGUF quantized LLaVA 1.6 34B model, including the provided files, model details, license, intended use, training dataset, and evaluation dataset.
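As a rough sketch of how one of the quantized files might be run, the snippet below assembles a command line for the LLaVA example program that ships with llama.cpp. It assumes a llama.cpp build that provides a `llava-cli` binary accepting `-m`, `--mmproj`, `--image`, and `-p`; the binary name differs between llama.cpp versions. The projector file name, image path, and prompt are hypothetical placeholders, not values taken from this README.

```python
import shlex


def build_llava_command(model_path, projector_path, image_path, prompt,
                        binary="./llava-cli"):
    """Assemble an argument list for llama.cpp's LLaVA example program.

    The binary name is an assumption; adjust it to match your llama.cpp build.
    """
    return [
        binary,
        "-m", model_path,             # quantized language model (GGUF)
        "--mmproj", projector_path,   # multimodal projector (GGUF)
        "--image", image_path,        # image to describe or ask about
        "-p", prompt,                 # text prompt
    ]


cmd = build_llava_command(
    "llava-v1.6-34b.Q4_K_M.gguf",
    "mmproj-model-f16.gguf",   # hypothetical projector file name
    "example.jpg",             # hypothetical image path
    "Describe this image.",
)

# Print a shell-quoted command that can be copied into a terminal.
print(shlex.join(cmd))
```

Building the arguments as a list keeps quoting correct if the command is later passed to `subprocess.run(cmd)` instead of being pasted into a shell.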

📚 Documentation

Provided files

Name	Quant method	Bits	Size	Use case
llava-v1.6-34b.Q3_K_XS.gguf	Q3_K_XS	3	14.2 GB	very small, high quality loss
llava-v1.6-34b.Q3_K_M.gguf	Q3_K_M	3	16.7 GB	very small, high quality loss
llava-v1.6-34b.Q4_K_M.gguf	Q4_K_M	4	20.66 GB	medium, balanced quality - recommended
llava-v1.6-34b.Q5_K_S.gguf	Q5_K_S	5	23.7 GB	large, low quality loss - recommended
llava-v1.6-34b.Q5_K_M.gguf	Q5_K_M	5	24.3 GB	large, very low quality loss - recommended
llava-v1.6-34b.Q6_K.gguf	Q6_K	6	28.2 GB	very large, extremely low quality loss
llava-v1.6-34b.Q8_0.gguf	Q8_0	8	36.5 GB	very large, extremely low quality loss - not recommended
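
One way to read the table is as a lookup: given a memory budget, pick the largest file that still fits. The sketch below does that with the file names and sizes listed above. The `overhead_gb` allowance (for the projector, context cache, and runtime buffers) is an illustrative assumption, not a measured figure, and `pick_quant` is a hypothetical helper rather than part of any tool.

```python
# (file name, size in GB), copied from the "Provided files" table
QUANTS = [
    ("llava-v1.6-34b.Q3_K_XS.gguf", 14.2),
    ("llava-v1.6-34b.Q3_K_M.gguf", 16.7),
    ("llava-v1.6-34b.Q4_K_M.gguf", 20.66),
    ("llava-v1.6-34b.Q5_K_S.gguf", 23.7),
    ("llava-v1.6-34b.Q5_K_M.gguf", 24.3),
    ("llava-v1.6-34b.Q6_K.gguf", 28.2),
    ("llava-v1.6-34b.Q8_0.gguf", 36.5),
]


def pick_quant(budget_gb, overhead_gb=2.0):
    """Return the largest listed file whose size plus overhead fits the budget.

    overhead_gb is an assumed allowance, not a measured value.
    Returns None when no listed file fits.
    """
    fitting = [q for q in QUANTS if q[1] + overhead_gb <= budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: q[1])


for budget in (16, 24, 48):
    print(budget, "GB ->", pick_quant(budget))
```

Larger files generally lose less quality, so choosing the largest file that fits is a reasonable default; the "Use case" column remains the better guide when two files are close in size.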

ORIGINAL LLaVA Model Card

Model details

Property	Details
Model Type	LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an autoregressive language model based on the Transformer architecture. Base LLM: NousResearch/Nous-Hermes-2-Yi-34B
Model Date	LLaVA-v1.6-34B was trained in December 2023.
Paper or Resources	https://llava-vl.github.io/

License

The model uses the NousResearch/Nous-Hermes-2-Yi-34B license.

⚠️ Important Note

Questions or comments about the model can be sent to https://github.com/haotian-liu/LLaVA/issues.

Intended use

Primary intended uses: The primary use of LLaVA is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
158K GPT-generated multimodal instruction-following data.
500K academic-task-oriented VQA data mixture.
50K GPT-4V data mixture.
40K ShareGPT data.
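
To get a feel for the relative weight of each source, the listed counts can simply be added up. The sketch below is plain arithmetic over the figures above (in thousands of samples); the model card does not say how the subsets are split across training stages, so the shares describe only the list itself.

```python
# Sample counts in thousands, copied from the training dataset list
TRAINING_MIX_K = {
    "LAION/CC/SBU image-text pairs (BLIP captions)": 558,
    "GPT-generated multimodal instruction-following data": 158,
    "Academic-task-oriented VQA data mixture": 500,
    "GPT-4V data mixture": 50,
    "ShareGPT data": 40,
}

total_k = sum(TRAINING_MIX_K.values())

# Report each source with its share of the listed total.
for name, count_k in TRAINING_MIX_K.items():
    print(f"{name}: {count_k}K ({count_k / total_k:.1%})")

print(f"Total listed: {total_k}K samples")
```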

Evaluation dataset

A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.

