
LLaVA-13B-v0-4bit-128g

Developed by wojtab
LLaVA is a multimodal vision-language model built on the LLaMA architecture that supports image understanding and dialogue generation.
Downloads: 167
Release Date: 4/21/2023

Model Overview

LLaVA-13b-v0-4bit-128g is a 4-bit GPTQ quantization (group size 128) of LLaVA-13B-delta-v0, a vision-language model based on LLaMA-13B. The quantization reduces memory consumption, making the model suitable for multimodal dialogue and image understanding tasks.
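
To make the memory claim concrete, a rough back-of-envelope estimate of the weight footprint is sketched below. It assumes 13B parameters and ignores activations, the KV cache, and the vision encoder, so real VRAM usage will be somewhat higher.

# Rough estimate of language-model weight memory at FP16 vs. 4-bit GPTQ.
# Assumption: 13e9 parameters; activations, KV cache, and the vision tower
# are not counted, so treat these figures as lower bounds.
PARAMS = 13e9
GROUP_SIZE = 128                                  # the "128g" in the model name

fp16_gb = PARAMS * 2 / 1024**3                    # 2 bytes per weight
int4_gb = PARAMS * 0.5 / 1024**3                  # 0.5 bytes per weight
# GPTQ stores a scale (and zero point) per group of 128 weights,
# roughly a few extra bytes per group.
overhead_gb = PARAMS / GROUP_SIZE * 4 / 1024**3

print(f"FP16 weights : ~{fp16_gb:.1f} GB")        # ~24 GB
print(f"4-bit weights: ~{int4_gb + overhead_gb:.1f} GB")  # ~6-7 GB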

Model Features

4-bit quantization
Implements 4-bit quantization via GPTQ, significantly reducing VRAM requirements and improving inference efficiency (see the loading sketch after this list).
Multimodal support
Combines a visual encoder with the language model for joint understanding of images and text.
Open-source integration
Supports running via the llava extension in text-generation-webui, facilitating deployment and testing.
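
As a rough illustration of how a GPTQ 4-bit checkpoint can be loaded outside the web UI, the sketch below uses the AutoGPTQ library. It only exercises the language-model side; whether this particular checkpoint loads cleanly with plain AutoGPTQ, and the prompt format shown, are assumptions. For actual image input, the llava extension in text-generation-webui is the supported path.

# Minimal sketch: loading a GPTQ 4-bit checkpoint with AutoGPTQ.
# Assumptions: the repo id / local path and the prompt template are
# illustrative; image input is not handled here.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "wojtab/llava-13b-v0-4bit-128g"  # local path or Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    device="cuda:0",
    use_safetensors=True,
)

prompt = "### Human: Describe what a vision-language assistant can do.\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))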

Model Capabilities

Image captioning
Multimodal dialogue
Visual question answering
Context understanding

Use Cases

Human-computer interaction
Image-based dialogue assistant
After a user uploads an image, the model can answer questions about its content or generate descriptions.
Enables natural multi-turn interactive dialogue
Content generation
Automatic image captioning
Generates detailed text descriptions for unlabeled images (see the sketch after this list).
Improves image retrieval and classification efficiency
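
The captioning use case can be sketched as a simple batch loop over a folder of unlabeled images. The generate_caption function below is a hypothetical placeholder for whatever backend actually serves the model (for example, a running text-generation-webui instance with the llava extension); it is not an API this model ships with.

# Sketch of a batch-captioning workflow for unlabeled images.
# generate_caption is a hypothetical hook for the real inference call.
import json
from pathlib import Path


def generate_caption(image_path: Path) -> str:
    """Hypothetical hook: send the image plus a captioning prompt to the model."""
    raise NotImplementedError("wire this up to your LLaVA serving backend")


def caption_folder(folder: str, out_file: str = "captions.json") -> None:
    captions = {}
    for image_path in sorted(Path(folder).glob("*.jpg")):
        captions[image_path.name] = generate_caption(image_path)
    Path(out_file).write_text(json.dumps(captions, indent=2))


if __name__ == "__main__":
    caption_folder("unlabeled_images")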