LLaVA 13B v0 4-bit 128g
LLaVA is a multimodal model that combines vision and language. Built on the LLaMA architecture, it supports image understanding and dialogue generation.
Release date: April 21, 2023
Model Overview
LLaVA-13b-delta-v0 is a vision-language model based on LLaMA-13B. This release uses 4-bit quantization to reduce memory consumption, making it suitable for multimodal dialogue and image understanding tasks.
Model Features
4-bit quantization
Implements 4-bit quantization via GPTQ with a group size of 128 (the "128g" in the model name), substantially reducing VRAM requirements and improving inference efficiency (see the loading sketch after this list).
Multimodal support
Combines a vision encoder with the language model for joint understanding of images and text.
Open-source integration
Can be run via the llava extension in text-generation-webui, which simplifies deployment and testing.
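
As a rough illustration of how 4-bit GPTQ weights of this kind can be loaded outside the webui, the sketch below uses the AutoGPTQ library. The directory path is a placeholder, and in practice the v0 weights are usually run through the LLaVA codebase or the text-generation-webui llava extension mentioned above; the sketch only exercises the quantized language-model half.

```python
# Minimal sketch: loading a 4-bit, group-size-128 GPTQ checkpoint with AutoGPTQ.
# The directory name is a placeholder; adjust use_safetensors/device to match
# how the checkpoint was packaged.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "path/to/llava-13b-v0-4bit-128g"  # placeholder local checkpoint directory

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    device="cuda:0",
    use_safetensors=True,
)

# Text-only generation with the quantized LLaMA backbone; image inputs
# additionally require LLaVA's vision tower and multimodal projector.
prompt = "Describe what a vision-language model does."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```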
Model Capabilities
Image captioning
Multimodal dialogue
Visual question answering
Context understanding
Use Cases
Human-computer interaction
Image-based dialogue assistant
After a user uploads an image, the model can answer questions about its content or generate descriptions (see the inference sketch at the end of this section).
Enables natural multi-turn interactive dialogue
Content generation
Automatic image captioning
Generates detailed text descriptions for unlabeled images.
Improves image retrieval and classification efficiency
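
To make the dialogue and captioning use cases above concrete, here is a minimal, hedged sketch of the image-plus-question inference flow using the Hugging Face transformers LLaVA integration. That integration targets later LLaVA releases, so llava-hf/llava-1.5-7b-hf is used only as a stand-in checkpoint; the 4-bit v0 weights described on this page are normally served through text-generation-webui instead.

```python
# Hedged sketch of visual question answering / image captioning with the
# transformers LLaVA integration (stand-in checkpoint, not the v0 model itself).
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # stand-in for illustration only
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# Any RGB image works; this COCO image URL is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA-style prompt: an <image> placeholder followed by the user's question.
prompt = "USER: <image>\nWhat is shown in this picture?\nASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```

Swapping the question for an open-ended request such as "Describe this image in detail." turns the same flow into automatic image captioning.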