Pixtral 12B NF4
A 4-bit quantized version of the Mistral community's Pixtral-12B, aimed at image-text-to-text tasks, with support for generating descriptions in Chinese.
Downloads 236
Release Time: 9/25/2024
Model Overview
This is a vision-language model quantized to NF4 that generates text descriptions from input images. It is built on the Llava architecture and is suited to multimodal understanding tasks.
Model Features
4-bit quantization
Uses BitsAndBytes NF4 quantization, which significantly reduces GPU memory (VRAM) requirements.
Multimodal understanding
Processes image and text inputs together, enabling vision-language interaction.
Efficient inference
Achieves a generation speed of 10 to 12 tokens per second on an RTX 4090.
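To put the memory and speed figures above in perspective, the following is a rough back-of-the-envelope sketch, not a measurement. It assumes about 12 billion parameters (read off the "12B" in the model name), uses 16-bit weights as the unquantized baseline, and picks a hypothetical 200-token caption length. It counts weights only and ignores quantization constants, activations, and the KV cache, so real usage will be higher.

```python
# Back-of-the-envelope figures only; not measurements from the model card.

def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


# Assumption: ~12 billion parameters, inferred from the "12B" in the name.
NUM_PARAMS = 12e9

# Assumption: the unquantized baseline stores weights in 16 bits.
fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)

# Hypothetical caption length, timed at the quoted 10 to 12 tokens per second.
CAPTION_TOKENS = 200
slowest = generation_seconds(CAPTION_TOKENS, 10)
fastest = generation_seconds(CAPTION_TOKENS, 12)

if __name__ == "__main__":
    print(f"16-bit weights: {fp16_gib:.1f} GiB")
    print(f"NF4 weights:    {nf4_gib:.1f} GiB")
    print(f"{CAPTION_TOKENS}-token caption: {fastest:.1f} to {slowest:.1f} s")
```

Under these assumptions, 4-bit weights take a quarter of the space of 16-bit weights (roughly 5.6 GiB versus 22.4 GiB), and a 200-token description would take somewhere around 17 to 20 seconds at the quoted speed.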
Model Capabilities
Image description generation
Multimodal content understanding
Chinese text generation
Use Cases
Content creation
Automatic image annotation
Generate descriptive text for images.
Generate high-quality natural language descriptions.
Assistive tools
Visual impairment assistance
Convert visual content into text descriptions.