P

Pixtral 12b Nf4

Developed by SeanScripts
A 4-bit quantized version based on the Mistral community's Pixtral-12B, focusing on image text-to-text tasks and supporting Chinese description generation.
Downloads 236
Release Time : 9/25/2024

Model Overview

This is a vision-language model quantized with NF4, capable of generating text descriptions based on input images. Implemented based on the Llava architecture, suitable for multimodal understanding tasks.

Model Features

4-bit quantization
Use BitsAndBytes for NF4 quantization, significantly reducing video memory requirements.
Multimodal understanding
Capable of processing both image and text inputs simultaneously to achieve visual-language interaction.
Efficient inference
Achieves a generation speed of 10 - 12 tokens per second on an RTX 4090.

Model Capabilities

Image description generation
Multimodal content understanding
Chinese text generation

Use Cases

Content creation
Automatic image annotation
Generate descriptive text for images.
Generate high-quality natural language descriptions.
Assistive tools
Visual impairment assistance
Convert visual content into text descriptions.
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase