🚀 Pixtral-Large-Instruct-2411-hf-quantized.w4a16
This is a quantized version of the neuralmagic/Pixtral-Large-Instruct-2411-hf model, offering efficient deployment and optimized performance for vision-text tasks.
🚀 Quick Start
This model is a quantized variant of neuralmagic/Pixtral-Large-Instruct-2411-hf. It can be efficiently deployed using the vLLM backend.
✨ Features
- Model Architecture: based on neuralmagic/Pixtral-Large-Instruct-2411-hf; takes vision-text as input and generates text as output.
- Model Optimizations (see the config-inspection sketch after this list):
  - Weight quantization: INT4
  - Activation quantization: FP16
- Multilingual Support: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Russian, and Korean.
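If you want to confirm the quantization scheme of the downloaded checkpoint, the minimal sketch below can help. It is not from the original README and assumes the scheme is recorded in the checkpoint's config.json under quantization_config, which is how compressed-tensors checkpoints are typically annotated.
from transformers import AutoConfig

# Hypothetical check (not from the original README): print the quantization
# metadata stored with the checkpoint; an INT4 (W4A16) scheme is expected.
config = AutoConfig.from_pretrained(
    "neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16"
)
print(config.quantization_config)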
📦 Installation
The original README does not list explicit installation steps. To use this model, make sure the required dependencies are installed, most importantly vLLM (for example via pip install vllm); reproducing the quantization additionally requires llmcompressor and compressed-tensors.
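As a quick sanity check (not part of the original README), you can confirm that vLLM imports cleanly before loading the model:
# Sanity check, not from the original README: verify the vLLM install
import vllm

print(vllm.__version__)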
💻 Usage Examples
Basic Usage
from vllm.assets.image import ImageAsset
from vllm import LLM, SamplingParams

# Load the quantized model with vLLM
llm = LLM(
    model="neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16",
    trust_remote_code=True,
    max_model_len=4096,
    max_num_seqs=2,
)

# Prepare a vision-text prompt with a sample image
question = "What is the content of this image?"
inputs = {
    "prompt": f"<|user|>\n<|image_1|>\n{question}<|end|>\n<|assistant|>\n",
    "multi_modal_data": {
        "image": ImageAsset("cherry_blossom").pil_image.convert("RGB")
    },
}

# Generate and print a sample response
print("========== SAMPLE GENERATION ==============")
outputs = llm.generate(inputs, SamplingParams(temperature=0.2, max_tokens=64))
print(f"PROMPT : {outputs[0].prompt}")
print(f"RESPONSE: {outputs[0].outputs[0].text}")
print("==========================================")
Advanced Usage
This model can also be used for model creation and evaluation.
Model Creation
import requests
import torch
from PIL import Image
from transformers import AutoProcessor
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot
from llmcompressor.transformers.tracing import TraceableLlavaForConditionalGeneration
from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationType,
    QuantizationStrategy,
    ActivationOrdering,
    QuantizationScheme,
)

# Load the full-precision model and processor
model_id = "neuralmagic/Pixtral-Large-Instruct-2411-hf"
model = TraceableLlavaForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Calibration dataset configuration
DATASET_ID = "flickr30k"
DATASET_SPLIT = {"calibration": "test[:512]"}
NUM_CALIBRATION_SAMPLES = 512
MAX_SEQUENCE_LENGTH = 2048
dampening_frac = 0.01

# Single-sample collator for the multimodal calibration data
def data_collator(batch):
    assert len(batch) == 1
    return {
        "input_ids": torch.LongTensor(batch[0]["input_ids"]),
        "attention_mask": torch.tensor(batch[0]["attention_mask"]),
        "pixel_values": torch.tensor(batch[0]["pixel_values"]),
    }

# GPTQ recipe: INT4 group-wise (group_size=128) weight quantization of the
# language model's Linear layers; lm_head, vision tower, and projector are skipped
recipe = GPTQModifier(
    targets="Linear",
    config_groups={
        "config_group": QuantizationScheme(
            targets=["Linear"],
            weights=QuantizationArgs(
                num_bits=4,
                type=QuantizationType.INT,
                strategy=QuantizationStrategy.GROUP,
                group_size=128,
                symmetric=True,
                dynamic=False,
                actorder=ActivationOrdering.WEIGHT,
            ),
        ),
    },
    sequential_targets=["MistralDecoderLayer"],
    ignore=["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],
    update_size=NUM_CALIBRATION_SAMPLES,
    dampening_frac=dampening_frac,
)

# Apply one-shot quantization and write the compressed model to SAVE_DIR
SAVE_DIR = f"{model_id.split('/')[1]}-quantized.w4a16"
oneshot(
    model=model,
    tokenizer=model_id,
    dataset=DATASET_ID,
    splits=DATASET_SPLIT,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    trust_remote_code_model=True,
    data_collator=data_collator,
    output_dir=SAVE_DIR,
)
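A reasonable follow-up, not shown in the original README, is to save the processor next to the quantized weights so that the output directory can be loaded on its own. This continues from the script above and reuses its processor and SAVE_DIR variables.
# Continues the script above: store the processor with the quantized checkpoint
processor.save_pretrained(SAVE_DIR)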
Model Evaluation
The model was evaluated using mistral-evals for vision-related tasks and using lm_evaluation_harness for select text-based benchmarks.
Vision Tasks
vllm serve neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16 \
  --tensor_parallel_size <n> \
  --max_model_len 25000 \
  --trust_remote_code \
  --max_num_seqs 8 \
  --gpu_memory_utilization 0.9 \
  --dtype float16 \
  --limit_mm_per_prompt image=7

python -m eval.run eval_vllm \
  --model_name neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16 \
  --url http://0.0.0.0:8000 \
  --output_dir ~/tmp \
  --eval_name <vision_task_name>
Text-based Tasks - MMLU
lm_eval \
--model vllm \
--model_args pretrained="<model_name>",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=<n>,gpu_memory_utilization=0.8,enable_chunked_prefill=True,trust_remote_code=True \
--tasks mmlu \
--num_fewshot 5 \
--batch_size auto \
--output_path output_dir
Text-based Tasks - MGSM
lm_eval \
--model vllm \
--model_args pretrained="<model_name>",dtype=auto,max_model_len=4096,max_gen_toks=2048,max_num_seqs=128,tensor_parallel_size=<n>,gpu_memory_utilization=0.9 \
--tasks mgsm_cot_native \
--apply_chat_template \
--num_fewshot 0 \
--batch_size auto \
--output_path output_dir
📚 Documentation
Model Overview
| Property | Details |
|----------|---------|
| Model Type | neuralmagic/Pixtral-Large-Instruct-2411-hf |
| Input | Vision-Text |
| Output | Text |
| Weight quantization | INT4 |
| Activation quantization | FP16 |
| Release Date | 2/24/2025 |
| Version | 1.0 |
| Model Developers | Neural Magic |
Accuracy
| Category | Metric | neuralmagic/Pixtral-Large-Instruct-2411-hf | neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16 | Recovery (%) |
|----------|--------|--------------------------------------------|-------------------------------------------------------------|--------------|
| Vision | MMMU (val, CoT) explicit_prompt_relaxed_correctness | 63.56 | 60.56 | 95.28% |
| Vision | VQAv2 (val) vqa_match | ... | ... | ... |
| ... | ... | ... | ... | ... |
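Recovery appears to be the quantized score expressed as a percentage of the unquantized baseline. For the MMMU row above, a quick illustrative check reproduces the reported value:
# Illustrative recovery calculation for the MMMU (val, CoT) row
baseline, quantized = 63.56, 60.56
print(f"Recovery: {100 * quantized / baseline:.2f}%")  # -> 95.28%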
📄 License
This model is licensed under the Mistral AI Research License.
⚠️ Important Note
If you want to use a Mistral Model, a Derivative or an Output for any purpose that is not expressly authorized under this Agreement, you must request a license from Mistral AI, which Mistral AI may grant to you in Mistral AI's sole discretion. To discuss such a license, please contact Mistral AI via the website contact form: https://mistral.ai/contact/.
💡 Usage Tip
This model is intended for research purposes only. For more information on your rights and how your data is handled, please see Mistral AI's privacy policy.