Qwen2.5 VL 32B Instruct FP8 Dynamic

Developed by BCCard

An FP8 quantized version based on the Qwen2.5-VL-32B-Instruct model, supporting visual-text input and text output, suitable for efficient inference scenarios.

Image-to-Text

Transformers

English

Open Source License: Apache-2.0

#FP8 Quantization #Multimodal Inference #Visual-Text Input

Downloads 140

Release Time: 5/8/2025

Model Overview

This is a vision-language model that takes image and text inputs and generates text outputs. FP8 quantization of the weights and activations improves inference efficiency.

Model Features

FP8 Quantization

Uses the FP8 data type for both weight and activation quantization to improve inference efficiency.
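
As a rough illustration of where the efficiency gain comes from, the sketch below compares the nominal weight footprint at 16-bit and 8-bit precision. It assumes a round 32 billion parameters (taken from the "32B" in the model name) and ignores quantization scales and any layers that may have been left unquantized, so the figures are order-of-magnitude estimates, not measurements.

```python
# Back-of-the-envelope weight memory estimate.
# Assumption: ~32e9 parameters, read off the "32B" in the model name;
# the real checkpoint size will differ somewhat.
NOMINAL_PARAMS = 32e9


def weight_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the approximate weight size in GiB."""
    return num_params * bytes_per_param / 2**30


bf16_gib = weight_gib(NOMINAL_PARAMS, 2)  # 16-bit weights: 2 bytes each
fp8_gib = weight_gib(NOMINAL_PARAMS, 1)   # FP8 weights: 1 byte each

print(f"16-bit weights: ~{bf16_gib:.1f} GiB")
print(f"FP8 weights   : ~{fp8_gib:.1f} GiB")
print(f"Reduction     : {bf16_gib / fp8_gib:.1f}x")
```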

Multimodal Support

Accepts visual and text inputs, so it can interpret image content and generate relevant text.

Efficient Inference

Can be deployed and served efficiently through the vLLM backend.

Model Capabilities

Image Content Understanding

Multimodal Text Generation

Visual Question Answering

Use Cases

Content Understanding

Image Description Generation

Generate descriptive text based on the input image

Intelligent Question Answering

Visual Question Answering

Answer natural language questions about the image content

🚀 Qwen2.5-VL-32B-Instruct-FP8-Dynamic

This is an FP8-quantized version of Qwen/Qwen2.5-VL-32B-Instruct, optimized for efficient inference on vision-text inputs.

✨ Features

Model Overview

Model Architecture: Qwen2.5-VL-32B-Instruct
- Input: Vision-Text
- Output: Text
Model Optimizations:
- Weight quantization: FP8
- Activation quantization: FP8
Release Date: 2/24/2025
Version: 1.0
Model Developers: Neural Magic

Model Optimizations

This model was obtained by quantizing the weights and activations of Qwen/Qwen2.5-VL-32B-Instruct to the FP8 data type. It is ready for inference with vLLM >= 0.5.2.
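
To make the idea concrete, here is a minimal pure-Python sketch of dynamic FP8 quantization applied to a single activation vector. It assumes the E4M3 format (largest finite value 448) and one scale per vector computed at run time; the model card does not state the exact format or scale granularity, so both are assumptions. `round_to_e4m3` and `quantize_dynamic` are hypothetical helper names, not vLLM APIs.

```python
import math

E4M3_MAX = 448.0                 # largest finite value in FP8 E4M3 (assumed format)
E4M3_MIN_NORMAL = 2.0 ** -6      # smallest normal magnitude
E4M3_SUBNORMAL_STEP = 2.0 ** -9  # spacing of subnormal values


def round_to_e4m3(v: float) -> float:
    """Round v to the nearest value representable in E4M3, saturating at +/-448."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    mag = min(abs(v), E4M3_MAX)
    if mag < E4M3_MIN_NORMAL:
        step = E4M3_SUBNORMAL_STEP
    else:
        # frexp gives mag = m * 2**e with 0.5 <= m < 1; with 3 mantissa bits
        # the spacing between neighbouring values in that binade is 2**(e - 4).
        _, e = math.frexp(mag)
        step = 2.0 ** (e - 4)
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_dynamic(values):
    """Derive a scale from the data itself, then quantize each value to E4M3."""
    amax = max(abs(v) for v in values)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


activations = [0.02, -1.3, 0.75, 3.1]
q, scale = quantize_dynamic(activations)
dequantized = [x * scale for x in q]
for original, restored in zip(activations, dequantized):
    print(f"{original:+.4f} -> {restored:+.4f}")
```

Because the scale is derived from each input at run time, activations need no calibration data, which is what "Dynamic" in a model name like this one usually refers to.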

💻 Usage Examples

Basic Usage

This model can be deployed efficiently using the vLLM backend, as shown in the example below.

from vllm.assets.image import ImageAsset
from vllm import LLM, SamplingParams

# prepare model
llm = LLM(
    # repo ID assumed from the developer and model name listed on this page
    model="BCCard/Qwen2.5-VL-32B-Instruct-FP8-Dynamic",
    trust_remote_code=True,
    max_model_len=4096,
    max_num_seqs=2,
)

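# prepare inputs (Qwen2.5-VL chat format with a single image placeholder)
question = "What is the content of this image?"
inputs = {
    "prompt": (
        "<|im_start|>user\n"
        "<|vision_start|><|image_pad|><|vision_end|>"
        f"{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
    "multi_modal_data": {
        "image": ImageAsset("cherry_blossom").pil_image.convert("RGB")
    },
}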

# generate response
print("========== SAMPLE GENERATION ==============")
outputs = llm.generate(inputs, SamplingParams(temperature=0.2, max_tokens=64))
print(f"PROMPT  : {outputs[0].prompt}")
print(f"RESPONSE: {outputs[0].outputs[0].text}")
print("==========================================")

vLLM also supports OpenAI-compatible serving. See the vLLM documentation for more details.
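
As a minimal sketch of what a request to such a server could look like, the snippet below builds a chat completion payload and posts it using only the Python standard library. It assumes a server was started separately (for example with `vllm serve <model>`) and is listening on `http://localhost:8000`. The URL, the model name, and any image URL you pass in are placeholders, and `build_chat_request` and `send_chat_request` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed values: adjust to match your own server and model.
BASE_URL = "http://localhost:8000/v1"
MODEL = "Qwen2.5-VL-32B-Instruct-FP8-Dynamic"  # placeholder model name


def build_chat_request(question: str, image_url: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style chat completion payload with one image and one question."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        "temperature": 0.2,
        "max_tokens": max_tokens,
    }


def send_chat_request(payload: dict) -> str:
    """POST the payload to the chat completions endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `send_chat_request(build_chat_request("What is the content of this image?", "https://example.com/cherry_blossom.jpg"))` would return the model's answer as a string.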

📄 License

This model is licensed under the Apache 2.0 license.

Information Table

Property	Details
Base Model	Qwen/Qwen2.5-VL-32B-Instruct
Library Name	transformers
Model Type	Qwen2.5-VL-32B-Instruct-FP8-Dynamic
License	Apache 2.0
