# vLLM Optimization
Internvl3 38B FP8 Dynamic
MIT
This is the FP8 dynamic quantization version of OpenGVLab/InternVL3-38B, optimized for high-performance inference using vLLM. It achieves approximately 2x acceleration on vision-language tasks with minimal accuracy loss.
Image-to-Text
Safetensors Supports Multiple Languages
ConfidentialMind
5,173
1
Gemma 3 12b It FP8 Dynamic
Apache-2.0
An FP8 quantized model based on google/gemma-3-12b-it, supporting visual-text input and text output, suitable for multimodal scenarios.
Image-to-Text
Transformers English

RedHatAI
505
1
Qwq 32B INT8 W8A8
Apache-2.0
INT8 quantized version of QwQ-32B, with both weights and activations reduced to 8 bits (W8A8) for more efficient inference.
Large Language Model
Transformers English

ospatch
590
4
Whisper Large V3.w4a16
Apache-2.0
This is the quantized version of openai/whisper-large-v3, with weights quantized to INT4 and activations kept in 16-bit floating point (w4a16), suitable for vLLM inference.
Speech Recognition
Transformers English

nm-testing
20
1
Qwen2.5 VL 3B Instruct Quantized.w8a8
Apache-2.0
Quantized version of Qwen/Qwen2.5-VL-3B-Instruct, supporting visual-text input and text output, with both weights and activations quantized to INT8.
Image-to-Text
Transformers English

RedHatAI
274
1
Pixtral 12b FP8 Dynamic
Apache-2.0
pixtral-12b-FP8-dynamic is a quantized version of mistral-community/pixtral-12b. By quantizing weights and activations to the FP8 data type, it reduces disk size and GPU memory requirements by approximately 50%. It is suitable for commercial and research purposes in multiple languages.
Image-to-Text
Safetensors Supports Multiple Languages
RedHatAI
87.31k
9
Deepseek Coder V2 Lite Instruct FP8
Other
FP8 quantized version of DeepSeek-Coder-V2-Lite-Instruct, optimized for inference efficiency and suitable for commercial and research use in English.
Large Language Model
Transformers

RedHatAI
11.29k
7
Meta Llama 3 70B Instruct Quantized.w8a16
A quantized version of Meta-Llama-3-70B-Instruct with 8-bit weights and 16-bit activations (w8a16), intended for English commercial and research use in assistant-like chat.
Large Language Model
Transformers English

RedHatAI
1,035
5
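The approximately 50% reduction in disk size and GPU memory quoted for pixtral-12b-FP8-dynamic can be checked with simple arithmetic. The sketch below assumes the original checkpoints store weights in 16 bits and uses the nominal parameter counts from the model names (12B, 32B, 70B); the helper names `weight_memory_gib` and `savings_vs_16bit` are illustrative only. It counts weights alone and ignores activations, KV cache, quantization scales, and any layers left unquantized.

```python
def weight_memory_gib(num_params: float, weight_bits: int) -> float:
    """Approximate weight storage in GiB for a given weight bit-width."""
    return num_params * weight_bits / 8 / 2**30


def savings_vs_16bit(weight_bits: int) -> float:
    """Fraction of weight memory saved relative to a 16-bit baseline (assumed)."""
    return 1 - weight_bits / 16


# Nominal parameter counts read off the model names above (assumed, not measured).
MODELS = {
    "pixtral-12b (FP8)": (12e9, 8),
    "QwQ-32B (INT8 W8A8)": (32e9, 8),
    "Meta-Llama-3-70B-Instruct (w8a16)": (70e9, 8),
}

for name, (params, bits) in MODELS.items():
    baseline = weight_memory_gib(params, 16)
    quantized = weight_memory_gib(params, bits)
    saved = savings_vs_16bit(bits)
    print(f"{name}: {baseline:.1f} GiB -> {quantized:.1f} GiB ({saved:.0%} smaller)")
```

Under the same assumption, a 4-bit weight scheme such as w4a16 would cut weight storage by about 75%, although real checkpoints are somewhat larger because of scales and unquantized layers.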
Featured Recommended AI Models