QwQ-32B INT8 W8A8

Developed by ospatch
INT8 quantized version of QwQ-32B, optimized by reducing the bit-width of weights and activations
Downloads 590
Release Time: 3/13/2025

Model Overview

An INT8 quantized version of QwQ-32B, optimized to reduce GPU memory requirements and increase computational throughput; suitable for text generation tasks

Model Features

INT8 Quantization
Both weights and activations are quantized to INT8, reducing GPU memory and disk-space requirements
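To illustrate what W8A8 means in practice, here is a minimal sketch of symmetric per-tensor INT8 quantization applied to both a weight tile and an activation tile, with the matmul accumulated in integers and rescaled once at the end. This is a simplified illustration, not the exact scheme used for this model (production quantizers typically use per-channel scales and calibration data).

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)  # stand-in for a weight tile
a = rng.standard_normal((4, 4)).astype(np.float32)  # stand-in for activations

qw, sw = quantize_int8(w)  # 1 byte per weight instead of 2 (FP16) or 4 (FP32)
qa, sa = quantize_int8(a)

# W8A8 matmul: accumulate in INT32, then apply both scales once at the end
y_int8 = (qa.astype(np.int32) @ qw.astype(np.int32)).astype(np.float32) * (sa * sw)
y_fp32 = a @ w
err = np.max(np.abs(y_int8 - y_fp32))  # small quantization error
```

Storing weights as one byte each is what roughly halves the memory footprint relative to an FP16 checkpoint.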
Efficient Computation
Quantization improves matrix multiplication throughput by approximately 2x
vLLM Compatibility
Supports deployment via the vLLM Docker image, which exposes an OpenAI-compatible API
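A deployment sketch along the lines the card describes is shown below. The model repo id `ospatch/QwQ-32B-INT8-W8A8` is an assumption for illustration (the card does not state the exact id), and flags such as GPU count depend on your hardware.

```shell
# Serve the quantized model with the official vLLM OpenAI-compatible image
# (repo id is a hypothetical placeholder).
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model ospatch/QwQ-32B-INT8-W8A8

# Query the OpenAI-compatible endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ospatch/QwQ-32B-INT8-W8A8",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the API mirrors OpenAI's, existing OpenAI client libraries can point at `http://localhost:8000/v1` without code changes.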

Model Capabilities

Text Generation

Use Cases

Natural Language Processing
Text Generation
Used for generating coherent text content