GLM-4-32B-0414.w4a16-gptq
A 4-bit GPTQ quantization of GLM-4-32B-0414, suitable for consumer-grade hardware.
Release Time: 5/4/2025
Model Overview
This model quantizes GLM-4-32B-0414 to 4-bit weights while keeping 16-bit activations (W4A16) using asymmetric GPTQ quantization, enabling it to run on consumer-grade hardware.
Model Features
4-bit quantization
Quantizes the weights to 4 bits using asymmetric GPTQ, significantly reducing GPU memory usage.
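The core idea of asymmetric 4-bit weight quantization can be sketched as follows. This is a minimal illustration of the scale/zero-point scheme only; real GPTQ additionally uses Hessian-based error compensation and per-group parameters, which are omitted here.

```python
def quantize_asymmetric_4bit(weights):
    """Map floats to 4-bit codes (0..15) with an asymmetric scale and
    zero-point, as in W4A16 weight-only schemes. Illustrative sketch,
    not the actual GPTQ algorithm (no Hessian-based error correction)."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / 15 or 1.0  # guard against constant input
    zero_point = round(-w_min / scale)
    q = [max(0, min(15, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from 4-bit codes for 16-bit compute."""
    return [(v - zero_point) * scale for v in q]
```

Because the grid is asymmetric, the representable range hugs the actual min/max of each weight group instead of being forced symmetric around zero, which reduces rounding error for skewed weight distributions.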
Consumer-grade hardware adaptation
The quantized model can run on a single GPU with 32 GB of VRAM.
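A back-of-envelope calculation shows why 4-bit weights make the 32 GB budget feasible (assuming roughly 32B parameters; the exact count and the remaining headroom for activations and KV cache will vary):

```python
params = 32e9               # approximate parameter count (assumption)
bytes_fp16 = params * 2     # 16-bit weights: 2 bytes each
bytes_w4 = params * 0.5     # 4-bit weights: 0.5 bytes each

gib = 1024 ** 3
print(f"fp16 weights:  {bytes_fp16 / gib:.1f} GiB")  # ~59.6 GiB, exceeds 32 GB
print(f"4-bit weights: {bytes_w4 / gib:.1f} GiB")    # ~14.9 GiB, leaves headroom
```

The remaining VRAM after loading the 4-bit weights is what holds the 16-bit activations and the KV cache during inference.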
High-quality calibration
Calibrated on 2048 samples with a maximum sequence length of 4096 to minimize the risk of overfitting to the calibration data.
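The quantization settings stated on this card can be summarized as a config sketch. The key names below are illustrative and not tied to any specific quantization library's schema:

```python
# Hypothetical recipe mirroring the card's stated settings; key names
# are illustrative, not a real library's configuration schema.
quant_recipe = {
    "method": "gptq",
    "bits": 4,                        # 4-bit weights
    "symmetric": False,               # asymmetric quantization
    "activation_bits": 16,            # W4A16: activations stay 16-bit
    "num_calibration_samples": 2048,
    "max_sequence_length": 4096,
}
```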
Model Capabilities
Text generation
Long sequence processing
Use Cases
Text generation
Long text generation
Supports long text generation with a maximum of 130,000 tokens.
© 2025 AIbase