
GLM-4-32B-0414.w4a16-gptq

Developed by mratsim
A 4-bit GPTQ quantization of GLM-4-32B-0414, suitable for consumer-grade hardware.
Downloads: 785
Release Date: 5/4/2025

Model Overview

This model quantizes GLM-4-32B-0414 to 4 bits (weights only, with activations kept in 16 bits: W4A16) using asymmetric GPTQ quantization, enabling it to run on consumer-grade hardware.
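To illustrate the W4A16 storage format, here is a minimal NumPy sketch of asymmetric 4-bit quantization of a weight group. Note the hedge: this shows plain round-to-nearest with a per-group scale and zero-point; GPTQ itself additionally applies Hessian-based error compensation when choosing the quantized values, which this sketch omits.

```python
import numpy as np

def quantize_asymmetric_4bit(w):
    """Asymmetric 4-bit round-to-nearest quantization of one weight group.

    The storage format (uint4 codes plus a per-group scale and zero-point)
    matches a W4A16 checkpoint; real GPTQ picks the codes with
    second-order error correction rather than plain rounding.
    """
    w = np.asarray(w, dtype=np.float32)
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 15.0            # 4 bits -> 16 levels (0..15)
    zero = float(np.round(-lo / scale)) # asymmetric zero-point
    q = np.clip(np.round(w / scale) + zero, 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    # At inference the weights are expanded back to floating point and
    # multiplied against 16-bit activations (the "A16" in W4A16).
    return (q.astype(np.float32) - zero) * scale

w = np.array([-0.8, -0.1, 0.0, 0.3, 0.75])
q, s, z = quantize_asymmetric_4bit(w)
w_hat = dequantize(q, s, z)
```

With asymmetric quantization the 16 levels span exactly [min, max] of the group, so the reconstruction error per weight is at most half the scale.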

Model Features

4-bit quantization
Asymmetric GPTQ quantizes the weights to 4 bits, significantly reducing VRAM usage.
Consumer-grade hardware adaptation
The quantized model can run on a single GPU with 32GB of VRAM.
High-quality calibration
Calibration uses 2048 samples with a maximum sequence length of 4096, minimizing the risk of overfitting to the calibration set.
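The 32GB figure follows from simple arithmetic on the weight storage. A back-of-the-envelope estimate (the ~32.6B parameter count is an assumption; the exact figure for GLM-4-32B-0414 may differ slightly, and scales, zero-points, and the KV cache add overhead on top):

```python
# Rough VRAM needed just for the weights, in GiB.
params = 32.6e9                     # assumed parameter count, ~32B class
fp16_gib = params * 2.0 / 2**30     # 2 bytes/param at 16 bits: ~60.7 GiB
w4_gib   = params * 0.5 / 2**30     # 0.5 bytes/param at 4 bits: ~15.2 GiB

print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"W4A16 weights: {w4_gib:.1f} GiB")
```

The 16-bit weights alone exceed a 32GB card, while the 4-bit weights fit with room left for quantization metadata and the KV cache.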

Model Capabilities

Text generation
Long sequence processing

Use Cases

Text generation
Long text generation
Supports long-context generation of up to 130,000 tokens.
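A hedged usage sketch: GPTQ W4A16 checkpoints of this kind are commonly served with vLLM, which ships GPTQ kernels. The repository id below is inferred from the model name and may differ, and the context length flag assumes the 128K-class limit stated above:

```shell
# Assumed Hugging Face repo id; adjust to the actual upload.
vllm serve mratsim/GLM-4-32B-0414.w4a16-gptq --max-model-len 131072
```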