
GLM-4-9B-0414 4-bit DWQ

Developed by Narutoouz
A high-performance 4-bit DWQ quantized version of GLM-4-9B-0414, optimized for Apple silicon and supporting a 128K context window.
Downloads: 194
Release Time: 6/1/2025

Model Overview

This project implements high-performance 4-bit DWQ quantization for THUDM/GLM-4-9B-0414, enabling efficient deployment on Apple devices and supporting long-context generation tasks.

Model Features

High-performance 4-bit quantization
Uses DWQ quantization to significantly reduce memory requirements while retaining 90–95% of the original model quality.
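To make the storage format concrete, here is a minimal pure-Python sketch of group-wise 4-bit affine quantization, the kind of scheme such quantized weights are stored in. DWQ additionally tunes the quantization parameters via distillation; that training step is not shown here, and the group size and helper names below are illustrative, not taken from this model card.

```python
# Sketch of group-wise 4-bit affine quantization (illustrative only).
# Each group of weights shares one scale and zero-point; each weight
# is stored as a 4-bit integer in [0, 15].

def quantize_group(weights, bits=4):
    """Quantize one group of floats to `bits`-bit ints plus scale/zero."""
    qmax = (1 << bits) - 1              # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0     # avoid zero scale for flat groups
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize_group(q, scale, zero):
    """Reconstruct approximate float weights from the quantized group."""
    return [zero + qi * scale for qi in q]

# Example group of 8 weights (real group sizes are typically 32 or 64).
group = [0.12, -0.53, 0.98, -1.02, 0.40, 0.07, -0.31, 0.66]
q, scale, zero = quantize_group(group)
recon = dequantize_group(q, scale, zero)
max_err = max(abs(a - b) for a, b in zip(group, recon))
# Rounding bounds the reconstruction error by half a quantization step.
assert max_err <= scale / 2 + 1e-9
```

The per-group scale and zero-point are the metadata that keep quality high at 4 bits; DWQ's contribution is learning those parameters against the full-precision model rather than computing them from min/max alone.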
Apple chip optimization
Optimized for Apple M-series chips, achieving an inference speed of 85.23 tok/s on an M4 Max.
Long context support
Supports contexts of up to 128K tokens (the context length must be configured manually in LM Studio).
Memory-efficient
Requires only about 8GB of memory after quantization, roughly 70% less than the original model.
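The quoted ~70% saving can be sanity-checked with back-of-the-envelope arithmetic. A short sketch, assuming roughly 9.4B parameters for GLM-4-9B-0414 and about 0.5 extra bits per weight of quantization metadata (both are assumptions for illustration, not figures from this card):

```python
# Rough weight-memory estimate: compare fp16 storage with 4-bit + overhead.
# ASSUMPTIONS: ~9.4e9 parameters; ~0.5 bits/weight of scales/zero-points.

def weight_gb(n_params, bits_per_param):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

fp16 = weight_gb(9.4e9, 16)   # ~18.8 GB in half precision
q4 = weight_gb(9.4e9, 4.5)    # ~5.3 GB at 4-bit plus metadata

print(f"fp16: {fp16:.1f} GB, 4-bit: {q4:.1f} GB")
print(f"weight saving: {1 - q4 / fp16:.0%}")
```

The weight saving alone lands near 70%; the gap between the ~5.3 GB of weights and the quoted ~8 GB footprint is plausibly runtime overhead such as activations, the KV cache, and framework buffers.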

Model Capabilities

Long text generation
Multi-round dialogue
Knowledge Q&A
Text summarization

Use Cases

Content creation
Long article generation
Generates coherent long-form content using the 128K context capability, maintaining context consistency; suitable for technical documentation or story creation.
Development assistance
Code generation and completion
Analyzes a codebase using the long context and generates relevant code, at 85+ tok/s on an M4 Max.