# 4-bit quantized inference

GLM 4 32B 0414 4bit DWQ
MIT
This is the MLX format version of the THUDM/GLM-4-32B-0414 model, processed with 4-bit DWQ quantization, suitable for efficient inference on Apple silicon devices.
Large Language Model Supports Multiple Languages
mlx-community
156
4
Josiefied Qwen3 4B Abliterated V1 4bit
This is a 4-bit quantized version of the Qwen3-4B model converted to MLX format, suitable for text generation tasks.
Large Language Model
mlx-community
175
1
GLM 4 32B 0414 4bit
MIT
GLM-4-32B-0414-4bit is an MLX format model converted from THUDM/GLM-4-32B-0414, supporting Chinese and English text generation tasks.
Large Language Model Supports Multiple Languages
mlx-community
361
3
Gemma 3 12b It Qat 4bit
Other
MLX-format model converted from google/gemma-3-12b-it-qat-q4_0-unquantized; it accepts image and text input and generates text.
Image-to-Text Transformers Other
mlx-community
984
5
Gemma 3 4b It Qat 4bit
Other
Gemma 3 4B IT QAT 4bit is a 4-bit quantized large language model trained with Quantization-Aware Training (QAT), based on the Gemma 3 architecture and optimized for the MLX framework.
Image-to-Text Transformers Other
mlx-community
607
1
Qwen2 Vl Instuct Bpmncoder
Apache-2.0
A 4-bit quantized model based on Qwen2-VL-7B, trained with Unsloth and the Hugging Face TRL library, with a reported 2x inference speedup.
Image-to-Text Transformers English
utkarshkingh
18
1
Gemma 3 12b It Mlx 4Bit
Gemma 3 12B IT MLX 4Bit is an MLX format model converted from unsloth/gemma-3-12b-it, designed for Apple silicon devices.
Large Language Model Transformers English
przemekmroczek
23
1
Nano R1 Model
Apache-2.0
A Qwen2 model optimized with Unsloth and the Hugging Face TRL library, with a reported 2x inference speedup.
Large Language Model Transformers English
Mansi-30
25
2
Qvikhr 2.5 1.5B Instruct SMPO MLX 4bit
Apache-2.0
This is a 4-bit quantized version of the QVikhr-2.5-1.5B-Instruct-SMPO model, optimized for the MLX framework, supporting Russian and English instruction understanding and generation tasks.
Large Language Model Transformers Supports Multiple Languages
Vikhrmodels
249
2
Mlx Stable Diffusion 3.5 Large 4bit Quantized
Other
This is a quantized version of the Stable Diffusion 3.5 Large model on the DiffusionKit MLX framework, suitable for image generation tasks.
Text-to-Image English
argmaxinc
2,101
4
Meta Llama 3.1 8B Text To SQL
Apache-2.0
A 4-bit quantized fine-tuned model based on Meta-Llama-3.1-8B, specialized in text generation tasks, particularly text-to-SQL conversion
Large Language Model Transformers Supports Multiple Languages
ruslanmv
1,182
4
Mistral 7B Instruct V0.3 AWQ
Apache-2.0
Mistral-7B-Instruct-v0.3 is an instruction-tuned version of Mistral-7B-v0.3, quantized to 4 bits with AWQ for more efficient inference.
Large Language Model Transformers
solidrust
48.24k
3
Google Gemma 2b AWQ 4bit Smashed
A 4-bit quantized version of the google/gemma-2b model compressed using AWQ technology, designed to enhance inference efficiency and reduce resource consumption.
Large Language Model Transformers
PrunaAI
33
1
Phi 3 Mini 4k Instruct Q4
Phi-3 4k Instruct is a lightweight yet powerful language model, processed with 4-bit quantization to reduce resource requirements.
Large Language Model Transformers
bongodongo
39
1
Deepseek Llm 7B Base AWQ
Other
Deepseek LLM 7B Base is a 7B-parameter foundational large language model optimized for inference efficiency using AWQ quantization technology.
Large Language Model Transformers
TheBloke
1,863
2
Llama 2 7b Mt Czech To English
MIT
This is a fine-tuned adapter for the Meta Llama 2 7B model, specifically designed for translating Czech text into English.
Machine Translation Supports Multiple Languages
kaitchup
59
4
Mistral 7b Guanaco
Apache-2.0
A pre-trained language model based on the Llama2 architecture, suitable for English text generation tasks
Large Language Model Transformers English
kingabzpro
67
3
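
Several entries above describe 4-bit quantization as a way to reduce resource requirements. As a rough illustration, the sketch below estimates the memory needed for model weights alone at 16 bits and at 4 bits per weight. The parameter counts are nominal values read from the model names, and the helper `weight_memory_gib` is hypothetical. The estimate ignores quantization metadata (scales and zero points), the KV cache, and activations, so real footprints will be somewhat larger.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB.

    Assumption: every parameter is stored at the same bit width and no
    per-group scales or zero points are counted.
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Nominal parameter counts taken from the model names (assumed, not measured).
models = [
    ("GLM-4-32B", 32e9),
    ("Gemma 3 12B", 12e9),
    ("Mistral 7B", 7e9),
]

for name, params in models:
    fp16 = weight_memory_gib(params, 16)
    q4 = weight_memory_gib(params, 4)
    print(f"{name}: 16-bit ~{fp16:.1f} GiB, 4-bit ~{q4:.1f} GiB")
```

Under these assumptions the 4-bit weights take one quarter of the 16-bit size, which is why a 32B-parameter model can become practical on a single machine with enough unified memory.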