# 4-bit quantized inference

GLM 4 32B 0414 4bit DWQ

This is the MLX format version of the THUDM/GLM-4-32B-0414 model, processed with 4-bit DWQ quantization, suitable for efficient inference on Apple silicon devices.

Large Language Model Supports Multiple Languages

Josiefied Qwen3 4B Abliterated V1 4bit

This is a 4-bit quantized version of the Qwen3-4B model converted to MLX format, suitable for text generation tasks.

Large Language Model

GLM 4 32B 0414 4bit

GLM-4-32B-0414-4bit is an MLX format model converted from THUDM/GLM-4-32B-0414, supporting Chinese and English text generation tasks.

Large Language Model Supports Multiple Languages

Gemma 3 12b It Qat 4bit

An MLX format model converted from google/gemma-3-12b-it-qat-q4_0-unquantized, supporting text generation from image and text inputs.

Transformers Other

Gemma 3 4b It Qat 4bit

Gemma 3 4B IT QAT 4bit is a 4-bit quantized large language model trained with Quantization-Aware Training (QAT), based on the Gemma 3 architecture and optimized for the MLX framework.

Transformers Other

Qwen2 Vl Instuct Bpmncoder

A 4-bit quantized version of the Qwen2-VL-7B model, trained with Unsloth and the Hugging Face TRL library, with a reported 2x inference speedup.

Transformers English

Gemma 3 12b It Mlx 4Bit

Gemma 3 12B IT MLX 4Bit is an MLX format model converted from unsloth/gemma-3-12b-it, designed for Apple silicon devices.

Large Language Model

Transformers English

An optimized Qwen2 model trained with Unsloth and the Hugging Face TRL library, with a reported 2x inference speed improvement.

Large Language Model

Transformers English

Qvikhr 2.5 1.5B Instruct SMPO MLX 4bit

This is a 4-bit quantized version of the QVikhr-2.5-1.5B-Instruct-SMPO model, optimized for the MLX framework, supporting Russian and English instruction understanding and generation tasks.

Large Language Model

Transformers Supports Multiple Languages

Mlx Stable Diffusion 3.5 Large 4bit Quantized

This is a quantized version of the Stable Diffusion 3.5 Large model on the DiffusionKit MLX framework, suitable for image generation tasks.

Text-to-Image English

Meta Llama 3.1 8B Text To SQL

A 4-bit quantized, fine-tuned model based on Meta-Llama-3.1-8B, specialized for text generation and in particular text-to-SQL conversion.

Large Language Model

Transformers Supports Multiple Languages

Mistral 7B Instruct V0.3 AWQ

Mistral-7B-Instruct-v0.3 is an instruction fine-tuned version of Mistral-7B-v0.3, quantized with 4-bit AWQ for more efficient inference.

Large Language Model

Google Gemma 2b AWQ 4bit Smashed

A 4-bit quantized version of the google/gemma-2b model compressed using AWQ technology, designed to enhance inference efficiency and reduce resource consumption.

Large Language Model

Phi 3 Mini 4k Instruct Q4

Phi-3 4k Instruct is a lightweight yet powerful language model, processed with 4-bit quantization to reduce resource requirements.

Large Language Model

Deepseek Llm 7B Base AWQ

Deepseek LLM 7B Base is a 7B-parameter foundational large language model optimized for inference efficiency using AWQ quantization technology.

Large Language Model

Llama 2 7b Mt Czech To English

This is a fine-tuned adapter for the Meta Llama 2 7B model, specifically designed for translating Czech text into English.

Machine Translation Supports Multiple Languages

Mistral 7b Guanaco

A pre-trained language model based on the Llama2 architecture, suitable for English text generation tasks.

Large Language Model

Transformers English
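
As a rough aid for choosing among the 4-bit models listed above, the weight memory of a quantized model can be estimated from its parameter count. The sketch below is illustrative only: the function name `estimate_weight_memory_gb` is hypothetical, the group size of 64 and the 32 bits of per-group scale and bias overhead are assumptions (actual values depend on the quantization scheme and tool), and the parameter counts are nominal figures read off the model names. It ignores activations, the KV cache, and any layers left unquantized.

```python
def estimate_weight_memory_gb(
    num_params: float,
    bits: int = 4,
    group_size: int = 64,               # assumed quantization group size
    overhead_bits_per_group: int = 32,  # assumed per-group scale + bias storage
) -> float:
    """Approximate weight storage in GB (10^9 bytes) for a group-quantized model."""
    # Each weight costs `bits` plus its share of the per-group metadata.
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return num_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts taken from the model names in the listing.
nominal_sizes = {"32B": 32e9, "12B": 12e9, "7B": 7e9, "4B": 4e9}

for label, params in nominal_sizes.items():
    gb = estimate_weight_memory_gb(params)
    print(f"{label}: about {gb:.2f} GB of weights at 4-bit")
```

Under these assumptions each weight costs 4.5 bits, so a nominal 32B model needs roughly 18 GB for weights alone, compared with about 64 GB at 16-bit precision. Treat the result as a lower bound on the memory actually required for inference.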


© 2025 AIbase