# Large language model
- **Hunyuan A13B Instruct 4bit** (mlx-community, Other license) · Large Language Model · 201 downloads, 4 likes
  4-bit quantized build of Tencent's Hunyuan A13B large language model, suited to instruction-following tasks.
- **Kimi Dev 72B GGUF** (ubergarm, MIT) · Large Language Model, Other · 2,780 downloads, 1 like
  Quantized build of Kimi-Dev-72B that uses nonlinear optimal quantization and multi-head latent attention to cut storage and compute requirements.
- **Delta Vector Austral 24B Winton GGUF** (bartowski, Apache-2.0) · Large Language Model, English · 421 downloads, 1 like
  Delta-Vector's Austral-24B-Winton quantized with llama.cpp, suited to efficient operation across different hardware configurations.
- **Deepseek R1 0528 Qwen3 8B 6bit** (mlx-community, MIT) · Large Language Model · 582 downloads, 1 like
  6-bit quantized conversion of DeepSeek-R1-0528-Qwen3-8B for text generation under the MLX framework.
- **Qwen3 235B A22B 4bit DWQ 053125** (mlx-community, Apache-2.0) · Large Language Model · 200 downloads, 1 like
  4-bit quantized conversion of the Qwen3-235B-A22B-8bit model, optimized for text generation on the MLX framework.
- **Deepseek Ai DeepSeek R1 0528 GGUF** (bartowski, MIT) · Large Language Model · 2,703 downloads, 6 likes
  Quantized builds of the DeepSeek-R1-0528 large language model, optimized to run efficiently on different hardware.
- **GLM 4 32B 0414 4bit DWQ** (mlx-community, MIT) · Large Language Model, multilingual · 156 downloads, 4 likes
  MLX-format version of THUDM/GLM-4-32B-0414 with 4-bit DWQ quantization, suited to efficient inference on Apple silicon.
- **Gemma 3 12b It 4bit DWQ** (mlx-community) · Large Language Model · 554 downloads, 2 likes
  4-bit quantized Gemma 3 12B for the MLX framework, supporting efficient text generation.
- **Qwen3 30B A3B 4bit DWQ 05082025** (mlx-community, Apache-2.0) · Large Language Model · 240 downloads, 5 likes
  4-bit quantized conversion of Qwen/Qwen3-30B-A3B to MLX format for text generation.
- **Qwen3 30B A3B 4bit DWQ 0508** (mlx-community, Apache-2.0) · Large Language Model · 410 downloads, 12 likes
  4-bit DWQ quantized conversion of Qwen/Qwen3-30B-A3B to MLX format for text generation.
- **Nvidia.opencodereasoning Nemotron 14B GGUF** (DevQuasar) · Large Language Model · 423 downloads, 2 likes
  Open-source 14-billion-parameter code-reasoning model from NVIDIA, focused on code generation and reasoning tasks.
- **Qwen3 14B 4bit AWQ** (mlx-community, Apache-2.0) · Large Language Model · 252 downloads, 2 likes
  Qwen/Qwen3-14B compressed to 4 bits with AWQ quantization, in MLX format for efficient inference.
- **Qwen3 8b Ru** (attn-signs, Apache-2.0) · Large Language Model, Transformers, Other · 30 downloads, 2 likes
  Russian-optimized large language model based on Qwen3-8B, designed specifically for Russian text generation.
- **Qwen3 8B 4bit AWQ** (mlx-community, Apache-2.0) · Large Language Model · 1,682 downloads, 1 like
  4-bit AWQ quantized conversion of Qwen/Qwen3-8B for text generation under the MLX framework.
- **Qwen3 30B A3B GGUF** (MaziyarPanahi) · Large Language Model · 158.92k downloads, 3 likes
  GGUF quantization of Qwen3-30B-A3B, offered at multiple bit widths for text generation.
- **Qwen3 14B MLX 4bit** (lmstudio-community, Apache-2.0) · Large Language Model · 3,178 downloads, 4 likes
  4-bit quantization of Qwen/Qwen3-14B converted with mlx-lm for text generation.
- **GLM Z1 32B 0414 4bit** (mlx-community, MIT) · Large Language Model, multilingual · 225 downloads, 2 likes
  4-bit quantized conversion of THUDM/GLM-Z1-32B-0414 for text generation.
- **Qwq DeepSeek R1 SkyT1 Flash Lightest 32B** (sm54) · Large Language Model, Transformers · 14 downloads, 4 likes
  Merged model based on Qwen2.5-32B that folds in DeepSeek-R1-Distill-Qwen-32B, QwQ-32B, and Sky-T1-32B-Flash to boost performance.
- **Deepseek R1 Quantized.w4a16** (RedHatAI, MIT) · Large Language Model · 119 downloads, 4 likes
  INT4 weight-quantized DeepSeek-R1 that cuts GPU memory and disk requirements by roughly 50% while maintaining the original model's performance.
- **Gemma 3 27b It Qat Bf16** (mlx-community) · Image-to-Text, Transformers · 178 downloads, 2 likes
  Google's Gemma 3 27B after quantization-aware training (QAT), converted to BF16 format for the MLX framework.
- **Gemma 3 27b It Qat 8bit** (mlx-community, Other license) · Image-to-Text, Transformers, Other · 422 downloads, 2 likes
  MLX-format conversion of Google's Gemma 3 27B QAT checkpoint, supporting image-to-text tasks.
- **Bitnet B1.58 2B 4T Gguf** (microsoft, MIT) · Large Language Model, English · 25.77k downloads, 143 likes
  The first open-source native 1-bit large language model from Microsoft Research: 2 billion parameters trained on a 4-trillion-token corpus.
- **Bitnet B1.58 2B 4T Bf16** (microsoft, MIT) · Large Language Model, Transformers, English · 2,968 downloads, 24 likes
  BF16 release of Microsoft Research's native 1-bit 2-billion-parameter model, trained on 4 trillion tokens for markedly better computational efficiency.
- **Plamo Embedding 1b** (pfnet, Apache-2.0) · Text Embedding, Transformers, Japanese · 33.48k downloads, 25 likes
  Japanese text-embedding model from Preferred Networks, with outstanding results on Japanese embedding benchmarks.
- **Cantonesellmchat V1.0 32B** (hon9kon9ize, Other license) · Large Language Model, Transformers · 117 downloads, 5 likes
  First-generation Cantonese large language model from the hon9kon9ize team, strong on Hong Kong domain knowledge and Cantonese conversation.
- **Qwq 32B FP8 Dynamic** (nm-testing, MIT) · Large Language Model, Transformers · 3,895 downloads, 3 likes
  FP8 dynamic quantization of QwQ-32B that halves storage and memory needs while retaining 99.75% of the original model's accuracy.
- **Pllum 8x7B Chat GGUF** (piotrmaciejbednarski, Apache-2.0) · Large Language Model, Transformers · 126 downloads, 2 likes
  GGUF quantization of PLLuM-8x7B-chat, optimized for local inference, with multiple quantization levels for different hardware.
- **Qwenvergence 14B V13 Prose DS** (sometimesanotion, Apache-2.0) · Large Language Model, Transformers, English · 133 downloads, 9 likes
  Multi-model merge with strong benchmark performance and a knack for humorous prose.
- **Jais Family 1p3b** (inceptionai, Apache-2.0) · Large Language Model, Safetensors, multilingual · 318 downloads, 9 likes
  1.3-billion-parameter member of the Jais bilingual family, specialized in Arabic processing with strong English capability.
- **Jais Family 1p3b Chat** (inceptionai, Apache-2.0) · Large Language Model, multilingual · 479 downloads, 6 likes
  Chat-tuned 1.3-billion-parameter Arabic-English Jais model, optimized for exceptional Arabic while maintaining strong English proficiency.
- **Gemma 2 27b It Q8 0 GGUF** (KimChen) · Large Language Model · 471 downloads, 2 likes
  GGUF-format conversion of Google's Gemma 2 27B instruction-tuned model for text generation.
- **Qwen2 7B Int4 Inc** (Intel, Apache-2.0) · Large Language Model, Transformers · 48 downloads, 6 likes
  INT4 model auto-quantized from Qwen2-7B with Intel's auto-round tool, suited to efficient inference.
- **Norskgpt Llama 3 70b Adapter** (bineric) · Large Language Model, Transformers, Other · 37 downloads, 6 likes
  Norwegian-language adapter built on Llama-3-70b-fp16, trained on 1 million Norwegian text tokens.
- **Ppo Tldr** (vwxyzjn) · Large Language Model, Transformers · 15 downloads, 1 like
  Fine-tune of the EleutherAI_pythia-1b-deduped model for generating concise summaries.
- **Llama 3 Open Ko 8B** (beomi, Other license) · Large Language Model, Transformers, multilingual · 6,729 downloads, 146 likes
  Korean model continually pre-trained from Llama-3-8B on more than 60 GB of deduplicated public text; generates both Korean and English.
- **Adebert** (Jacobberk, Apache-2.0) · Large Language Model, Transformers · 25 downloads, 1 like
  adeBERT, a BERT-large fine-tune focused on domain-specific tasks, with excellent results on its evaluation datasets.
- **Llama 3 8B Instruct GPTQ 4 Bit** (astronomer, Other license) · Large Language Model, Transformers · 2,059 downloads, 25 likes
  4-bit GPTQ quantization of Meta Llama 3 by Astronomer, able to run efficiently on low-VRAM devices.
- **Sambalingo Arabic Chat 70B** (sambanovasystems) · Large Language Model, Transformers, multilingual · 47 downloads, 3 likes
  Human-aligned Arabic and English chat model adapted and trained from Llama-2-70b.
- **FRED T5 Summarizer** (RussianNLP, MIT) · Text Generation, Transformers, Other · 11.76k downloads, 21 likes
  Russian text-summarization model from SberDevices, built on the T5 architecture with 1.7B parameters.
- **Midnight Miqu 103B V1.0** (sophosympatheia, Other license) · Large Language Model, Transformers · 18 downloads, 13 likes
  103-billion-parameter merged model based on the leaked Miqu model, supporting a 32K context length.
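Nearly every entry above is a low-bit quantization (AWQ, DWQ, GPTQ, GGUF, INT4) of a full-precision checkpoint. The common core is block-wise weight quantization: split the weights into small blocks, store one shared scale per block, and keep only low-bit integers per weight. The sketch below illustrates that idea in plain Python; the helper names are hypothetical, and real schemes (llama.cpp's Q4_0, AWQ's activation-aware scaling, DWQ) add packed storage and extra per-block metadata.

```python
# Illustrative sketch of block-wise 4-bit weight quantization.
# Hypothetical helpers, not any specific library's API.

def quantize_block(weights, bits=4):
    """Map a block of floats to signed ints sharing one scale."""
    qmax = 2 ** (bits - 1) - 1                  # 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate floats from ints and the shared scale."""
    return [v * scale for v in q]

block = [0.12, -0.7, 0.33, 0.04]
q, s = quantize_block(block)                    # q == [1, -7, 3, 0]
restored = dequantize_block(q, s)
err = max(abs(a - b) for a, b in zip(block, restored))
```

At 4 bits per weight plus a small per-block scale, a 16-bit checkpoint shrinks roughly 4x, which is why the 4-bit MLX and GGUF variants listed here fit on consumer hardware; the rounding error per weight stays bounded by about half the block scale.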