# Efficient Inference
Qwen Qwen2.5 Coder 1.5B GGUF
The GGUF quantized version of Qwen2.5-Coder-1.5B, optimized for code generation tasks, offering multiple quantization options to balance performance and resource consumption.
Large Language Model
featherless-ai-quants
228
1
Neobert GGUF
MIT
This is a static quantized version of the chandar-lab/NeoBERT model, aiming to reduce model storage space and computational resource requirements.
Large Language Model
Transformers English

mradermacher
219
1
Josiefied Qwen3 30B A3B Abliterated V2 4bit
This is a 4-bit quantized version converted from the Qwen3-30B model, suitable for text generation tasks on the MLX framework.
Large Language Model
mlx-community
194
1
Apriel Nemotron 15b Thinker GGUF
MIT
Apriel-Nemotron-15b-Thinker is a reasoning model that performs competitively with models of similar size. Its memory efficiency and reasoning capabilities make it suitable for enterprise and academic use.
Large Language Model
Transformers

Mungert
1,097
1
Wan2.1 14B T2V FusionX GGUF
Apache-2.0
This is a GGUF-quantized text-to-video model that generates video content from text descriptions, with quantization applied to improve inference efficiency.
Text-to-Video English
QuantStack
133
1
Deepseek R1 0528 Qwen3 8B AWQ 4bit
MIT
The AWQ quantized version of DeepSeek-R1-0528-Qwen3-8B, suitable for efficient inference in specific scenarios.
Large Language Model
Transformers

hxac
179
2
Dmindai.dmind 1 GGUF
DMind-1 is a text generation foundation model dedicated to the free dissemination of knowledge.
Large Language Model
DevQuasar
226
1
Devstral Small 2505 GGUF
Apache-2.0
Quantized version of Devstral-Small-2505, offering multiple precision options to suit different hardware requirements.
Large Language Model Supports Multiple Languages
Antigma
170
1
Google.medgemma 27b Text It GGUF
MedGemma-27B-Text-IT is a large language model developed by Google, focusing on text generation tasks in the medical field.
Large Language Model
DevQuasar
593
1
Vintern 1B V3 5 GGUF Ext
MIT
Vintern-1B-v3_5 is a 1-billion-parameter vision-language model supporting image-text generation tasks.
Text-to-Image
rootonchair
242
1
Sam Reason S2.1 GGUF
MIT
Static quantized version of Sam-reason-S2.1, offering multiple quantization options to suit different hardware requirements.
Large Language Model English
mradermacher
299
1
Tngtech.deepseek R1T Chimera GGUF
DeepSeek-R1T-Chimera is a text generation model developed based on tngtech's technology, focusing on efficient natural language processing tasks.
Large Language Model
DevQuasar
1,407
2
Ling Lite 1.5
MIT
Ling is a large-scale Mixture of Experts (MoE) language model open-sourced by InclusionAI. The Lite version features 16.8 billion total parameters with 2.75 billion activated parameters, demonstrating exceptional performance.
Large Language Model
Transformers

inclusionAI
46
3
Ko Gemma 3 12b
This is a transformers model published on the Hugging Face Hub. Its specific functions and uses have not yet been documented.
Large Language Model
Transformers

davidkim205
126
1
Apriel Nemotron 15b Thinker
MIT
A 15-billion-parameter reasoning model from ServiceNow that uses half the memory of comparable advanced models.
Large Language Model
Transformers

ServiceNow-AI
1,252
86
Qwen3 14B FP8 Dynamic
Apache-2.0
Qwen3-14B-FP8-dynamic is an optimized large language model. Quantizing both activations and weights to the FP8 data type reduces GPU memory requirements and improves computational throughput.
Large Language Model
Transformers

RedHatAI
167
1
Falcon H1 3B Base
Other
Falcon H1 is a hybrid-architecture language model developed by the UAE's Technology Innovation Institute, combining Transformer and Mamba architectures to support multilingual processing.
Large Language Model
Transformers Supports Multiple Languages

tiiuae
334
3
Qwen3 4B GGUF
Apache-2.0
Qwen3-4B is a GGUF format model based on Qwen3-4B-Base, suitable for text generation tasks.
Large Language Model
Mungert
1,507
7
Mimo 7B RL
MIT
MiMo-7B-RL is a model trained with reinforcement learning from the MiMo-7B-SFT model. It demonstrates outstanding performance on mathematical and code reasoning tasks, comparable to OpenAI o1-mini.
Large Language Model
Transformers

XiaomiMiMo
11.79k
252
Qwen3 32B MLX 4bit
Apache-2.0
This model is a 4-bit quantized version of Qwen3-32B in MLX format, optimized for efficient operation on Apple Silicon devices.
Large Language Model
lmstudio-community
32.14k
3
Qwen Qwen3 4B GGUF
llama.cpp imatrix quantizations of Qwen3-4B, the model released by the Qwen team, available in multiple quantization types and suitable for text generation tasks.
Large Language Model
bartowski
10.58k
9
Meta Llama 3.1 8B Instruct Quantized.w8a8
This is the INT8 quantized version of the Meta-Llama-3.1-8B-Instruct model, optimized through weight and activation quantization, suitable for multilingual business and research applications.
Large Language Model
Transformers Supports Multiple Languages

RedHatAI
9,087
16
Alibaba Pai.distilqwen2.5 DS3 0324 32B GGUF
A lightweight version of the Qwen2.5 large language model released by Alibaba PAI, focusing on efficient text generation tasks.
Large Language Model
DevQuasar
1,117
4
Deepthink 1.5B Open PRM Q8 0 GGUF
Apache-2.0
Deepthink-1.5B-Open-PRM is a 1.5B parameter open-source language model, converted to GGUF format for use with llama.cpp.
Large Language Model English
prithivMLmods
46
2
Mistral Community Pixtral 12b GGUF
Apache-2.0
This is the quantized version of the pixtral-12b model, quantized using llama.cpp, supporting image-text-to-text tasks.
bartowski
1,728
4
Bge Multilingual Gemma2 GPTQ
Apache-2.0
This is the 4-bit GPTQ quantized version of the BAAI/bge-multilingual-gemma2 model, supporting multilingual text embedding tasks.
Text Embedding
Transformers

shuyuej
34
5
Smolvlm2 2.2B Instruct GGUF
Apache-2.0
SmolVLM2-2.2B-Instruct is a 2.2B parameter vision-language model focused on video-text-to-text tasks, supporting English.
English
mradermacher
235
0
Gemma 3 27b It Qat GGUF
Gemma 3 is a series of lightweight open models built by Google on Gemini technology. It accepts multimodal input and produces text output, with a 128K-token context window and support for 140+ languages.
Text-to-Image English
unsloth
2,683
3
GLM 4 32B 0414 EXL3
Apache-2.0
GLM-4-32B-0414 is a large-scale language model developed by the THUDM team, based on the GLM architecture, suitable for various text generation tasks.
Large Language Model
owentruong
36
2
Hidream I1 Full Gguf
MIT
HiDream-I1-Full is a GGUF-format text-to-image generation model designed for image generation tasks.
Image Generation English
city96
43.94k
38
Hidream I1 Dev Gguf
MIT
HiDream-I1-Dev is an image generation model converted to GGUF format, supporting text-to-image generation tasks.
Image Generation English
city96
41.18k
45
Moderncamembert Cv2 Base
MIT
A French language model pre-trained on 1 trillion tokens of high-quality French text; the French version of ModernBERT.
Large Language Model
Transformers French

almanach
232
2
Gemma 3 4b It GPTQ 4b 128g
INT4 quantized version of the gemma-3-4b-it model, significantly reducing storage and computational resource requirements.
Image-to-Text
Transformers

ISTA-DASLab
502
2
Doge 20M Chinese
Apache-2.0
The Doge model employs dynamic masked attention mechanisms for sequence transformation, with the option to use either multi-layer perceptrons or cross-domain mixture of experts for state transitions.
Large Language Model
Transformers Supports Multiple Languages

wubingheng
65
2
Slim Orpheus 3b JAPANESE Ft Q8 0 GGUF
Apache-2.0
This is a GGUF format model converted from the slim-orpheus-3b-JAPANESE-ft model, specifically optimized for Japanese text processing.
Large Language Model Japanese
Gapeleon
26
0
Deepcogito Cogito V1 Preview Llama 70B 6bit
This is a 6-bit quantized large language model with 70B parameters, based on the Llama architecture and suitable for text generation tasks.
Large Language Model
mlx-community
8,168
1
Quasar 3.0 Instract V2
Quasar-3.0-7B is the distilled version of the upcoming 400B Quasar 3.0 model, showcasing the early strength and potential of the Quasar architecture.
Large Language Model
Transformers

silx-ai
314
8
Quasar 3.0 Final
Quasar-3.0-Max is a 7B-parameter distilled model provided by SILX INC, showcasing the early potential of the Quasar architecture with an innovative TTM training process and reinforcement learning techniques.
Large Language Model
Transformers

silx-ai
118
4
TD HallOumi 3B
A claim verification model fine-tuned from Llama-3.2-3B-Instruct, specifically designed to detect hallucinations or unsupported statements in AI-generated text.
Text Classification English
TEEN-D
46
2
Huihui Ai.deepseek V3 0324 Pruned Coder 411B GGUF
DeepSeek-V3-0324-Pruned-Coder-411B is a pruned and optimized model based on the DeepSeek-V3 architecture, focusing on code generation tasks.
Large Language Model
DevQuasar
2,706
2