# Efficient Inference

Qwen Qwen2.5 Coder 1.5B GGUF

The GGUF quantized version of Qwen2.5-Coder-1.5B, optimized for code generation tasks, offering multiple quantization options to balance performance and resource consumption.

Large Language Model

featherless-ai-quants

This is a static quantized version of the chandar-lab/NeoBERT model, aiming to reduce model storage space and computational resource requirements.

Large Language Model

Transformers English

Josiefied Qwen3 30B A3B Abliterated V2 4bit

This is a 4-bit quantized version converted from the Qwen3-30B model, suitable for text generation tasks on the MLX framework.

Large Language Model

Apriel Nemotron 15b Thinker GGUF

Apriel-Nemotron-15b-Thinker is a reasoning model that performs strongly among models of the same scale. Its efficient memory usage makes it suitable for a range of enterprise and academic scenarios.

Large Language Model

Wan2.1 14B T2V FusionX GGUF

This is a GGUF-quantized text-to-video model that generates video content from text descriptions; the quantization is intended to improve inference efficiency.

Text-to-Video English

Deepseek R1 0528 Qwen3 8B AWQ 4bit

The AWQ quantized version of DeepSeek-R1-0528-Qwen3-8B, suitable for efficient inference in specific scenarios.

Large Language Model

Dmindai.dmind 1 GGUF

DMind-1 is a text generation foundation model dedicated to the free dissemination of knowledge.

Large Language Model

Devstral Small 2505 GGUF

A quantized version of Devstral-Small-2505, offering multiple precision options to suit different hardware requirements.

Large Language Model Supports Multiple Languages

Google.medgemma 27b Text It GGUF

MedGemma-27B-Text-IT is a large language model developed by Google, focusing on text generation tasks in the medical field.

Large Language Model

Vintern 1B V3 5 GGUF Ext

Vintern-1B-v3_5 is a 1-billion-parameter vision-language model supporting image-text generation tasks.

Sam Reason S2.1 GGUF

A static quantized version of Sam-reason-S2.1, offering multiple quantization options to suit different hardware requirements.

Large Language Model English

Tngtech.deepseek R1T Chimera GGUF

DeepSeek-R1T-Chimera is a text generation model developed based on tngtech's technology, focusing on efficient natural language processing tasks.

Large Language Model

Ling is a large-scale Mixture of Experts (MoE) language model open-sourced by InclusionAI. The Lite version features 16.8 billion total parameters with 2.75 billion activated parameters, demonstrating exceptional performance.

Large Language Model
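To make the total-versus-activated distinction in the Ling Lite description concrete, the short sketch below computes the share of parameters used per token from the two figures quoted there. The function name is illustrative and not part of any Ling tooling.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of a Mixture of Experts model's parameters activated per token."""
    if total_params_b <= 0 or not 0 <= active_params_b <= total_params_b:
        raise ValueError("expected total > 0 and 0 <= active <= total")
    return active_params_b / total_params_b


# Figures quoted in the description: 16.8B total, 2.75B activated.
ling_lite_fraction = active_fraction(16.8, 2.75)
print(f"{ling_lite_fraction:.1%} of parameters are active per token")
```

Roughly one parameter in six takes part in each forward pass, which is why an MoE model's compute cost tracks its activated size while its storage cost tracks its total size.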

This is a transformers model published on the Hugging Face Hub. Its specific functions and uses have not yet been documented.

Large Language Model

Apriel Nemotron 15b Thinker

A 15-billion-parameter reasoning model from ServiceNow, with memory usage only half that of comparable advanced models.

Large Language Model

Qwen3 14B FP8 Dynamic

Qwen3-14B-FP8-dynamic is an optimized large language model. By quantizing activation values and weights to the FP8 data type, it effectively reduces GPU memory requirements and improves computational throughput.

Large Language Model
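As a rough illustration of why FP8 weights reduce GPU memory requirements, the sketch below estimates weight storage at two precisions. It assumes a nominal 14 billion parameters (taken from the model name) and a 16-bit baseline, and it ignores activations, the KV cache, and quantization scale factors, so real usage will differ.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


params_14b = 14e9  # assumption: nominal parameter count from the model name
baseline_gib = weight_memory_gib(params_14b, 16)  # assumed 16-bit baseline
fp8_gib = weight_memory_gib(params_14b, 8)
print(f"16-bit: {baseline_gib:.1f} GiB, FP8: {fp8_gib:.1f} GiB")
```

Halving the bits per weight halves the weight footprint; the same function can be called with 4 to estimate the 4-bit variants listed elsewhere on this page.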

Falcon H1 3B Base

Falcon H1 is a hybrid-architecture language model developed by the UAE's Technology Innovation Institute, combining Transformer and Mamba architectures to support multilingual processing.

Large Language Model

Transformers Supports Multiple Languages

Qwen3-4B is a GGUF format model based on Qwen3-4B-Base, suitable for text generation tasks.

Large Language Model

MiMo-7B-RL is a model trained with reinforcement learning on top of MiMo-7B-SFT. It demonstrates outstanding performance on mathematical and code reasoning tasks, comparable to OpenAI o1-mini.

Large Language Model

Qwen3 32B MLX 4bit

This model is a 4-bit quantized version of Qwen3-32B in MLX format, optimized for efficient operation on Apple Silicon devices.

Large Language Model

lmstudio-community

Qwen Qwen3 4B GGUF

A llama.cpp imatrix quantization of Qwen3-4B from the Qwen team, supporting multiple quantization types and suitable for text generation tasks.

Large Language Model

Meta Llama 3.1 8B Instruct Quantized.w8a8

This is the INT8 quantized version of the Meta-Llama-3.1-8B-Instruct model, optimized through weight and activation quantization, suitable for multilingual business and research applications.

Large Language Model

Transformers Supports Multiple Languages
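For readers unfamiliar with INT8 quantization, the following is a minimal sketch of symmetric per-tensor quantization, the general idea behind mapping floating-point values to 8-bit integers. It is a generic illustration under simplifying assumptions, not the specific scheme used to produce the model above; the function names and sample values are hypothetical.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to the int8 range [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One shared scale maps the largest magnitude onto 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floating-point values from int8 codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.03, 1.0]  # hypothetical sample values
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q, [round(r, 4) for r in restored])
```

Each restored value differs from the original by at most half the scale, which is the rounding error this kind of scheme trades for the smaller storage size.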

Alibaba Pai.distilqwen2.5 DS3 0324 32B GGUF

A lightweight version of the Qwen2.5 large language model released by Alibaba PAI, focusing on efficient text generation tasks.

Large Language Model

Deepthink 1.5B Open PRM Q8 0 GGUF

Deepthink-1.5B-Open-PRM is a 1.5B parameter open-source language model, converted to GGUF format for use with llama.cpp.

Large Language Model English

Mistral Community Pixtral 12b GGUF

This is the quantized version of the pixtral-12b model, quantized using llama.cpp, supporting image-text-to-text tasks.

Bge Multilingual Gemma2 GPTQ

This is the 4-bit GPTQ quantized version of the BAAI/bge-multilingual-gemma2 model, supporting multilingual text embedding tasks.

Smolvlm2 2.2B Instruct GGUF

SmolVLM2-2.2B-Instruct is a 2.2B parameter vision-language model focused on video-text-to-text tasks, supporting English.

Gemma 3 27b It Qat GGUF

Gemma 3 is a lightweight open model series built by Google based on Gemini technology, supporting multimodal input and text output, featuring a 128K large context window and support for 140+ languages.

Image-Text-to-Text English

GLM 4 32B 0414 EXL3

GLM-4-32B-0414 is a large-scale language model developed by the THUDM team, based on the GLM architecture, suitable for various text generation tasks.

Large Language Model

Hidream I1 Full Gguf

HiDream-I1-Full is a GGUF-format text-to-image generation model designed for image generation tasks.

Image Generation English

Hidream I1 Dev Gguf

HiDream-I1-Dev is an image generation model converted to GGUF format, supporting text-to-image generation tasks.

Image Generation English

Moderncamembert Cv2 Base

A French language model pre-trained on 1 trillion tokens of high-quality French text; the French version of ModernBERT.

Large Language Model

Transformers French

Gemma 3 4b It GPTQ 4b 128g

An INT4 quantized version of the gemma-3-4b-it model, significantly reducing storage and computational resource requirements.

Doge 20M Chinese

The Doge model employs dynamic masked attention mechanisms for sequence transformation, with the option to use either multi-layer perceptrons or cross-domain mixture of experts for state transitions.

Large Language Model

Transformers Supports Multiple Languages

Slim Orpheus 3b JAPANESE Ft Q8 0 GGUF

This is a GGUF format model converted from the slim-orpheus-3b-JAPANESE-ft model, specifically optimized for Japanese text processing.

Large Language Model Japanese

Deepcogito Cogito V1 Preview Llama 70B 6bit

This is a large language model with 70B parameters based on the Llama architecture, which has undergone 6-bit quantization and is suitable for text generation tasks.

Large Language Model

Quasar 3.0 Instract V2

Quasar-3.0-7B is the distilled version of the upcoming 400B Quasar 3.0 model, showcasing the early strength and potential of the Quasar architecture.

Large Language Model

Quasar 3.0 Final

Quasar-3.0-Max is a 7B-parameter distilled model provided by SILX INC, showcasing the early potential of the Quasar architecture with an innovative TTM training process and reinforcement learning techniques.

Large Language Model

A claim verification model fine-tuned from Llama-3.2-3B-Instruct, specifically designed to detect hallucinations or unsupported statements in AI-generated text.

Text Classification English

Huihui Ai.deepseek V3 0324 Pruned Coder 411B GGUF

DeepSeek-V3-0324-Pruned-Coder-411B is a pruned and optimized code generation model based on the DeepSeek-V3 architecture, focusing on code generation tasks.

Large Language Model

© 2025 AIbase