# Low Memory Optimization

**Gryphe Codex 24B Small 3.2 GGUF** · Apache-2.0 · bartowski · 626 downloads · 3 likes
Large Language Model · English. A quantized version of Gryphe's Codex-24B-Small-3.2, using quantization to run efficiently across different hardware.

**Xlangai Jedi 7B 1080p GGUF** · Apache-2.0 · bartowski · 225 downloads · 1 like
Large Language Model · English. A llama.cpp quantization of Jedi-7B-1080p, offering multiple quantization types that trade file size against model quality.

**E N V Y Legion V2.1 LLaMa 70B Elarablated V0.8 Hf GGUF** · bartowski · 267 downloads · 1 like
Large Language Model. A quantized version of Legion-V2.1-LLaMa-70B-Elarablated-v0.8-hf, produced with llama.cpp and offered in multiple quantization options for different hardware requirements.

**Nvidia Llama 3.1 Nemotron Nano 4B V1.1 GGUF** · Other · bartowski · 2,553 downloads · 8 likes
Large Language Model · English. A quantized version of NVIDIA's Llama-3.1-Nemotron-Nano-4B-v1.1, produced with llama.cpp in several quantization methods and suited to resource-constrained environments.

**Qwen2.5 Omni 7B GPTQ Int4** · Other · Qwen · 389 downloads · 8 likes
Multimodal Fusion · Transformers · English. Qwen2.5-Omni is an end-to-end multimodal model that perceives text, images, audio, and video, and generates both text and natural speech responses in a streaming manner.

**Qwen Qwen3 1.7B GGUF** · bartowski · 7,150 downloads · 10 likes
Large Language Model. A quantized version of Qwen/Qwen3-1.7B produced with llama.cpp, supporting multiple quantization types and suited to text generation tasks.

**Qwen Qwen3 4B GGUF** · bartowski · 10.58k downloads · 9 likes
Large Language Model. A llama.cpp imatrix quantization of the Qwen team's Qwen3-4B, supporting multiple quantization types and suited to text generation tasks.

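The GGUF entries above all follow the same usage pattern: pick a quantization file sized to your RAM or VRAM (for example a Q4_K_M variant on a tight memory budget) and load it with a llama.cpp-based runtime. The sketch below uses the llama-cpp-python bindings under that assumption; the file name, context size, and prompt are illustrative placeholders, not values taken from any of the model cards.

```python
# Minimal sketch: running a GGUF quantization with llama-cpp-python.
# The model path is a placeholder; substitute whichever quantized file
# (e.g. a Q4_K_M variant of Qwen3-4B) you have downloaded locally.
from llama_cpp import Llama

llm = Llama(
    model_path="./Qwen3-4B-Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,       # a smaller context window reduces KV-cache memory
    n_gpu_layers=0,   # 0 = CPU only; raise to offload layers to a GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF quantization is."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

Lower-bit files shrink memory further at a growing cost in output quality, which is the trade-off these listings describe as balancing file size against model quality.
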
**Llama 3.2 1B Instruct GGUF** · Mungert · 708 downloads · 3 likes
Large Language Model · Multilingual. Llama-3.2-1B-Instruct is a 1B-parameter instruction-tuned model based on the Llama architecture, offered in multiple quantization formats for different hardware requirements.

**Qwen2.5 72B Instruct GGUF** · Other · Mungert · 1,439 downloads · 4 likes
Large Language Model · English. A GGUF quantization of Qwen2.5-72B-Instruct, available in multiple precision formats for efficient inference across different hardware environments.

**Mxbai Rerank Large V2 GGUF** · Apache-2.0 · Mungert · 2,209 downloads · 2 likes
Text Embedding · Multilingual. mxbai-rerank-large-v2 is a multilingual text reranking model, available in multiple quantization formats for different hardware environments.

**Meta Llama 3 8B GGUF** · Mungert · 1,303 downloads · 2 likes
Large Language Model · English. Meta-Llama-3-8B, an 8B-parameter large language model, packaged in GGUF format with multiple quantized versions for various hardware environments.

**Rwkv7 2.9B World GGUF** · Apache-2.0 · Mungert · 748 downloads · 3 likes
Large Language Model · Multilingual. A 2.9-billion-parameter model built on the RWKV-7 architecture, supporting multilingual text generation tasks.

**Wan 1.3b Gguf** · Apache-2.0 · calcuis · 3,058 downloads · 12 likes
Text-to-Video · English. A GGUF quantization of Wan-AI/Wan2.1-T2V-1.3B for text-to-video generation, compatible with the comfyui-gguf and gguf nodes.

**Mochi Gguf** · Apache-2.0 · calcuis · 284 downloads · 2 likes
Text-to-Video · English. A GGUF quantization of the Mochi text-to-video model, including a GGUF encoder and a GGUF variational autoencoder, suited to fast video content generation.

**Mochi** · Apache-2.0 · calcuis · 140 downloads · 8 likes
Text-to-Video · English. Mochi is a text-to-video generation model, provided here in GGUF-quantized form, that produces video content from text descriptions.

**Mixtral 8x7B V0.1** · Apache-2.0 · mistralai · 42.78k downloads · 1,709 likes
Large Language Model · Transformers · Multilingual. Mixtral-8x7B is a pretrained generative sparse mixture-of-experts model that outperforms Llama 2 70B on most benchmarks.

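Mixtral's full 16-bit weights are far larger than a single consumer GPU can hold, so a common low-memory route is to quantize on load. The sketch below shows one such setup, 4-bit NF4 quantization through bitsandbytes with transformers; it is an illustrative configuration, not an official recipe, and the memory actually required still depends on your hardware and generation settings.

```python
# Sketch: loading Mixtral-8x7B with 4-bit NF4 quantization via bitsandbytes,
# cutting weight memory roughly 4x compared with 16-bit loading.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs / CPU RAM
)

inputs = tokenizer("Mixture-of-experts models work by", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
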
**Starcoder2 3b** · OpenRAIL · bigcode · 199.62k downloads · 178 likes
Large Language Model · Transformers · Other. StarCoder2-3B is a 3-billion-parameter code generation model trained on 17 programming languages, with a 16,384-token context window.

**Blip2 Flan T5 Xl Sharded** · MIT · ethzanalytics · 71 downloads · 6 likes
Image-to-Text · Transformers · English. A sharded version of the BLIP-2 model with a Flan-T5-XL backbone for image-to-text tasks such as image captioning and visual question answering; sharding allows the checkpoint to be loaded in low-memory environments.

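Because the checkpoint is split into shards, transformers can pull the weights in piece by piece rather than materializing the full model in RAM at once. The sketch below shows that pattern with half-precision weights; the repository id is inferred from the listing above and should be treated as an assumption, as should the example image URL.

```python
# Sketch: low-memory loading of a sharded BLIP-2 (Flan-T5-XL) checkpoint.
# The repo id below is inferred from the listing and may need adjusting.
import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

repo_id = "ethzanalytics/blip2-flan-t5-xl-sharded"  # assumed repository name

processor = Blip2Processor.from_pretrained(repo_id)
model = Blip2ForConditionalGeneration.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,   # halves weight memory versus float32
    low_cpu_mem_usage=True,      # load shard by shard instead of a full copy
    device_map="auto",
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    images=image,
    text="Question: what is in the picture? Answer:",
    return_tensors="pt",
).to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0], skip_special_tokens=True))
```
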
**Nystromformer 4096** · uw-madison · 74 downloads · 3 likes
Large Language Model · Transformers. A long-sequence Nyströmformer model trained on the WikiText-103 v1 dataset, handling sequences of up to 4,096 tokens.

**Nystromformer 2048** · uw-madison · 38 downloads · 1 like
Large Language Model · Transformers. A Nyströmformer model trained on the WikiText-103 dataset, supporting long sequences of up to 2,048 tokens.

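Both Nyströmformer checkpoints replace exact self-attention with a Nyström approximation, which keeps attention cost roughly linear in sequence length and is what makes 2,048- and 4,096-token inputs practical on modest hardware. The sketch below runs a masked-token prediction over a padded long input; the checkpoint id is inferred from the listing (author uw-madison) and is an assumption to verify before use.

```python
# Sketch: masked-token prediction with a long-sequence Nystromformer.
# The checkpoint id is inferred from the listing and may need adjusting.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

checkpoint = "uw-madison/nystromformer-4096"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Build a simple cloze prompt, then pad it with filler sentences toward the
# model's long-context limit to exercise the approximate attention path.
text = (
    f"Paris is the {tokenizer.mask_token} of France. "
    + "This sentence is filler to lengthen the input. " * 200
)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

with torch.no_grad():
    logits = model(**inputs).logits

# Report the top prediction at the masked position.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_positions].argmax(dim=-1)))
```
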