
Qwen3 30B A3B GGUF

Developed by ubergarm
A quantized version of Qwen3-30B-A3B that uses advanced nonlinear SotA quantization to deliver best-in-class quality within a given memory budget.
Downloads: 780
Release Date: 5/2/2025

Model Overview

This is a quantized version of the Qwen/Qwen3-30B-A3B model, designed for efficient inference. It supports conversational interaction and is well suited to text generation tasks.

Model Features

Advanced Nonlinear Quantization
Quantized with the ik_llama.cpp fork, which supports advanced nonlinear SotA quantization types, enabling high-quality inference.
Efficient Memory Usage
Runs with over 32k tokens of context on a single 24GB VRAM GPU, keeping memory consumption low (see the launch sketch after this list).
High-Performance Inference
Achieves over 1600 tok/sec prompt processing (PP) and 105 tok/sec token generation (TG) on a 3090TI FE with 24GB VRAM.
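
As a concrete starting point, the sketch below launches the quantized GGUF with ik_llama.cpp's llama-server at a 32k context, fully offloaded to a single GPU. It is a minimal sketch, assuming ik_llama.cpp has already been built locally and that its llama-server binary accepts the standard llama.cpp flags used here; the model path, host, and port are placeholders to adjust.

```python
# Minimal sketch: serve this GGUF quant with ik_llama.cpp's llama-server.
# Assumptions: ik_llama.cpp is built at ./ik_llama.cpp/build and its
# llama-server binary supports the standard llama.cpp flags below;
# MODEL_PATH is a placeholder for the downloaded GGUF file.
import subprocess

MODEL_PATH = "models/Qwen3-30B-A3B-GGUF/Qwen3-30B-A3B.gguf"  # placeholder path

cmd = [
    "./ik_llama.cpp/build/bin/llama-server",
    "--model", MODEL_PATH,
    "--ctx-size", "32768",    # 32k context, fits on a 24GB GPU per the card
    "--n-gpu-layers", "99",   # offload all layers to the GPU
    "--host", "127.0.0.1",
    "--port", "8080",
]

# Blocks until the server is stopped; point an OpenAI-compatible client
# at http://127.0.0.1:8080/v1 once it reports it is listening.
subprocess.run(cmd, check=True)
```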

Model Capabilities

Text Generation
Conversational Interaction
Long Context Processing

Use Cases

Text Generation
Dialogue Systems
Used to build efficient dialogue systems with long-context interaction, maintaining high-quality generation at up to 32k context (see the chat example below).
Content Creation
Assists in generating high-quality text content such as articles and stories.
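
For dialogue use, the sketch below sends a single chat turn to the locally running server. It is a minimal sketch, assuming llama-server exposes the usual llama.cpp OpenAI-compatible /v1/chat/completions endpoint; the host, port, model name, and prompt are placeholders.

```python
# Minimal sketch: one chat turn against the locally served quant.
# Assumption: the llama-server started above exposes the standard
# OpenAI-compatible /v1/chat/completions endpoint on port 8080.
import json
import urllib.request

payload = {
    "model": "Qwen3-30B-A3B",  # informational; the server uses its loaded model
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the benefits of MoE models in two sentences."},
    ],
    "max_tokens": 256,
    "temperature": 0.6,
}

req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Parse the OpenAI-style response and print the assistant's reply.
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)

print(reply["choices"][0]["message"]["content"])
```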