
Qwen3 30B A1.5B 64K High Speed NEO Imatrix MAX Gguf

Developed by DavidAU
An optimized variant of the Qwen3-30B-A3B Mixture of Experts model that improves speed by reducing the number of active experts, supports a 64k context length, and suits a wide range of text generation tasks.
Downloads: 508
Released: 5/11/2025

Model Overview

This model is a fine-tuned version of Qwen3-30B-A3B in which the number of active experts is reduced from 8 to 4 (out of 128 total experts), substantially increasing inference speed. It is well suited to creative writing, problem-solving, deep reasoning, and other text generation scenarios.
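To make the "4 of 128 experts" idea concrete, here is a minimal, self-contained sketch of top-k Mixture of Experts routing. It is an illustration only, not the actual Qwen3 routing code: the gating matrix, the toy linear "experts", and all dimensions are invented for the example.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=4):
    """Route input x through the top-k scoring experts.

    A gate assigns one score per expert; only the k best experts are
    evaluated, and their outputs are combined as a softmax-weighted sum.
    Running 4 instead of 8 experts is what buys the speedup described above.
    """
    logits = gate_w @ x                       # one score per expert
    topk = np.argsort(logits)[-k:]            # indices of the k best experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()                  # softmax over the selected k only
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
dim, n_experts = 8, 128                       # toy sizes, not the real model's
gate_w = rng.normal(size=(n_experts, dim))
# toy "experts": each is just a fixed linear map
mats = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
experts = [(lambda m: (lambda x: m @ x))(m) for m in mats]

x = rng.normal(size=dim)
y = moe_forward(x, gate_w, experts, k=4)      # only 4 of 128 experts run
```

The key point is that compute scales with k, not with the total expert count, which is why halving the active experts roughly doubles throughput while the full parameter set stays on disk/VRAM.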

Model Features

High-speed inference
Halving the number of active experts nearly doubles inference speed: roughly 23 tokens/sec on CPU and 125+ tokens/sec on GPU.
64k context
Context length is extended to 64k (65535) via YaRN, making the model well suited to long-text tasks.
Multiple quantization versions
Several quantizations are offered (e.g., IQ1_M MAX, IQ2, Q4_K_S), covering different hardware environments (GPU/CPU).
Mixture of Experts architecture
The model uses a Mixture of Experts structure, automatically routing each input to the most suitable experts to balance speed and quality.
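The features above map directly onto llama.cpp launch flags. The command below is a hedged sketch: the GGUF filename is hypothetical (substitute the quant you actually downloaded), and the GPU layer count depends on your VRAM.

```shell
# Hypothetical filename -- replace with the quant file you downloaded.
# -c 65535 requests the full 64k context; -ngl 99 offloads all layers to GPU.
./llama-cli \
  -m Qwen3-30B-A1.5B-High-Speed.Q4_K_S.gguf \
  -c 65535 -ngl 99 \
  -p "Write a short sci-fi scene."
```

On CPU-only machines, drop `-ngl` and expect throughput in the ~23 tokens/sec range quoted above for the smaller quants.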

Model Capabilities

Text generation
Creative writing
Problem-solving
Deep reasoning
Role-playing
Story creation
Tool calling

Use Cases

Creative writing
Sci-fi story creation
Generate sci-fi story segments with a specified theme and emotional tone.
The generated content is emotionally rich and thematically focused, as in the example segment 'The Last Transmission'.
Problem-solving
Deep reasoning tasks
Solve problems through systematic, long-chain reasoning.
The model emits its detailed thought process inside <think> tags before giving the final solution.
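When consuming the model's output programmatically, it is useful to separate the reasoning from the answer. The helper below is a minimal sketch assuming the `<think>...</think>` tag format described above; the sample string is invented, and real outputs may omit the tags entirely.

```python
import re

def split_reasoning(text):
    """Separate <think>...</think> reasoning blocks from the final answer.

    Returns (list_of_thought_blocks, answer_text).
    """
    thoughts = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thoughts, answer

# Invented sample output for illustration.
sample = "<think>Break the problem into cases...</think>The answer is 42."
thoughts, answer = split_reasoning(sample)
print(answer)  # -> The answer is 42.
```

The non-greedy `.*?` with `re.DOTALL` keeps each match to one tag pair even when the reasoning spans multiple lines or appears more than once.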