Qwen3 30B A1.5B High Speed

Developed by DavidAU
An optimized high-speed version of Qwen3-30B-A3B that roughly doubles inference speed by reducing the number of activated experts. Suited to text generation scenarios that require rapid responses.
Downloads 179
Release Time: 5/3/2025

Model Overview

Fine-tuned from the Qwen3-30B-A3B MoE model. The number of activated experts is reduced from 8 to 4, which significantly improves inference speed, reportedly without compromising model capability.

Model Features

High-Speed Inference
Reduces activated experts to 4 (out of 128 total), nearly doubling inference speed
32K Long Context
Supports 32K context length + 8K output, totaling 40K processing capacity
Multi-Quantization Support
Supports GGUF, GPTQ, EXL2, AWQ, HQQ and other quantization formats
Efficient Resource Utilization
Activates only about 1.5B parameters per token (of 30B total), which keeps inference fast on both CPU and GPU
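
To make the "activated experts" idea concrete, here is a minimal sketch of top-k expert routing in a mixture-of-experts layer. It is a generic illustration, not the Qwen3 implementation: the function name `route_top_k`, the random router logits, and the choice to normalize weights over the selected experts are all assumptions made for the example. Only the numbers 128, 8, and 4 come from the description above.

```python
import math
import random

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Hypothetical helper for illustration; real MoE layers do this per token
    on tensors rather than on Python lists.
    """
    # Pick the k experts with the largest router logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
NUM_EXPERTS = 128  # total experts, per the feature list above
# Made-up router scores for a single token (assumption for the demo).
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes_8 = route_top_k(logits, k=8)  # original setting: 8 active experts
routes_4 = route_top_k(logits, k=4)  # this model's setting: 4 active experts

print("experts run with k=8:", [i for i, _ in routes_8])
print("experts run with k=4:", [i for i, _ in routes_4])
```

With k=4 only half as many expert feed-forward blocks run per token, which is where the speedup comes from. The experts chosen at k=4 are always the four highest-scoring ones from the k=8 set.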

Model Capabilities

Long-text generation
Complex reasoning
Multi-turn dialogue
Code generation
Creative writing

Use Cases

Content Creation
Sci-Fi Story Writing
Generates emotionally rich short sci-fi stories
The example demonstrates writing a complete 800-1000 word sci-fi story
Dialogue Systems
Deep-Thinking Dialogue
Displays AI reasoning process via <think> tags
Model can showcase detailed reasoning chains and inner monologues
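
When the reasoning is emitted inside <think> tags, an application usually wants to separate it from the final answer. The following is a minimal sketch of one way to do that, assuming the response is a plain string with well-formed `<think>...</think>` pairs. The helper name `split_reasoning` and the sample response are hypothetical and written only for this example.

```python
import re

# Matches one <think>...</think> block; DOTALL lets it span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Return (reasoning, answer) from a response that may contain <think> blocks."""
    # Collect the contents of every think block, joined by newlines.
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    # Whatever remains after removing the blocks is treated as the answer.
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

# Made-up response text for the demo (not real model output).
sample = "<think>The user wants a greeting.</think>\nHello there!"
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

A response without any tags comes back with an empty reasoning string and the full text as the answer. Unclosed tags are not handled in this sketch.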