
Qwen3 30B A1.5B 64K High Speed NEO Imatrix MAX Gguf

Developed by DavidAU
An optimized variant of the Qwen3-30B-A3B Mixture of Experts model that improves speed by reducing the number of active experts, supports a 64k context length, and suits a wide range of text generation tasks.
Downloads: 508
Released: 5/11/2025

Model Overview

This model is a fine-tuned version of Qwen3-30B-A3B in which the number of active experts is reduced from 8 to 4 (out of 128 total experts), substantially increasing inference speed. It is well suited to creative writing, problem-solving, deep reasoning, and other text generation scenarios.
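To make the "4 of 128 experts" idea concrete, here is a minimal, self-contained sketch of top-k Mixture of Experts routing. It is an illustration only, not the actual Qwen3 routing code: the gating matrix, the toy linear "experts", and all dimensions are invented for the example.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=4):
    """Route input x through the top-k scoring experts.

    A gate assigns one score per expert; only the k best experts are
    evaluated, and their outputs are combined as a softmax-weighted sum.
    Running 4 instead of 8 experts is what buys the speedup described above.
    """
    logits = gate_w @ x                       # one score per expert
    topk = np.argsort(logits)[-k:]            # indices of the k best experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()                  # softmax over the selected k only
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
dim, n_experts = 8, 128                       # toy sizes, not the real model's
gate_w = rng.normal(size=(n_experts, dim))
# toy "experts": each is just a fixed linear map
mats = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
experts = [(lambda m: (lambda x: m @ x))(m) for m in mats]

x = rng.normal(size=dim)
y = moe_forward(x, gate_w, experts, k=4)      # only 4 of 128 experts run
```

The key point is that compute scales with k, not with the total expert count, which is why halving the active experts roughly doubles throughput while the full parameter set stays on disk/VRAM.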

Model Features

High-speed inference
Halving the number of active experts nearly doubles inference speed: roughly 23 tokens/sec on CPU and 125+ tokens/sec on GPU.
64k context
Context length is extended to 64k (65535) via YaRN, making the model well suited to long-text tasks.
Multiple quantization versions
Several quantizations are offered (e.g., IQ1_M MAX, IQ2, Q4_K_S), covering different hardware environments (GPU/CPU).
Mixture of Experts architecture
The model uses a Mixture of Experts structure, automatically routing each input to the most suitable experts to balance speed and quality.
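The features above map directly onto llama.cpp launch flags. The command below is a hedged sketch: the GGUF filename is hypothetical (substitute the quant you actually downloaded), and the GPU layer count depends on your VRAM.

```shell
# Hypothetical filename -- replace with the quant file you downloaded.
# -c 65535 requests the full 64k context; -ngl 99 offloads all layers to GPU.
./llama-cli \
  -m Qwen3-30B-A1.5B-High-Speed.Q4_K_S.gguf \
  -c 65535 -ngl 99 \
  -p "Write a short sci-fi scene."
```

On CPU-only machines, drop `-ngl` and expect throughput in the ~23 tokens/sec range quoted above for the smaller quants.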

Model Capabilities

Text generation
Creative writing
Problem-solving
Deep reasoning
Role-playing
Story creation
Tool calling

Use Cases

Creative writing
Sci-fi story creation
Generate sci-fi story segments with a specified theme and emotional tone.
The generated content is emotionally rich and thematically focused, as in the example segment 'The Last Transmission'.
Problem-solving
Deep reasoning tasks
Solve problems through systematic, long-chain reasoning.
The model emits its detailed thought process inside <think> tags before giving the final solution.
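When consuming the model's output programmatically, it is useful to separate the reasoning from the answer. The helper below is a minimal sketch assuming the `<think>...</think>` tag format described above; the sample string is invented, and real outputs may omit the tags entirely.

```python
import re

def split_reasoning(text):
    """Separate <think>...</think> reasoning blocks from the final answer.

    Returns (list_of_thought_blocks, answer_text).
    """
    thoughts = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thoughts, answer

# Invented sample output for illustration.
sample = "<think>Break the problem into cases...</think>The answer is 42."
thoughts, answer = split_reasoning(sample)
print(answer)  # -> The answer is 42.
```

The non-greedy `.*?` with `re.DOTALL` keeps each match to one tag pair even when the reasoning spans multiple lines or appears more than once.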