Qwen3-30B-A6B-16-Extreme-128k-context Open Source Model - Free Deployment to Support Complex Reasoning Scenarios

Qwen3 30B A6B 16 Extreme 128k Context

Developed by DavidAU

A fine-tuned version of the Qwen3-30B-A3B mixture of experts model, with activated experts increased to 16 and context window expanded to 128k, suitable for complex reasoning scenarios

Large Language Model

Transformers

#16-Expert Deep Reasoning #128k Ultra-Long Context #Mixture of Experts Architecture

Downloads 72

Release Time : 5/4/2025

Model Overview

This is a mixture of experts model optimized for deep reasoning, enhancing complex task handling capabilities by increasing the number of activated experts (16) and expanding the context window (128k), though it may reduce generation speed.

Model Features

16-Expert Mode

Increases activated experts from the standard 8 to 16 (total experts 128), enabling access to more specialized knowledge for handling complex prompts

128k Context

Expands maximum context window from 32k to 128k, suitable for processing long documents and complex reasoning chains

Flexible Quantization Support

Supports various quantization formats including GGUF, GPTQ, EXL2, AWQ, HQQ, and can run in CPU/GPU hybrid mode

Deep Reasoning Optimization

Implements systematic self-reasoning through chain-of-thought tags (<think></think>), ideal for tasks requiring in-depth analysis

Model Capabilities

Long Text Generation

Complex Reasoning

Multi-turn Dialogue

Chain-of-Thought Analysis

Multilingual Processing

Use Cases

Technology Analysis

Climate Engineering Solution Evaluation

Analyze the feasibility of using space radiation cooling effects to reduce global temperatures

Generate detailed reports including physical principles, implementation steps, and risk assessments

Creative Writing

Sci-Fi Microfiction Creation

Generate short sci-fi stories with emotional depth and philosophical reflection

800-word complete short story 'The Last Communication', exploring themes of loneliness and human connection

Romantic Scene Depiction

Create nuanced character interaction scenes

1000-word dialogue-based scene 'Encounter Under the Spotlight', showcasing complex character relationships

🚀 Qwen3-30B-A6B-16-Extreme-128k-context

This repository offers the full - precision source code in "safe tensors" format, enabling the generation of various model formats. It's a specialized fine - tuned version of the Qwen model, optimized for complex use - cases.

🚀 Quick Start

This repo contains the full precision source code, in "safe tensors" format. With this source code, you can generate various formats such as GGUFs, GPTQ, EXL2, AWQ, HQQ, and it can also be used directly.

✨ Features

Expert Configuration: This is a simple "finetune" of the Qwen's "Qwen 30B - A3B" (MOE) model. It sets the number of experts in use from 8 to 16 (out of 128 experts) and increases the max context to 128k (up from 32k) as per Qwen's tech notes.
Performance Trade - off: Using 16 experts instead of the default 8 slows down the token/second speeds by about 1/2. However, it uses more "experts" to process prompts and 6B (of 30B) parameters instead of 3B (of 30B) parameters, making it suitable for more complex use cases.
Context Size: The model supports a context size of 128k.
Template Support: It can use either the Jinja Template or the CHATML template.

📚 Documentation

Model Variants

Property	Details
Regular Model	"30B - A3B", suitable for regular or simpler use cases.
16 - Expert Model	This model, Qwen3 - 30B - A6B - 16 - Extreme - 128k - context, for more complex and nuanced use cases.
Other Variants	"12 cooks" (12 experts), "High - Speed" (4 experts) versions are also available for different performance requirements.

Important Notes

⚠️ Important Note

Due to the unique nature (MOE, Size, Activated experts, size of experts) of this model, GGUF quants can be run on the CPU, GPU, or with GPU part "off - load", right up to full precision.

This model is difficult to Imatrix. You need a much larger imatrix file / multi - language / multi - content (ie code/text) to imatrix it.

GPU speeds will be 4x - 8x or higher than CPU - only speeds, and this model will also be very fast relative to other "30B" models.

System Role

💡 Usage Tip

You may or may not need to set a system role as most times Qwen3s generate their own reasoning/thinking blocks. The suggested system role is:

You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.

Refer to the document "Maximizing - Model - Performance - All..." for how to set the system role in various LLM/AI apps.

Settings and Performance

For all settings used for this model (including specifics for its "class"), example generations, and advanced settings guide, please refer to [https://huggingface.co/DavidAU/Maximizing - Model - Performance - All - Quants - Types - And - Full - Precision - by - Samplers_Parameters]. This document details methods to enhance operations for all model classes and can be applied to any model, any repo, and any quant type.

Example Generations

At the bottom of the page, there are three example generations (Q4KS, CPU) using 16 experts with this model. Here is one of the examples:

Parameters: Temp 1.2, rep pen 1.06, rep pen range 64, topk 100, topp.95, minp.05
Environment: Q4KS [non - imatrix], CPU only (windows 11), LMSTUDIO
Speed: 11 T/S (no GPU offload). GPU speed would be 4x or higher.

Prompt and Response

Prompt: Explain ways to use the "night" time cooling of radiant energy into space to reduce global temperatures.

Response: The model thought for 2 minutes 17 seconds and considered multiple aspects such as the greenhouse effect, surface emissivity, cloud cover, and the role of water and evaporation. It listed possible methods to use nighttime cooling:

Enhancing the emissivity of surfaces during night, e.g., using high - emissivity materials in urban areas like cool roofs and pavements.
Reducing atmospheric greenhouse gases through mitigation efforts.
Geoengineering approaches, although some are more about reflecting sunlight rather than enhancing nighttime radiation.
Creating artificial surfaces with certain coatings or materials to emit radiation more effectively.
Enhancing the Earth's albedo during daytime to reduce heat absorption, which indirectly affects overall temperature.
Focusing on nighttime processes like increasing the emissivity of surfaces at night using appropriate materials.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご