🚀 Qwen3-30B-A6B-16-Extreme-128k-context
This repository offers the full - precision source code in "safe tensors" format, enabling the generation of various model formats. It's a specialized fine - tuned version of the Qwen model, optimized for complex use - cases.
🚀 Quick Start
This repo contains the full precision source code, in "safe tensors" format. With this source code, you can generate various formats such as GGUFs, GPTQ, EXL2, AWQ, HQQ, and it can also be used directly.
✨ Features
- Expert Configuration: This is a simple "finetune" of the Qwen's "Qwen 30B - A3B" (MOE) model. It sets the number of experts in use from 8 to 16 (out of 128 experts) and increases the max context to 128k (up from 32k) as per Qwen's tech notes.
- Performance Trade - off: Using 16 experts instead of the default 8 slows down the token/second speeds by about 1/2. However, it uses more "experts" to process prompts and 6B (of 30B) parameters instead of 3B (of 30B) parameters, making it suitable for more complex use cases.
- Context Size: The model supports a context size of 128k.
- Template Support: It can use either the Jinja Template or the CHATML template.
📚 Documentation
Model Variants
Property |
Details |
Regular Model |
"30B - A3B", suitable for regular or simpler use cases. |
16 - Expert Model |
This model, Qwen3 - 30B - A6B - 16 - Extreme - 128k - context, for more complex and nuanced use cases. |
Other Variants |
"12 cooks" (12 experts), "High - Speed" (4 experts) versions are also available for different performance requirements. |
Important Notes
⚠️ Important Note
- Due to the unique nature (MOE, Size, Activated experts, size of experts) of this model, GGUF quants can be run on the CPU, GPU, or with GPU part "off - load", right up to full precision.
- This model is difficult to Imatrix. You need a much larger imatrix file / multi - language / multi - content (ie code/text) to imatrix it.
- GPU speeds will be 4x - 8x or higher than CPU - only speeds, and this model will also be very fast relative to other "30B" models.
System Role
💡 Usage Tip
You may or may not need to set a system role as most times Qwen3s generate their own reasoning/thinking blocks. The suggested system role is:
You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
Refer to the document "Maximizing - Model - Performance - All..." for how to set the system role in various LLM/AI apps.
Settings and Performance
For all settings used for this model (including specifics for its "class"), example generations, and advanced settings guide, please refer to [https://huggingface.co/DavidAU/Maximizing - Model - Performance - All - Quants - Types - And - Full - Precision - by - Samplers_Parameters]. This document details methods to enhance operations for all model classes and can be applied to any model, any repo, and any quant type.
Example Generations
At the bottom of the page, there are three example generations (Q4KS, CPU) using 16 experts with this model. Here is one of the examples:
- Parameters: Temp 1.2, rep pen 1.06, rep pen range 64, topk 100, topp.95, minp.05
- Environment: Q4KS [non - imatrix], CPU only (windows 11), LMSTUDIO
- Speed: 11 T/S (no GPU offload). GPU speed would be 4x or higher.
Prompt and Response
Prompt:
Explain ways to use the "night" time cooling of radiant energy into space to reduce global temperatures.
Response:
The model thought for 2 minutes 17 seconds and considered multiple aspects such as the greenhouse effect, surface emissivity, cloud cover, and the role of water and evaporation. It listed possible methods to use nighttime cooling:
- Enhancing the emissivity of surfaces during night, e.g., using high - emissivity materials in urban areas like cool roofs and pavements.
- Reducing atmospheric greenhouse gases through mitigation efforts.
- Geoengineering approaches, although some are more about reflecting sunlight rather than enhancing nighttime radiation.
- Creating artificial surfaces with certain coatings or materials to emit radiation more effectively.
- Enhancing the Earth's albedo during daytime to reduce heat absorption, which indirectly affects overall temperature.
- Focusing on nighttime processes like increasing the emissivity of surfaces at night using appropriate materials.