🚀 Qwen3-30B-A7.5B-24-Grand-Brainstorm
This repository contains a finetuned version of the Qwen model with 24 experts, suitable for complex use cases.
🚀 Quick Start
This repo contains the full precision source code in "safe tensors" format, which can be used to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly.
✨ Features
- Expert Configuration: This is a simple "finetune" of the Qwen's "Qwen 30B - A3B" (MOE) model. It sets the number of experts in use from 8 to 24 (out of 128 experts), using 7.5B (of 30B) parameters instead of 3B (of 30B) parameters.
- Context Size: It has a context size of 32K + 8K for output (40k total).
- Template Support: It supports the use of Jinja Template or CHATML template.
📚 Documentation
Model Usage
- Regular vs. Complex Use Cases: Regular or simpler use cases may benefit from using the normal (8 experts), the "12 cooks" (12 experts), 16 Extreme (32k or 128k context) or "High - Speed" (4 experts) version(s). Using 24 experts instead of the default 8 will slow down token/second speeds by about 2/3. This version with 24 experts may be overkill for some use cases.
- Temperature Suggestion: For some use cases, higher temperatures (1+, 2+) are suggested.
- Model Comparison: It is recommended to try the 4, 8 (reg), 12, and 16 versions as they will be faster in terms of tokens per second and may better meet your use cases.
Important Notes
- Quantization and Hardware: Due to the unique nature (MOE, Size, Activated experts, size of experts) of this model, GGUF quants can be run on the CPU, GPU or with GPU part "off - load", right up to full precision.
- Imatrix Difficulty: This model is difficult to Imatrix. You need a much larger imatrix file / multi - language / multi - content (i.e., code/text) to imatrix it.
- GPU Speed Advantage: GPU speeds will be 4x - 8x or higher than CPU - only speeds. And this model will also be very fast relative to other "30B" models (Token per second speed equal roughly to 7.5B "normal" model speeds).
Related Links
- Original Model Card: Please refer to the original model card for details, benchmarks, how to use, settings, system roles etc. [https://huggingface.co/Qwen/Qwen3 - 30B - A3B]
- More/Less Experts Versions:
- 4 experts: [https://huggingface.co/DavidAU/Qwen3 - 30B - A1.5B - High - Speed]
- 12 experts: [https://huggingface.co/DavidAU/Qwen3 - 30B - A4.5B - 12 - Cooks]
- 16 experts, 32k context: [https://huggingface.co/DavidAU/Qwen3 - 30B - A6B - 16 - Extreme]
- 16 experts, 128k context: [https://huggingface.co/DavidAU/Qwen3 - 30B - A6B - 16 - Extreme - 128k - context]
- Optimal Settings Guide: For all settings used for this model (including specifics for its "class"), including example generation(s) and for advanced settings guide, please see [https://huggingface.co/DavidAU/Maximizing - Model - Performance - All - Quants - Types - And - Full - Precision - by - Samplers_Parameters]. This document also details parameters, sampler and advanced samplers that can be used for any model, from any repo, all quants, and of course source code operation too, to enhance the operation of any model.
- Full Model Information: For full information about this model, including details, context limits, special usage notes/settings, models used to create it, templates used, example generation(s), and GGUF quants, please go to [GGUFS REPO coming soon].
Example Generation
Example Generation #1
- Settings: Temp 1.2, rep pen 1.06, rep pen range 64, topk 100, topp.95, minp.05
- Quant and Hardware: Q4KS [non - imatrix], CPU only (windows 11), LMSTUDIO.
- Speed: 8 T/S (no GPU offload)
Prompt:
Explain ways to use the "night" time cooling of radiant energy into space to reduce global temperatures.
Thought Process:
Thought for 6 minutes 18 seconds. The model considered various ways to enhance the Earth's ability to emit heat into space during nighttime to reduce global temperatures, including enhancing surface emissivity, reducing greenhouse gases, atmospheric engineering, using space - based cooling, etc.
Possible Ways:
- Enhancing emissivity of surfaces on Earth. For example, using materials with high emissivity (like black body) which can radiate more heat at night. If the surface is designed to emit more energy, then it cools down faster. This could reduce local temperatures and maybe contribute to global cooling.
- Reducing greenhouse gases in the atmosphere. But that's not specifically nighttime, but overall reduction. However, during nighttime, when there is no sunlight, the greenhouse gases might have a different effect.
- Using atmospheric engineering to create conditions that allow more heat to escape during night time. For example, maybe using reflective particles or altering cloud cover to let more radiation escape.
- Creating structures or systems that can actively cool by radiating heat during nighttime. Maybe like thermal emissive surfaces, which are designed to maximize the emission of heat into space.
- Using space - based cooling. Maybe if we can somehow reflect or redirect heat radiation into space more effectively. Perhaps using satellites or mirrors.
💡 Usage Tip
I suggest trying the 4, 8 (reg), 12, and 16 versions as these will be faster - token / second, and may meet your use case(s) better. Also, for some use cases, higher temps (1+, 2+) are recommended.
⚠️ Important Note
This model is difficult to Imatrix. You need a much larger imatrix file / multi - language / multi - content (i.e., code/text) to imatrix it. And make sure to review the document [https://huggingface.co/DavidAU/Maximizing - Model - Performance - All - Quants - Types - And - Full - Precision - by - Samplers_Parameters] for critical parameter, sampler and advance sampler settings.