🚀 Qwen3-4B-NEO-Imatrix-Max-GGUF
This project offers NEO Imatrix quants of the new "Qwen 3 - 4B" model with a MAX "output tensor" at BF16 to enhance reasoning and output generation.
✨ Features
- NEO Imatrix Quants: The NEO Imatrix dataset is generated in-house. Lower quants such as IQ4_XS/IQ4_NL show a stronger Imatrix effect and offer a good balance between quality and that effect; they are also well suited to creative use cases. Higher quants are recommended for stronger reasoning. The Q8_0 quant is only "maxed" (output tensor at BF16) because the Imatrix has no effect on it, and F16 is full precision.
- Long Context Length: Supports a 32K context plus 8K output generation, extendable to 128K. For 65K, 128K, or 256K context with the 4B model, see https://huggingface.co/DavidAU/Qwen3-4B-Q8_0-65k-128k-256k-context-GGUF.
📚 Documentation
Jinja Template
- If you encounter issues with the Jinja "auto template", use the ChatML template.
- For LM Studio users: update the Jinja template by going to https://lmstudio.ai/neil/qwen3-thinking, copying the "Jinja template", and pasting it into LM Studio.
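If you prefer to format prompts yourself rather than rely on a template, the ChatML layout can be sketched as below. This is a minimal illustration, not the authoritative template: the `<|im_start|>`/`<|im_end|>` markers follow the standard ChatML convention used by Qwen-family models, but you should verify them against your app's tokenizer config.

```python
def chatml_prompt(system: str, user: str) -> str:
    """Build a ChatML-style prompt string.

    Assumes the standard <|im_start|>/<|im_end|> special tokens;
    check your tokenizer config if generation looks malformed.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"  # model continues from here
    )

prompt = chatml_prompt("You are a deep thinking AI.", "Explain recursion briefly.")
```

The trailing `<|im_start|>assistant\n` leaves the prompt open so the model's completion becomes the assistant turn.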
System Role
- You may or may not need the system role, as Qwen3 models often generate their own reasoning/thinking blocks. The suggested system role is:
You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
Refer to the document "Maximizing-Model-Performance-All..." for how to set the system role in various LLM/AI apps.
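As a general illustration of how the system role is supplied programmatically: apps that expose an OpenAI-compatible chat endpoint (LM Studio and llama.cpp's server both do) take it as the first entry of the `messages` array. A minimal sketch, with the short system text here standing in for the full role above:

```python
def build_messages(system_role: str, user_msg: str) -> list:
    """Assemble a messages payload for an OpenAI-compatible chat endpoint.

    The system role is optional (see the note above about Qwen3 models
    generating their own thinking blocks); pass an empty string to omit it.
    """
    msgs = []
    if system_role:
        msgs.append({"role": "system", "content": system_role})
    msgs.append({"role": "user", "content": user_msg})
    return msgs

messages = build_messages("You are a deep thinking AI.", "Hello")
```

How this payload is sent (HTTP client, SDK, or the app's own settings panel) varies per app; the document linked above covers the app-specific details.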
Highest Quality Settings
For all settings used for this model (including specifics for its "class"), example generation, and an advanced settings guide that can address model issues and improve performance for all use cases (chat, role-play, etc.), see https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters.
Optional Enhancement
The following can be used in place of the "system prompt" or "system role" to enhance the model. It can also be used at the start of a new chat; make sure to keep it in place as the chat progresses.
Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.
Here are your skillsets:
[MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)
[*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)
Here are your critical instructions:
Ponder each word choice carefully to present as vivid and emotional a journey as possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the five senses. Aim for 50% dialog, 25% narration, 15% body language, and 10% thoughts. Your goal is to put the reader in the story.
This enhancement seems to help with scene generation and scene continuation functions.
Another system prompt you can use is:
You are a deep thinking AI composed of 4 AIs - [MODE: Spock], [MODE: Wordsmith], [MODE: Jamet] and [MODE: Saten] - you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself (and 4 partners) via systematic reasoning processes (display all 4 partner thoughts) to help come to a correct solution prior to answering. Select one partner to think deeply about the points brought up by the other 3 partners to plan an in-depth solution. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
Other Notes
- Reasoning is ON by default in this model, and the model will auto-generate "think" block(s).
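Because the model emits its reasoning inline, downstream code often needs to separate the `<think> </think>` block(s) from the final answer. A minimal sketch, assuming the tags described above appear verbatim in the output:

```python
import re

def split_reasoning(text: str):
    """Split model output into (reasoning, answer).

    Collects the contents of every <think>...</think> block, then
    returns the remaining text as the answer. Assumes the tags are
    emitted verbatim, as this model card describes.
    """
    thoughts = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return "\n".join(t.strip() for t in thoughts), answer

reasoning, answer = split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
# reasoning == "2 + 2 = 4", answer == "The answer is 4."
```

`re.DOTALL` lets the match span multi-line thinking blocks, and the non-greedy `.*?` keeps multiple blocks from being merged into one.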
- For benchmarks, usage info, and settings, please see the original model card at https://huggingface.co/Qwen/Qwen3-4B.
📄 License
This project is licensed under the Apache-2.0 license.