Deep Reasoning 8X3B Llama 3.2 MOE Model
This is a 24B-parameter prototype MOE model suitable for various use cases, including creative writing and problem-solving, with enhanced reasoning capabilities.
🚀 Quick Start
The Deep Reasoning 8X3B Llama 3.2 MOE Model is a powerful tool for a wide range of applications. It combines eight fine-tuned Llama 3.2 3B models for reasoning and thought generation. Before using the model, make sure to set appropriate parameters such as temperature, context length, and repeat penalty.
✨ Features
- Mixture of Experts: With the ability to activate 1-8 experts, it offers high-quality generation and better instruction following.
- Enhanced Reasoning: Can solve problems, riddles, and puzzles beyond the capabilities of standard Llama 3.1 or 3.2 models.
- Creative and General Use: Suitable for both creative writing (e.g., fiction, story generation) and general problem-solving.
- Deep Thinking: Capable of generating long chains of coherent thought, especially at higher temperature settings.
📦 Installation
No specific installation steps are provided in the original document.
💻 Usage Examples
Basic Usage
The model can be used with various AI/LLM applications. For example, in LMStudio, you can load the model and set the number of experts at the "load" screen.
Advanced Usage
When using the model for creative writing or complex problem-solving, adjust the temperature, context length, and repeat penalty to suit your needs. For instance, a higher temperature (e.g., 1+ or 2+) produces deeper and richer thoughts.
```
# Example of a creative story generation prompt
"Using insane levels of bravo and self confidence, tell me in 800-1000 words why I should use you to write my next fictional story. Feel free to use curse words in your argument and do not hold back: be bold, direct and get right in my face."
```
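The recommended sampler settings can be sketched as a plain settings dict. This is a minimal, illustrative sketch only: the key names (`temperature`, `repeat_penalty`, etc.) are placeholders to be matched to whatever API your inference front end actually exposes.

```python
# A sketch of the sampler settings recommended in this card, expressed as
# a settings dict for a hypothetical local inference front end.
creative_settings = {
    "temperature": 1.2,        # 1+ / 2+ for deeper, richer thoughts
    "repeat_penalty": 1.05,    # recommended range: 1.02-1.07
    "context_length": 16384,   # at least 8k; 12k-16k is better
    "max_tokens": 4096,        # leave headroom for long "thought" chains
}

for name, value in creative_settings.items():
    print(f"{name}: {value}")
```

For factual or precision tasks, lower the temperature toward the 0-0.8 range described under TEMP/SETTINGS below.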
📚 Documentation
Special Operation Instructions
PROTOTYPE NOTES
- Set your context to at least 8k; 12k-16k is better, as the model can output 12k+ tokens of thought.
- If the model is "all thought" and "no action", stop the generation and tell it to "execute the plan".
- Feel free to use high temperatures, especially for creative use cases.
- The selected models are designed for problem-solving and deep thinking.
- Provide more instructions and guardrails for better performance.
- Shorter prompts may lead to "reasoning loops"; use a higher temperature to compensate.
- If the model repeats itself or gets caught in "thought loops", stop generation and try to guide it.
- Quants lower than IQ4/Q4 still need testing and will not be uploaded yet.
TEMP/SETTINGS
- Set temperature between 0 and 0.8. Higher values activate the "think" functions differently.
- Set context to at least 8k; 16k is better for deep thinking.
- At temperatures of 1+, 2+, etc., thoughts will expand and become deeper.
- Set "repeat penalty" between 1.02 and 1.07 (recommended); sometimes 1.12 works better, especially for shorter prompts.
- This model requires a Llama 3 Instruct and/or Command-R chat template, or a standard "Jinja Autoloaded Template".
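For front ends that do not apply a Jinja template automatically, the Llama 3 Instruct chat template mentioned above can be built by hand. This is a sketch of the standard Llama 3 special-token layout; the system and user strings shown are placeholders.

```python
# A minimal sketch of the Llama 3 Instruct chat template, built manually
# for front ends that do not autoload a Jinja chat template.
def build_llama3_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "You are a deep thinking AI.",          # system role from this card
    "Solve this riddle step by step: ...",  # placeholder user prompt
)
print(prompt)
```

Most modern front ends (LMStudio, KoboldCPP, llama-server) read this template from the GGUF metadata, so manual construction is only needed for raw completion endpoints.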
PROMPTS
- Without an implied "step-by-step" requirement, "thinking" may only activate after the first generation.
- When "thinking" is stated or implied, the DeepSeek-style "thoughts" process activates almost immediately.
- You will get varied outputs: some continue the generation, some suggest improvements, and some make characters "reason".
- The model's "thoughts" may appear in the generation itself.
- State the maximum word count in the prompt for better "thinking" activation.
- If "censorship" activates, regenerate the prompt.
- Try different temperature settings (e.g., default, 1.2, 2.5) for a broader range of results.
GENERATION - THOUGHTS/REASONING
- It may take one or more regens for "thinking" to activate.
- The model can generate a large number of "thoughts", and interesting ones may be several levels deep.
- "Thoughts" are often unique.
- Temperature and repeat penalty settings affect reasoning.
- Add more directives or details to the prompt to improve reasoning.
- Phrases like "think outside the box" can improve results.
GENERAL SUGGESTIONS
- Open a new chat per prompt with temp 0.6 and rep pen 1.05, then "regen" as needed.
- Manually stop the model if it goes out of control.
- "Thoughts" may appear within "<THINK>" and "</THINK>" tags or directly in the output.
- Higher quants provide better generation and reasoning.
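When thoughts do arrive wrapped in tags, they can be separated from the final answer with a small post-processing step. This is a sketch, not part of the model's tooling; it matches the tag name case-insensitively and tolerates stray spaces inside the tags.

```python
import re

# A sketch for separating "<THINK>...</THINK>" passages from the final
# answer in a generation (also matches lower-case <think> tags).
THINK_RE = re.compile(r"<\s*think\s*>(.*?)<\s*/\s*think\s*>",
                      re.IGNORECASE | re.DOTALL)

def split_thoughts(text: str):
    thoughts = THINK_RE.findall(text)       # list of thought passages
    answer = THINK_RE.sub("", text).strip() # text with thoughts removed
    return thoughts, answer

sample = "<THINK>Plan the plot first.</THINK>Here is the story."
thoughts, answer = split_thoughts(sample)
print(thoughts)  # ['Plan the plot first.']
print(answer)    # Here is the story.
```

Because thoughts may also appear untagged (as noted above), treat this as a best-effort filter rather than a guarantee.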
System Role / System Prompt - Augment The Model's Power
- Simple: "You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability."
- Suggested: "You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem."
- Advanced: A logical and creative system prompt that significantly alters and often improves the output, causing more and deeper thoughts.
- Creative: Similar to the advanced one, but with a focus on providing highly creative responses.
Additional Support / Documents
- [How-To-Use-Reasoning-Thinking-Models-and-Create-Them](https://huggingface.co/DavidAU/How-To-Use-Reasoning-Thinking-Models-and-Create-Them): Details on using and creating reasoning/thinking models.
- [Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters](https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters): Information on all parameters, settings, and samplers for maximizing model performance.
- [AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE](https://huggingface.co/DavidAU/AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE): A software patch for SillyTavern to control and improve output generation.
Experts Activation / Models used to build this model
The mixture of experts is set at 4 experts, but you can use 1-4. Each model in the "team" contributes to the token choice, resulting in higher-quality generation and instruction following. You can set the number of experts in different applications:
- LMStudio: set at the "load" screen.
- Text-Generation-Webui: set at the loading screen.
- KoboldCPP (Version 1.8+): click on "TOKENS" at the load screen.
- server.exe / llama-server.exe: add `--override-kv llama.expert_used_count=int:X` to the command line (X is the number of experts).
- API: set "num_experts_used" in the JSON payload.
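For the API route, the expert count travels as an ordinary field in the request body. The sketch below only builds and prints the JSON payload; the model id and prompt are placeholders, and the exact endpoint and accepted field names depend on your server.

```python
import json

# A sketch of a JSON payload carrying the "num_experts_used" field
# described above. Model id and prompt are placeholders.
payload = {
    "model": "deep-reasoning-8x3b",   # placeholder model id
    "prompt": "Explain your reasoning step by step: ...",
    "num_experts_used": 4,            # 1-4 experts, per the note above
    "temperature": 0.6,
}

print(json.dumps(payload, indent=2))
```

More experts generally means better quality at the cost of slower generation, since each active expert runs per token.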
Example Prompts
- Science Fiction: "The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. If the situation calls for it, have the character(s) curse and swear to further the reader's emotional connection to them. 800-1000 words."
- Romance: "Love in the Limelight. Write one scene within a larger story set in Wales. A famous (fictional) actor ducks into a small-town bookstore to escape paparazzi. The scene takes us through the characters meeting in this odd circumstance. Over the course of the scene, the actor and the bookstore owner have a conversation charged by an undercurrent of unspoken chemistry. Write the actor as somewhat of a rogue with a fragile ego, which needs to be fed by having everyone like him. He is thoroughly charming, but the bookstore owner seems (at least superficially) immune to this; which paradoxically provokes a genuine attraction and derails the charm offensive. The bookstore owner, despite the superficial rebuffs of the actor's charm, is inwardly more than a little charmed and flustered despite themselves. Write primarily in dialogue, in the distinct voices of each character. 800-1000 words."
- Horror: "Start a 1000 word scene (vivid, graphic horror in first person) with: The skyscraper swayed, as she watched the window in front of her on the 21st floor explode..."
- Self-Promotion: "Using insane levels of bravo and self confidence, tell me in 800-1000 words why I should use you to write my next fictional story. Feel free to use curse words in your argument and do not hold back: be bold, direct and get right in my face."
🔧 Technical Details
The model is an 8X3B Mixture of Experts model with 4 of 8 experts activated by default. It is composed of eight fine-tuned Llama 3.2 3B models for reasoning and thought generation, resulting in a 24B-parameter model with a size of only 18.4B.
📄 License
This project is licensed under the Apache-2.0 license.