# Llama3.1-MOE-4X8B-Gated-IQ-Multi-Tier-COGITO-Deep-Reasoning-32B-GGUF
A powerful text-generation model with advanced reasoning capabilities suitable for various use cases.
## License
This project is licensed under the Apache-2.0 license.
## Tags
- All use cases
- Creative
- Creative writing
- All genres
- Tool calls
- Tool use
- Problem solving
- Deep thinking
- Reasoning
- Deep reasoning
- Story
- Writing
- Fiction
- Roleplaying
- bfloat16
- Role play
- sillytavern
- backyard
- llama 3.1
- context 128k
- mergekit
- merge
- moe
- mixture of experts
## Quick Start
This model, "Gated-IQ-Multi-Tier-Cogito-Deep-Reasoning", is a variable-control reasoning model that operates at all temperatures/settings and is suitable for all use cases. However, it has specific requirements and operating instructions for optimal performance.
## Features
- Variable Control Reasoning: It has on/off/variable-control reasoning from DeepCogito (cogito-v1-preview-llama-8B) and requires a system prompt to invoke reasoning/thinking. The internal structure of the model can augment reasoning by up to 300% using the three additional non-reasoning core models.
- Unique Internal Structure: All 4 models can operate during the "reasoning" stage, with the reasoning model taking the lead during this process and then switching during output generation.
- User Control: Additional internal structures allow the user to take control of one or more models directly via prompts, names, and keywords.
- High Reasoning Power: The total reasoning power of this model is far above that of the original 8B reasoning model, up to 300% stronger, thanks to the structure and the assistance of the three core models working with "Cogito 8B".
- Tool Calls Support: As it has Meta Llama 3.1 Instruct embedded, it supports tool calls / tool usage.
- Model Switching and Expansion: The unique super structure allows "reasoning model(s)" to be switched in/out, and also supports the construction of larger and more powerful models.
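Since the model inherits Llama 3.1 Instruct tool-call support, a frontend typically has to detect whether a completion is a tool call or plain text. A minimal sketch, assuming the common Llama 3.1 custom-tool convention of a JSON object with `name` and `parameters` keys (the `get_weather` tool is hypothetical):

```python
import json

def parse_tool_call(output: str):
    """Try to read model output as a Llama 3.1-style JSON tool call.

    Returns (name, parameters) if the output is a JSON object with
    'name' and 'parameters' keys, otherwise None (plain text).
    """
    try:
        payload = json.loads(output.strip())
    except json.JSONDecodeError:
        return None
    if isinstance(payload, dict) and "name" in payload and "parameters" in payload:
        return payload["name"], payload["parameters"]
    return None

# Hypothetical model output for an illustrative 'get_weather' tool:
raw = '{"name": "get_weather", "parameters": {"city": "Paris"}}'
call = parse_tool_call(raw)  # ("get_weather", {"city": "Paris"})
```

Outputs that fail to parse as JSON are treated as ordinary text, so the same code path can handle both reasoning and tool-calling turns.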
## Installation
This model requires a Llama 3 Instruct and/or Command-R chat template. It can also use the standard "Jinja Autoloaded Template", which is contained in the quant and will load automatically.
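If your app does not apply the bundled Jinja template automatically, the Llama 3 Instruct format can be assembled by hand. A minimal sketch (the system-prompt text is illustrative, not the model's required prompt):

```python
def llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3 Instruct prompt string."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = llama3_prompt(
    "You are a deep-reasoning assistant.",  # hypothetical system prompt
    "Plan a sustainable community garden for a small backyard.",
)
```

The trailing assistant header leaves the model positioned to begin its reply (reasoning first, if a reasoning system prompt is active).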
## Usage Examples
### Problem Solving and Reasoning
Prompt: Think up a detailed plan to solve this problem: How to build a sustainable community garden in a small backyard?
### Creative Writing
Prompt: Start a 1000-word scene (vivid, graphic horror in first person) with: The skyscraper sways, as she watches the window in front of her on the 21st floor explode...
## Documentation
### IMPORTANT OPERATING INSTRUCTIONS
- Temperature Preference: This is an instruct model that loves high temps (1.2+, 2.2+, etc.). Lower temps will not produce its best content.
- Prompt Length: As an instruct model, it performs best with medium to long prompts. Longer prompts with some direction will show what the model can do.
- Reasoning Control: Reasoning is turned on/off via System Prompts.
- Context Requirement: A minimum context of 4k is suggested, but 8k is better due to reasoning/output blocks.
- Quantization: Larger quants mean better / stronger reasoning.
### USE CASES
This model is suitable for all use cases, including solving logic puzzles, riddles, and other problems with its enhanced "thinking" systems. It can also solve problems beyond the abilities of a normal Llama 3.1 model thanks to its Cogito-based reasoning systems.
### Special Operation Instructions
- TEMP/SETTINGS
  - Set Temp between 0 and 0.8. The most "stable" temp seems to be 0.6, with a variance of ±0.05. Lower it for more "logic"-oriented reasoning, raise it for more "creative" reasoning (max 0.8 or so). Also set context to at least 4096.
  - For temps of 1+, 2+, etc., thoughts will expand and become deeper and richer.
- Set "repeat penalty" to 1.02 to 1.07 (recommended).
  - Use a Llama 3 Instruct and/or Command-R chat template, or the standard "Jinja Autoloaded Template".
- PROMPTS
- If a prompt without implied "step by step" requirements is entered, "thinking" may activate after the first generation.
  - If a prompt with stated or implied "thinking" is entered, the "thinking" process will activate almost immediately. Sometimes a regen is needed to activate it.
- There will be a lot of variations in the output, including continuation of generation, suggestions for improvement, and characters "reasoning" about the situation.
- The model's "thoughts" may appear in the generation itself.
  - State the maximum word count in the prompt for best results, especially for activating "thinking".
- Try different temp settings (e.g., "default", 1.2, 2.5) to get a broad range of "reasoning/thoughts/problem" solving.
- GENERATION - THOUGHTS/REASONING
- It may take one or more regens for "thinking" to "activate".
- The model can generate a lot of "thoughts", and the most interesting ones may be several levels deep.
- "Thoughts" are often unique and different from one another.
- Temp/rep pen settings can affect reasoning/thoughts.
- Improve reasoning/thinking by changing or adding directives/instructions or increasing the detail level in the prompt.
- Adding "think outside the box", "brainstorm this idea" etc. to the prompt can improve results.
### GENERAL SUGGESTIONS
- Open a "new chat" per prompt for better "thinking/reasoning activation", with temp .6 and rep pen 1.05, and then "regen" as required.
- Sometimes the model may go out of control, and manual stop may be needed.
- Depending on the AI app, "thoughts" may appear with "< THINK >" and "</ THINK >" tags or directly in the output.
- Higher quants provide better generation and more sound "reasoning/thinking".
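When post-processing output programmatically, the "< THINK >" blocks mentioned above can be separated from the final answer. A minimal sketch, assuming the tags may appear with or without internal spaces and in either case:

```python
import re

# Matches <THINK>...</THINK>, tolerating spaces inside the tags and any case.
THINK_RE = re.compile(r"<\s*think\s*>(.*?)<\s*/\s*think\s*>", re.IGNORECASE | re.DOTALL)

def split_thoughts(text: str):
    """Return (thoughts, answer): the reasoning blocks as a list,
    and the text with those blocks removed."""
    thoughts = [m.strip() for m in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return thoughts, answer

thoughts, answer = split_thoughts("<THINK>step 1</THINK>Final answer.")
# thoughts == ["step 1"], answer == "Final answer."
```

Note that, per the instructions above, some apps emit thoughts directly in the output with no tags at all, in which case everything lands in `answer`.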
### GENERATIONAL STEERING CONTROL: "Programmer's Access - Direct Access to the AI(s)"
- Tags / names allow direct access to one or more models, regardless of whether reasoning is active or not. For example, "Saten, evaluate the response and suggest improvements" makes the model "favor" Saten's input.
- Special tags (remove spaces between the brackets):
- "< output - all >" -> only use the 3 core models, not the reasoning model.
- "< output - mega >" -> Use all 4 models.
- "< output >", "< output2 >", "< output3 >" -> Same as using the "name" of the model, just removes BIAS in the model's name.
## Technical Details
The "thinking/reasoning" tech for this model is from the original Llama 3.1 "Cogito - V1" model from DeepCogito: [ https://huggingface.co/deepcogito/cogito - v1 - preview - llama - 8B ]. This version retains all the functions and features of the original reasoning model at about 100% of the original reasoning power.
This is the MOE version, 32B (4X8B), consisting of four 8B models (1 reasoning model, 3 non-reasoning models) in a MOE (Mixture of Experts) configuration, which results in a 25B "weight" model with 32B parameters. All 4 models/experts are activated.