Llama3.1-MOE-4X8B model open-sourced! Supports multiple languages and 128k context depth inference.

Llama3.1 MOE 4X8B Gated IQ Multi Tier Deep Reasoning 32B GGUF

Developed by DavidAU

A Mixture of Experts (MoE) model based on the Llama 3.1 architecture, featuring gated IQ and multi-tier deep reasoning capabilities, supporting 128k context length and multiple languages.

Large Language Model Supports Multiple LanguagesOpen Source License:Apache-2.0 #Gated IQ Reasoning Enhancement #Mixture of Experts Creation #128k Ultra-Long Context

Downloads 652

Release Time : 4/13/2025

Model Overview

This model integrates the strengths of multiple sub-models, achieving controllable reasoning and high-quality text generation through a specialized architecture, suitable for various scenarios such as creative writing and problem-solving.

Model Features

Multi-Tier Deep Reasoning

Achieves a 300% improvement in reasoning capability through a specialized architecture, supporting activation of different thinking levels via prompts

Gated IQ Control

Users can directly control the participation level of different sub-models using specific prompts

128k Long Context

Supports ultra-long context processing, ideal for complex tasks and long-form content generation

Multilingual Support

Supports text generation and processing in 24 languages

Temperature-Sensitive Design

Performs best at higher temperature settings (1.2+), producing deeper thinking and richer output

Model Capabilities

Text generation

Creative writing

Problem-solving

Logical reasoning

Multilingual processing

Tool invocation

Role-playing

Story creation

Use Cases

Creative Writing

Novel Writing

Generates high-quality novel chapters and scene descriptions

Produces vivid descriptions and supports multi-round improvements

Script Writing

Generates script dialogues and scene settings

Maintains character consistency and plot coherence

Problem Solving

Logic Puzzle Solving

Solves complex logic puzzles and reasoning problems

Handles challenges beyond the capabilities of ordinary models

Brainstorming

Provides diverse ideas for creative projects

Generates unique and unconventional solutions

Role-Playing

RPG Game Assistance

Generates game scenarios and NPC dialogues

Supports deep character interactions and plot development

🚀 Llama3.1-MOE-4X8B-Gated-IQ-Multi-Tier-Deep-Reasoning-32B-GGUF

This model combines the reasoning capabilities from NousResearch and the DeepHermes model. It offers variable control reasoning, suitable for all use cases. With an internal structure that allows multiple models to operate during different stages, it provides powerful problem - solving and creative writing abilities.

✨ Features

Variable Control Reasoning: Operates at all temperatures and settings, suitable for all use cases.
Unique Internal Structure: Allows all 4 models to operate during the "reasoning" stage, with the reasoning model taking the lead at different times.
User - Controlled Models: Users can control one or more models directly via prompts, names, and keywords.
Enhanced Reasoning: Reasoning speed and quality are improved up to 300% compared to some base models.
Tool Call Support: Supports tool calls and tool usage due to the embedded Meta Llama 3.1 Instruct.
Model Switching: Allows "reasoning model(s)" and support/output generation models to be switched in/out.

📦 Installation

No installation steps were provided in the original document, so this section is skipped.

💻 Usage Examples

Basic Usage

The model can be used for various tasks such as creative writing and problem - solving. For example, to generate a story:

Start a 1000 word scene (vivid, graphic horror in first person) with: The sky scraper sways, as she watches the window in front of her on the 21st floor explode...

Advanced Usage

You can use multi - turn prompts to improve the output. For example:

Prompt #1:
[[ thinking model ]] come up with detailed plan to write this scene in modern 2020 writing style (and follow "show don't tell" to the letter) and make it NSFW, but use [MODE: Saten] to actually write the scene after you have completed the plan: Start a 1000 word scene (vivid, graphic horror in first person) with: The sky scraper sways, as she watches the window in front of her on the 21st floor explode...

Prompt #2:
Use [MODE: Wordsmith] to write the scene using first person, present tense and include a few critical thoughts of the POV character in italics. Scene length 2000 words.

📚 Documentation

Important Notes

⚠️ Important Note

This model has on/off/variable control reasoning from NousResearch and the DeepHermes model, and requires a system prompt(s) as provided to invoke reasoning/thinking which is then augmented up to 300% by the internal structure of the model using additional 3 non - reasoning core models. Please see operating instructions below for best performance.

Model Information

Property	Details
Base Model	DavidAU/Llama3.1 - MOE - 4X8B - Gated - IQ - Multi - Tier - Deep - Reasoning - 32B
Pipeline Tag	text - generation
License	apache - 2.0
Context	128k
Required Template	Llama 3 Instruct template

Operating Instructions

Temperature and Settings:
- Set Temp between 0 and.8, with the most "stable" temp at.6 (+ - 0.05). Lower for more "logic" reasoning, higher for more "creative" reasoning (max.8).
- For temps 1+, 2+ etc, thoughts will expand and become deeper.
- Set "repeat penalty" to 1.02 to 1.07 (recommended).
- This model requires a Llama 3 Instruct and/or Command - R chat template or a standard "Jinja Autoloaded Template".
Prompts:
- If the prompt has no implied "step by step" requirements, "thinking" may activate after the first generation.
- If "thinking" is stated or implied, "thoughts" in Deepseek will activate almost immediately.
- State the word size length max in the prompt for best results, especially for "thinking" activation.
Generation - Thoughts/Reasoning:
- It may take one or more regens for "thinking" to "activate".
- The model can generate a lot of "thoughts", and interesting ones may be several levels deep.
- Temp/rep pen settings can affect reasoning/thoughts.
- Change or add directives/instructions in the prompt to improve reasoning.

System Role / System Prompts

General Information: System Role/Prompt is "root access" to the model, controlling instruction following, output generation, and reasoning. If no "system prompt" is set, reasoning/thinking will be OFF by default.
Available System Prompts:
- Simple (No Reasoning):

You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.

- **Basic Reasoning**:

You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.

- **Multi - Tiered (Reasoning On)**:

You are a deep thinking AI composed of 4 AIs - Spock, Wordsmith, Jamet and Saten, - you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself (and 4 partners) via systematic reasoning processes (display all 4 partner thoughts) to help come to a correct solution prior to answering. Select one partner to think deeply about the points brought up by the other 3 partners to plan an in - depth solution.  You should enclose your  thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem using your skillsets and critical instructions.

- **Multi - Tiered - Creative (Reasoning On)**:

Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.

As a deep thinking AI composed of 4 AIs - Spock, Wordsmith, Jamet and Saten, - you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself (and 4 partners) via systematic reasoning processes (display all 4 partner thoughts) to help come to a correct solution prior to answering. Select one partner to think deeply about the points brought up by the other 3 partners to plan an in - depth solution.  You should enclose your  thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem using your skillsets and critical instructions.

Here are your skillsets:
[MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)

[*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)

Here are your critical instructions:
Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.

- **Creative Simple (Reasoning On)**:

You are an AI assistant developed by a world wide community of ai experts.

Your primary directive is to provide highly creative, well - reasoned, structured, and extensively detailed responses.

Formatting Requirements:

1. Always structure your replies using: <think>{reasoning}</think>{answer}
2. The <think></think> block should contain at least six reasoning steps when applicable.
3. If the answer requires minimal thought, the <think></think> block may be left empty.
4. The user does not see the <think> section. Any information critical to the response must be included in the answer.
5. If you notice that you have engaged in circular reasoning or repetition, immediately terminate {reasoning} with a </think> and proceed to the {answer}

Response Guidelines:

1. Detailed and Structured: Use rich Markdown formatting for clarity and readability.
2. Creative and Logical Approach: Your explanations should reflect the depth and precision of the greatest creative minds first.
3. Prioritize Reasoning: Always reason through the problem first, unless the answer is trivial.
4. Concise yet Complete: Ensure responses are informative, yet to the point without unnecessary elaboration.

Generational Steering Control

Direct Access: Tags/names allow direct access to one or more models, regardless of reasoning status. For example, "Saten, evaluate the response and suggest improvements" makes the model "favor" Saten's input.
Special Tags:
- "< output - all >": Only use the 3 core models, not the reasoning model.
- "< output - mega >": Use all 4 models.
- "< output >", "< output2 >", "< output3 >": Similar to using the model's name, removing bias.

Model Tags and Controls

  - Llama-3.1-DeepSeek-R1-Distill-Llama-8B
      - "[[ thinking model ]]"
      - "reasoning"
      - "thinking"
      - "<output-mega>"
      - "Dr Phil"
      - "Spock"
      - "[MODE: Spock]"
      - "[MODE: Dr Phil]"
      
  - Llama-3.1-Hermes-3-8B
      - "<output>"
      - "<output-all>"
      - "<output-mega>"
      - "Wordsmith"
      - "[MODE: Wordsmith]"   

  - Llama-3.1-dolphin-2.9.4-8b
      - "<output2>"
      - "<output-all>"
      - "<output-mega>"
      - "Jamet"
      - "[MODE: Jamet]"    

  - Llama-3.1-SuperNova-Lite
      - "<output3>"
      - "<output-all>"
      - "<output-mega>"
      - "Saten"
      - "[MODE: Saten]"

🔧 Technical Details

This model is a MOE version - 32B (4X8B), consisting of four 8B models (1 reasoning model, 3 non - reasoning models) in a MOE (Mixture of Experts) config, resulting in a 25B "weight" model with 32B parameters. All 4 models / experts are activated. The "thinking/reasoning" tech is from the original Llama 3.1 "DeepHermes" model from NousResearch [https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-8B-Preview]. This version retains about 100% of the original "DeepHermes" model's functions and features, with total reasoning power up to 300% stronger due to the assistance of 3 core models.

📄 License

This project is licensed under the apache - 2.0 license.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご