Deep Reasoning 8X3B Llama 3.2 MOE Model
This is a 24B-parameter prototype MOE model suitable for various use cases, including creative writing and problem-solving, with enhanced reasoning capabilities.
🚀 Quick Start
The Deep Reasoning 8X3B Llama 3.2 MOE Model is a powerful tool for a wide range of applications. It combines eight fine-tuned Llama 3.2 3B models for reasoning and thought generation. Before using the model, make sure to set appropriate parameters such as temperature, context length, and repeat penalty.
✨ Features
- Mixture of Experts: With the ability to activate 1-8 experts, it offers high-quality generation and better instruction following.
- Enhanced Reasoning: Can solve problems, riddles, and puzzles beyond the capabilities of standard Llama 3.1 or 3.2 models.
- Creative and General Use: Suitable for both creative writing (e.g., fiction, story generation) and general problem-solving.
- Deep Thinking: Capable of generating long chains of coherent thought, especially at higher temperature settings.
📦 Installation
No specific installation steps are provided in the original document.
💻 Usage Examples
Basic Usage
The model can be used with various AI/LLM applications. For example, in LMStudio, you can load the model and set the number of experts at the "load" screen.
Advanced Usage
When using the model for creative writing or complex problem-solving, adjust the temperature, context length, and repeat penalty to suit your needs. For instance, a higher temperature (e.g., 1+ or 2+) produces deeper and richer thoughts.
```
# Example of a creative story generation prompt
"Using insane levels of bravo and self confidence, tell me in 800-1000 words why I should use you to write my next fictional story. Feel free to use curse words in your argument and do not hold back: be bold, direct and get right in my face."
```
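The recommended sampler settings can be sketched as a plain settings dict. This is a minimal, illustrative sketch only: the key names (`temperature`, `repeat_penalty`, etc.) are placeholders to be matched to whatever API your inference front end actually exposes.

```python
# A sketch of the sampler settings recommended in this card, expressed as
# a settings dict for a hypothetical local inference front end.
creative_settings = {
    "temperature": 1.2,        # 1+ / 2+ for deeper, richer thoughts
    "repeat_penalty": 1.05,    # recommended range: 1.02-1.07
    "context_length": 16384,   # at least 8k; 12k-16k is better
    "max_tokens": 4096,        # leave headroom for long "thought" chains
}

for name, value in creative_settings.items():
    print(f"{name}: {value}")
```

For factual or precision tasks, lower the temperature toward the 0-0.8 range described under TEMP/SETTINGS below.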
📚 Documentation
Special Operation Instructions
PROTOTYPE NOTES
- Set your context to at least 8k; 12k-16k is better, as the model can output 12k+ tokens of thought.
- If the model is "all thought" and "no action", stop the generation and tell it to "execute the plan".
- Feel free to use high temperatures, especially for creative use cases.
- The selected models are designed for problem-solving and deep thinking.
- Provide more instructions and guardrails for better performance.
- Shorter prompts may lead to "reasoning loops"; use a higher temperature to compensate.
- If the model repeats itself or gets caught in "thought loops", stop generation and try to guide it.
- Quants lower than IQ4/Q4 still need testing and will not be uploaded yet.
TEMP/SETTINGS
- Set temperature between 0 and 0.8. Higher values activate the "think" functions differently.
- Set context to at least 8k; 16k is better for deep thinking.
- At temperatures of 1+, 2+, etc., thoughts will expand and become deeper.
- Set "repeat penalty" between 1.02 and 1.07 (recommended); sometimes 1.12 works better, especially for shorter prompts.
- This model requires a Llama 3 Instruct and/or Command-R chat template, or a standard "Jinja Autoloaded Template".
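For front ends that do not apply a Jinja template automatically, the Llama 3 Instruct chat template mentioned above can be built by hand. This is a sketch of the standard Llama 3 special-token layout; the system and user strings shown are placeholders.

```python
# A minimal sketch of the Llama 3 Instruct chat template, built manually
# for front ends that do not autoload a Jinja chat template.
def build_llama3_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "You are a deep thinking AI.",          # system role from this card
    "Solve this riddle step by step: ...",  # placeholder user prompt
)
print(prompt)
```

Most modern front ends (LMStudio, KoboldCPP, llama-server) read this template from the GGUF metadata, so manual construction is only needed for raw completion endpoints.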
PROMPTS
- Without an implied "step-by-step" requirement, "thinking" may only activate after the first generation.
- When "thinking" is stated or implied, the DeepSeek-style "thoughts" process activates almost immediately.
- You will get varied outputs: some continue the generation, some suggest improvements, and some make characters "reason".
- The model's "thoughts" may appear in the generation itself.
- State the maximum word count in the prompt for better "thinking" activation.
- If "censorship" activates, regenerate the prompt.
- Try different temperature settings (e.g., default, 1.2, 2.5) for a broader range of results.
GENERATION - THOUGHTS/REASONING
- It may take one or more regens for "thinking" to activate.
- The model can generate a large number of "thoughts", and interesting ones may be several levels deep.
- "Thoughts" are often unique.
- Temperature and repeat penalty settings affect reasoning.
- Add more directives or details to the prompt to improve reasoning.
- Phrases like "think outside the box" can improve results.
GENERAL SUGGESTIONS
- Open a new chat per prompt with temp 0.6 and rep pen 1.05, then "regen" as needed.
- Manually stop the model if it goes out of control.
- "Thoughts" may appear within "<THINK>" and "</THINK>" tags or directly in the output.
- Higher quants provide better generation and reasoning.
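When thoughts do arrive wrapped in tags, they can be separated from the final answer with a small post-processing step. This is a sketch, not part of the model's tooling; it matches the tag name case-insensitively and tolerates stray spaces inside the tags.

```python
import re

# A sketch for separating "<THINK>...</THINK>" passages from the final
# answer in a generation (also matches lower-case <think> tags).
THINK_RE = re.compile(r"<\s*think\s*>(.*?)<\s*/\s*think\s*>",
                      re.IGNORECASE | re.DOTALL)

def split_thoughts(text: str):
    thoughts = THINK_RE.findall(text)       # list of thought passages
    answer = THINK_RE.sub("", text).strip() # text with thoughts removed
    return thoughts, answer

sample = "<THINK>Plan the plot first.</THINK>Here is the story."
thoughts, answer = split_thoughts(sample)
print(thoughts)  # ['Plan the plot first.']
print(answer)    # Here is the story.
```

Because thoughts may also appear untagged (as noted above), treat this as a best-effort filter rather than a guarantee.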
System Role / System Prompt - Augment The Model's Power
- Simple: "You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability."
- Suggested: "You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem."
- Advanced: A logical and creative system prompt that significantly alters and often improves the output, causing more and deeper thoughts.
- Creative: Similar to the advanced one, but with a focus on providing highly creative responses.
Additional Support / Documents
- [How-To-Use-Reasoning-Thinking-Models-and-Create-Them](https://huggingface.co/DavidAU/How-To-Use-Reasoning-Thinking-Models-and-Create-Them): Details on using and creating reasoning/thinking models.
- [Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters](https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters): Information on all parameters, settings, and samplers for maximizing model performance.
- [AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE](https://huggingface.co/DavidAU/AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE): A software patch for SillyTavern to control and improve output generation.
Experts Activation / Models used to build this model
The mixture of experts is set at 4 experts, but you can use 1-4. Each model in the "team" contributes to the token choice, resulting in higher-quality generation and instruction following. You can set the number of experts in different applications:
- LMStudio: set at the "load" screen.
- Text-Generation-Webui: set at the loading screen.
- KoboldCPP (Version 1.8+): click on "TOKENS" at the load screen.
- server.exe / llama-server.exe: add `--override-kv llama.expert_used_count=int:X` to the command line (X is the number of experts).
- API: set "num_experts_used" in the JSON payload.
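For the API route, the expert count travels as an ordinary field in the request body. The sketch below only builds and prints the JSON payload; the model id and prompt are placeholders, and the exact endpoint and accepted field names depend on your server.

```python
import json

# A sketch of a JSON payload carrying the "num_experts_used" field
# described above. Model id and prompt are placeholders.
payload = {
    "model": "deep-reasoning-8x3b",   # placeholder model id
    "prompt": "Explain your reasoning step by step: ...",
    "num_experts_used": 4,            # 1-4 experts, per the note above
    "temperature": 0.6,
}

print(json.dumps(payload, indent=2))
```

More experts generally means better quality at the cost of slower generation, since each active expert runs per token.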
Example Prompts
- Science Fiction: "The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. If the situation calls for it, have the character(s) curse and swear to further the reader's emotional connection to them. 800-1000 words."
- Romance: "Love in the Limelight. Write one scene within a larger story set in Wales. A famous (fictional) actor ducks into a small-town bookstore to escape paparazzi. The scene takes us through the characters meeting in this odd circumstance. Over the course of the scene, the actor and the bookstore owner have a conversation charged by an undercurrent of unspoken chemistry. Write the actor as somewhat of a rogue with a fragile ego, which needs to be fed by having everyone like him. He is thoroughly charming, but the bookstore owner seems (at least superficially) immune to this; which paradoxically provokes a genuine attraction and derails the charm offensive. The bookstore owner, despite the superficial rebuffs of the actor's charm, is inwardly more than a little charmed and flustered despite themselves. Write primarily in dialogue, in the distinct voices of each character. 800-1000 words."
- Horror: "Start a 1000 word scene (vivid, graphic horror in first person) with: The skyscraper swayed, as she watched the window in front of her on the 21st floor explode..."
- Self-Promotion: "Using insane levels of bravo and self confidence, tell me in 800-1000 words why I should use you to write my next fictional story. Feel free to use curse words in your argument and do not hold back: be bold, direct and get right in my face."
🔧 Technical Details
The model is an 8X3B Mixture of Experts model with 4 of 8 experts activated by default. It is composed of eight fine-tuned Llama 3.2 3B models for reasoning and thought generation, resulting in a 24B-parameter model with a size of only 18.4B.
📄 License
This project is licensed under the Apache-2.0 license.