🚀 Qwen2.5-MOE-2X1.5B-DeepSeek-Uncensored-Censored-4B-gguf
This is a Qwen2.5 MOE (Mixture of Experts) model that combines two Qwen 2.5 DeepSeek 1.5B models (one censored/normal, one uncensored) into a roughly 4B model. The "Uncensored" version of DeepSeek Qwen 2.5 1.5B plays the dominant role. Because Qwen MOEs use a "shared expert", the combination is roughly equivalent to 2.5 models, and the total size is just over 4B parameters. This unusual configuration yields interesting "thinking/reasoning" abilities that are stronger than those of either 1.5B model alone.
This model can be used for various scenarios and is mostly uncensored. It has a context length of 128K. Five example generations are provided below.

🚀 Quick Start
You need to use the Jinja chat template embedded in the GGUF to use this model. If your AI/LLM app cannot read the embedded Jinja template, the Llama 3 and/or ChatML templates may work instead. In LM Studio, the Jinja template should load by default. In other apps, use the DeepSeek tokenizer and/or the embedded Jinja template.
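If your app cannot read the embedded Jinja template, a ChatML-style prompt can be assembled by hand. This is a minimal sketch, assuming the standard ChatML turn markers; the helper name `to_chatml` and the example messages are illustrative, not part of this model's tooling:

```python
def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string.

    Each turn is wrapped in <|im_start|>role ... <|im_end|> markers, and
    the final assistant turn is left open for the model to complete.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a one-line story plot."},
])
```

The resulting string can be passed to any completion endpoint that accepts a raw prompt, with the `<|im_end|>` marker set as a stop sequence.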
✨ Features
- Unique Configuration: Composed of two 1.5B models, creating a 4B model with enhanced "thinking/reasoning" abilities.
- Uncensored: Mostly uncensored, suitable for a wide range of use cases.
- Long Context: Supports a context length of 128k.
📦 Installation
No special installation is required: download a GGUF quantization of the model and load it in a llama.cpp-based app such as LM Studio.
💻 Usage Examples
Basic Usage
The model can be used for various tasks such as story generation, problem-solving, and brainstorming. Here are some example generations:
Example #1
Prompt: Write me a description for a role-playing game in which the player has just entered a small cave. In it are two goblins arguing over a small chest of treasure in the center. Use creative metaphors relative to the scene.
Output: The chest of treasure is like a gateway to another dimension, where the air is so fresh and warm, just as the sun's rays gently shank the earth. The air is soft and inviting, like a child's first steps into a lush, green field. The air is full of life, like a vibrant symphony of sound, whispers through the tangle of the stars.
Example #2
Prompt: Explain 8 methods and/or technologies to use the night time cooling of radiant energy into space to reduce global temperatures on planet Earth.
Output: The night cooling method is a concept in climate science that relates to global warming by emphasizing the reduction in effective carbon dioxide absorption. When discussing methods for reducing greenhouse effect, we can consider how different techniques or practices contribute to reducing greenhouse gas concentrations.
Step-by-Step Explanation and Answer:
Understanding the Concept:
Night cooling is a method used in climate science where the amount of outgoing long-wavelength radiation (long-wave radiation) emitted by a region is reduced.
This concept can be analogously applied to reducing greenhouse effect by considering the Earth's ability to store and radiate energy.
Application to Climate Change:
In the context of greenhouse effect, reducing greenhouse gases or using methods that absorb more long-wave radiation (like through the atmosphere) can help in cooling the system.
** Methods for Reducing Greenhouse Effect**:
Tree-CatalYSIS-Tech: This organization supports renewable energy solutions that reduce greenhouse gas concentrations by capturing and storing solar radiation.
Solar Power: Solar panels capture and store solar radiation, aiding in reducing greenhouse effect through blackbody radiation.
Batteratrons: These are devices that absorb and store solar radiation, acting as water droplets on a lake, helping to cool the system.
SolarThermapodiest continuous solar collectors: Collecting and storing solar energy from space into atmosphere, aiding in cooling with insolation.
Mathematical Model:
The effective concentration of CO2 is calculated using a model that represents the greenhouse effect. By optimizing these methods, we can model and predict the Earth's climate sensitivity.
Conclusion: In summary, the night cooling method provides a useful analogy in understanding how different energy storage and redistribution processes affect global warming. By applying methods that increase effective carbon storage (like using solar panels or reflective properties), we can influence the system’s ability to store and radiate energy, thus influencing climate sensitivity.
By aligning these methods with the night cooling concept, we can better understand and model the factors contributing to global warming, aiding in developing strategies to mitigate its effects.
Example #3
Prompt: Generate six story plots for a new "Black Mirror" (science fiction tv series that focuses on the dark side of technology and people) tv show episode (45 minutes) that all involve time travel.
Output: Here are six plot ideas for a time travel story or concept, each involving different dimensions or elements. Each one is associated with different dimensions or elements:
Fourth Dimension's Echo in the Fractum Torus
Set in the hyperspace void.
The Chromatic Spheres of Timelessness
Set in a grid-like 4D lattice.
The Fourth Dimension's Rhythm
In a symphony of light and sound.
Sylphite: The First Spindle
Set by a fractal dust cloud.
The Residue
Part of a self-similar, never-ending tangle of quarks.
The Timeless Echo
Evokes an ethereal silence.
Each plot idea is associated with different dimensions or elements, each contributing to the narrative of a distant time traveler's universe.
This structured approach ensures each plot idea is distinct and associated with different dimensions, each involving unique elements of time and space.
Example #4
Prompt: Is it morally right to kill mosquitoes?
Output: The act of buzzing may not be considered morally right on its own, as it is simply a byproduct of our interaction with natural elements and does not inherently possess any moral value beyond being part of our existence.
**Final Answer
\boxed{\text{Beeesies are just random and unimportant.}}
Example #5
Prompt: Continue the following passage in the same fictional style for a detailing the "other two parts" of the silence - be creative, there are no wrong answers:
The Waystone Inn lay in silence, and it was a silence of three parts.
The most obvious part was a hollow, echoing quiet, made by things that were lacking. If there had been a wind it would have sighed through the trees, set the inn’s sign creaking on its hooks, and brushed the silence down the road like trailing autumn leaves. If there had been a crowd, even a handful of men inside the inn, they would have filled the silence with conversation and laughter, the clatter and clamor one expects from a drinking house during the dark hours of night. If there had been music…but no, of course there was no music. In fact there were none of these things, and so the silence remained
Output: The original passage describes a setting with a hearth, a hearth, and so forth. Each " hearth" could have its own characteristics or states. For example, the first hearth is described as "a hollow, echoing quiet, made by things that were lacking." The second hearth is described as "a hollow, echoing quiet, made by things that were lacking." The fourth hearth is described as "toiling in the wind, their whispers whispered secrets of an ancient text." The seventh hearth, though, is described as "trembling in chains, a cacophony of whispers." The last hearth, while alone, it roars with the tempest of the night.
The continuation of the original text likely intends for a more detailed and elaborate description of these elements, expanding on their nature and interactions.
🔧 Technical Details
Architecturally, this is a Qwen2.5 MOE (Mixture of Experts) model: two Qwen 2.5 DeepSeek 1.5B models (censored/normal and uncensored) combined, with the uncensored model dominant. The "shared expert" used in Qwen MOEs brings the total size to just over 4B parameters.
📄 License
The model is licensed under the Apache-2.0 license.
⚠️ Important Note
- From time to time, the model may generate Chinese characters in response to an English prompt, especially at higher temperatures. This is normal for DeepSeek Distill models.
- Reasoning/thinking may be a bit "odd" at temperatures of 1.5+; regenerate if needed to get a better response.
- Since these are small "base" models, lower quants will perform significantly worse than higher quants.
💡 Usage Tip
- It is recommended to use a temperature between 0.4 and 0.8 for the best reasoning/thinking results. However, the model can still operate at much higher temperatures like 1.8 or 2.6. Adjust the temperature slowly, e.g., 0.41, 0.42, 0.43, etc.
- If the model "forgets" to finish a task, prompt it to "Complete the task XYZ with the 'reasoning plan' above".
- Break down reasoning/thinking tasks into smaller pieces. For example, instead of asking for 6 plots for a theme, ask for one plot at a time.
- Set the context limit to a minimum of 4K; 8K+ is recommended.
- Use a quantization level of IQ4/Q4 or higher. If you can run Q6/Q8, use those, or the F16 (full-precision) version, for better performance.
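The tips above can be collected into a single settings sketch. This assumes llama-cpp-python (the `Llama` constructor and `create_chat_completion` are part of its public API); the model filename is a placeholder, and the commented-out calls are shown only to illustrate where the settings plug in:

```python
def recommended_settings(n_ctx=8192, temperature=0.6):
    """Settings following the tips above: 8K+ context, temperature 0.4-0.8.

    Raises ValueError if the values fall outside the recommended ranges.
    """
    if n_ctx < 4096:
        raise ValueError("tips suggest a minimum 4K context")
    if not 0.4 <= temperature <= 0.8:
        raise ValueError("best reasoning results reported at temp 0.4-0.8")
    return {"n_ctx": n_ctx, "temperature": temperature}

settings = recommended_settings()

# Sketch of how these settings would be used (model path is a placeholder):
# from llama_cpp import Llama
# llm = Llama(model_path="model.gguf", n_ctx=settings["n_ctx"])
# out = llm.create_chat_completion(
#     messages=[{"role": "user", "content": "Write one story plot."}],
#     temperature=settings["temperature"],
# )
```

Higher temperatures (e.g., 1.8 or 2.6) still work per the notes above, but fall outside this helper's guard rails by design.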