🚀 AngelSlayer-12B-Unslop-Mell-RPMax-DARKNESS
They say ‘He’ will bring the apocalypse. She seeks understanding, not destruction.
This is a merged pre-trained language model created with mergekit. It is the author's fourth model and was made to test the della_linear merge method. The core idea behind this model is to leverage the negative characteristics of DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS to counter potential positivity bias while maintaining stability.
🚀 Quick Start
This model is the result of merging multiple pre-trained language models. For a quick start, see the Usage Examples and Parameters sections below for recommended settings and an overview of its behavior.
✨ Features
- Context Handling: The model handles context well, adhering closely to the provided character and prompt.
- Prose Quality: It generates expansive and varied prose that is largely free of GPT-isms.
- Error Predictability: Errors in the output are somewhat predictable; for example, if the model misspells a user's name on its first occurrence, it may correct it on later occurrences.
- Low Repetition: Repetition is relatively low; activate DRY if it appears (see the sampler sketch under Parameters below).
📦 Installation
No specific installation steps are provided in the original document.
💻 Usage Examples
No code examples are provided in the original document.
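As an illustration only (not from the original card), the sketch below shows how the model could be loaded and prompted with Hugging Face Transformers, using the sampler values recommended in the Parameters table. It assumes the `transformers`, `accelerate`, and `torch` packages are installed, that the repository id matches the model name (this is an assumption), and that your transformers release is recent enough to support `min_p`; the persona and prompt are placeholders.

```python
# Hedged example, not from the original card: load the merge and generate with the
# sampler values suggested in the Parameters table (temperature ~1.0-1.25, Min-P ~0.1-0.25).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "redrix/AngelSlayer-12B-Unslop-Mell-RPMax-DARKNESS"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the merge itself was produced in bfloat16
    device_map="auto",           # requires `accelerate`
)

messages = [
    {"role": "system", "content": "You are the narrator of a grim interactive story."},  # placeholder persona
    {"role": "user", "content": "Open the scene at the gates of a ruined cathedral."},   # placeholder prompt
]

# The card specifies ChatML; apply_chat_template uses the template bundled with the tokenizer.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.1,  # within the 1.0-1.25 range suggested by the card
    min_p=0.1,        # within the 0.1-0.25 range suggested by the card
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```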
📚 Documentation
Testing Stage
(18/12/2024): The model performs well at handling context and sticking to the character/prompt. Its prose is expansive and varied, with few GPTisms. However, it tends to interpret inputs in a similar way, likely due to the self_attn layers, so outputs often follow a certain theme or direction even though the wording varies. Errors are predictable, and the model can sometimes correct itself. Repetition is low, and DRY can be enabled if needed. A higher temperature (1.25) seems to work better, and XTC can significantly improve the output without reducing intelligence.
EDIT: The similarity in output themes might stem from inflatebot/MN-12B-Mag-Mell-R1. The author plans to adjust the model weights or experiment with different merge methods using the base models of inflatebot/MN-12B-Mag-Mell-R1 to address this.
Parameters
| Property | Details |
|----------|---------|
| Context size | Not more than 20k is recommended, as coherency may degrade beyond that. |
| Chat Template | ChatML |
| Samplers | A Temperature-Last of 1-1.25 and a Min-P of 0.1-0.25 are viable but not fine-tuned. Activate DRY if repetition appears; XTC seems to work well. |
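DRY and XTC are not exposed by plain Hugging Face Transformers; they are implemented in backends such as the llama.cpp server, KoboldCpp, and text-generation-webui. The snippet below is a hedged sketch of a request to a locally running llama.cpp server assumed to be serving a GGUF quantization of this model. The parameter names and values follow late-2024 llama.cpp builds and are assumptions; check your backend's documentation before relying on them.

```python
# Hedged sketch: ChatML-formatted request to a local llama.cpp server using the card's
# sampler suggestions plus DRY and XTC. All values are illustrative, not tuned.
import requests

prompt = (
    "<|im_start|>user\n"
    "Describe the ruined cathedral at dusk.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

payload = {
    "prompt": prompt,
    "n_predict": 400,        # max tokens to generate
    "temperature": 1.1,      # card suggests 1.0-1.25
    "min_p": 0.1,            # card suggests 0.1-0.25
    "dry_multiplier": 0.8,   # enable DRY only if repetition shows up (0 disables it)
    "xtc_probability": 0.5,  # XTC reportedly improves output without reducing intelligence
    "xtc_threshold": 0.1,
}

response = requests.post("http://localhost:8080/completion", json=payload, timeout=300)
print(response.json()["content"])
```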
Quantization
Merge Details
Merge Method
This model was merged using the della_linear merge method, with TheDrummer/UnslopNemo-12B-v4.1 as the base model.
Models Merged
The following models were included in the merge:
- ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2
- DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS
- inflatebot/MN-12B-Mag-Mell-R1
Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: TheDrummer/UnslopNemo-12B-v4.1
    parameters:
      weight: 0.25
      density: 0.6
  - model: ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2
    parameters:
      weight: 0.25
      density: 0.6
  - model: DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS
    parameters:
      weight: 0.2
      density: 0.4
  - model: inflatebot/MN-12B-Mag-Mell-R1
    parameters:
      weight: 0.30
      density: 0.7
base_model: TheDrummer/UnslopNemo-12B-v4.1
merge_method: della_linear
dtype: bfloat16
chat_template: "chatml"
tokenizer_source: union
parameters:
  normalize: false
  int8_mask: true
  epsilon: 0.05
  lambda: 1
```
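For reference, a merge like this can be reproduced locally with mergekit. The sketch below is not from the original card: it assumes mergekit is installed (`pip install mergekit`), that there is enough disk space for the four source models, and uses an arbitrary file name for the saved config; the `mergekit-yaml` entry point and `--cuda` flag follow mergekit's documentation at the time of writing.

```python
# Hedged sketch: re-run the della_linear merge from the YAML above via mergekit's CLI.
import subprocess

CONFIG_PATH = "angelslayer_della_linear.yaml"  # arbitrary name; save the YAML above here
OUTPUT_DIR = "./AngelSlayer-12B-merge"         # where the merged weights will be written

# mergekit installs a `mergekit-yaml` console script that takes <config> <output_dir>.
subprocess.run(
    ["mergekit-yaml", CONFIG_PATH, OUTPUT_DIR, "--cuda"],  # drop --cuda on CPU-only machines
    check=True,
)
```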
🔧 Technical Details
The model uses the della_linear merge method. The order of models in the DELLA-Linear configuration matters: models listed lower in the config carry more prevalence in the result. The recommended context size, chat template, and samplers (see Parameters above) are also important for getting the intended performance.
📄 License
The model is released under the apache-2.0 license.
Today we hustle, 'day we hustle but tonight we play.