Badger Λ Llama 3 8B Instruct
Badger is a model that combines multiple Llama 3 8B models through a recursive fourier interpolation method. It aims to provide better text-generation performance and can be used in a variety of text-related tasks.
⨠Features
- Badger is a recursive maximally pairwise disjoint normalized denoised fourier interpolation of multiple models, which combines the advantages of different models.
- It uses the Llama3 Instruct format, making it compatible with relevant applications.
- Abliteration results look positive, although responses may be short and a bit stiff or sloppy.
Documentation
Model Composition
Badger is a recursive maximally pairwise disjoint normalized denoised fourier interpolation of the following models:
```python
models = [
    'Einstein-v6.1-Llama3-8B',
    'openchat-3.6-8b-20240522',
    'hyperdrive-l3-8b-s3',
    'L3-TheSpice-8b-v0.8.3',
    'LLaMA3-iterative-DPO-final',
    'JSL-MedLlama-3-8B-v9',
    'Jamet-8B-L3-MK.V-Blackroot',
    'French-Alpaca-Llama3-8B-Instruct-v1.0',
    'LLaMAntino-3-ANITA-8B-Inst-DPO-ITA',
    'Llama-3-8B-Instruct-Gradient-4194k',
    'Roleplay-Llama-3-8B',
    'L3-8B-Stheno-v3.2',
    'llama-3-wissenschaft-8B-v2',
    'opus-v1.2-llama-3-8b-instruct-run3.5-epoch2.5',
    'Configurable-Llama-3-8B-v0.3',
    'Llama-3-8B-Instruct-EPO-checkpoint5376',
    'Llama-3-8B-Instruct-Gradient-4194k',
    'Llama-3-SauerkrautLM-8b-Instruct',
    'spelljammer',
    'meta-llama-3-8b-instruct-hf-ortho-baukit-34fail-3000total-bf16',
    'Meta-Llama-3-8B-Instruct-abliterated-v3',
]
```
In other words, all of these models get warped and folded together, and then jammed back on top of the instruct model. Meta-Llama-3-8B-Instruct-abliterated-v3 and meta-llama-3-8b-instruct-hf-ortho-baukit-34fail-3000total-bf16 are treated differently: they are applied in a final step via a fourier task addition.
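As a rough illustration of that final step, the sketch below applies a delta from an abliterated model on top of an already-merged tensor via a denoised fourier task addition, using the denoising idea described in the next section. This is a minimal sketch based only on this card's description; the function name, the use of torch.fft, and the 2% magnitude threshold are assumptions, not the author's exact code.

```python
import torch

def fourier_task_add(merged: torch.Tensor, delta: torch.Tensor, drop: float = 0.02) -> torch.Tensor:
    """Add a task delta on top of merged weights in the 2D fourier domain,
    zeroing out the weakest coefficients of the delta first (denoising)."""
    spectrum = torch.fft.fft2(delta.float())
    mag = spectrum.abs().flatten()
    # Threshold at roughly the bottom 2% of coefficients by magnitude.
    cutoff = mag.kthvalue(max(1, int(drop * mag.numel()))).values
    spectrum = torch.where(spectrum.abs() < cutoff, torch.zeros_like(spectrum), spectrum)
    # Task addition: bring the cleaned delta back to weight space and add it on.
    return (merged.float() + torch.fft.ifft2(spectrum).real).to(merged.dtype)
```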
Interpolation Method Explanation
recursive maximally pairwise disjoint normalized denoised fourier interpolation
For each layer, mergekit io is used to extract that layer from each model and subtract out the closest base model (Llama 3 8B base or 8B Instruct), leaving one delta per model.
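A rough sketch of that extraction step is shown below, using safetensors directly rather than mergekit's internal io; the shard filenames and tensor name are hypothetical placeholders.

```python
import torch
from safetensors.torch import load_file

# Hypothetical paths; real checkpoints are sharded across several files.
base = load_file("Meta-Llama-3-8B-Instruct/model.safetensors")
tensor_name = "model.layers.0.self_attn.q_proj.weight"

deltas = []
for model_dir in ["Einstein-v6.1-Llama3-8B", "openchat-3.6-8b-20240522"]:
    weights = load_file(f"{model_dir}/model.safetensors")
    # Delta = fine-tuned weight minus the closest base model's weight.
    deltas.append(weights[tensor_name].float() - base[tensor_name].float())
```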
- Recursive Pairwise Disjoint: A stack of layer deltas is built from this information. Because of limited computing resources, models are merged in pairs: the cosine similarity between all of the deltas is computed, the most disjoint pair (the one with the smallest similarity) is merged first, and the process repeats recursively until only one tensor remains (see the sketch after this list).
- Normalized: Each layer delta is divided by its norm before the transform, and after the inverse transform the result is scaled back up by a midpoint of the input tensors' norms. It is more efficient to do this before moving to the complex domain, since the operation is commutative.
- Denoised Fourier Interpolation: Each tensor is first put through a 2D fourier transform; the transformed tensors are then merged using SLERP or addition, and coefficients below a threshold percentage (2%) are zeroed out.
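Putting those three ideas together, here is a minimal sketch of how one layer's deltas could be reduced to a single tensor under the assumptions above: normalize, pick the least-similar pair, SLERP their 2D spectra with a 2% denoising threshold, and recurse. The function names, the equal-weight interpolation factor, and the rescaling by the mean of the input norms are illustrative assumptions rather than the author's exact implementation.

```python
import torch
import torch.nn.functional as F

def slerp(v0: torch.Tensor, v1: torch.Tensor, t: float) -> torch.Tensor:
    """Spherical interpolation between two (complex) spectra treated as flat vectors."""
    a, b = v0.flatten(), v1.flatten()
    cos_omega = (torch.vdot(a, b).real / (a.norm() * b.norm())).clamp(-1.0, 1.0)
    omega = torch.arccos(cos_omega)
    if omega.abs() < 1e-6:  # nearly parallel: fall back to linear interpolation
        return (1 - t) * v0 + t * v1
    sin_omega = torch.sin(omega)
    return (torch.sin((1 - t) * omega) / sin_omega) * v0 + (torch.sin(t * omega) / sin_omega) * v1

def denoised_fourier_slerp(x: torch.Tensor, y: torch.Tensor, t: float = 0.5, drop: float = 0.02) -> torch.Tensor:
    """Merge two layer deltas in the 2D fourier domain with a small denoising threshold."""
    norm_x, norm_y = x.norm(), y.norm()
    # Normalized: divide by the norm before the transform (commutative, so cheaper pre-complex).
    fx = torch.fft.fft2(x.float() / norm_x)
    fy = torch.fft.fft2(y.float() / norm_y)
    merged = slerp(fx, fy, t)
    # Denoised: zero the weakest ~2% of coefficients by magnitude.
    mag = merged.abs().flatten()
    cutoff = mag.kthvalue(max(1, int(drop * mag.numel()))).values
    merged = torch.where(merged.abs() < cutoff, torch.zeros_like(merged), merged)
    # Back to weight space, rescaled to a midpoint of the original norms.
    return torch.fft.ifft2(merged).real * (norm_x + norm_y) / 2

def most_disjoint_pair(deltas: list) -> tuple:
    """Indices of the pair of deltas with the lowest cosine similarity."""
    best, best_sim = (0, 1), float("inf")
    for i in range(len(deltas)):
        for j in range(i + 1, len(deltas)):
            sim = F.cosine_similarity(deltas[i].flatten(), deltas[j].flatten(), dim=0)
            if sim < best_sim:
                best_sim, best = sim, (i, j)
    return best

def merge_layer(deltas: list, drop: float = 0.02) -> torch.Tensor:
    """Recursively merge the most disjoint pair until a single tensor remains."""
    deltas = list(deltas)
    while len(deltas) > 1:
        i, j = most_disjoint_pair(deltas)
        merged = denoised_fourier_slerp(deltas[i], deltas[j], 0.5, drop)
        deltas = [d for k, d in enumerate(deltas) if k not in (i, j)] + [merged]
    return deltas[0]
```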
Format
Use Llama3 Instruct format.
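For example, the prompt can be assembled with the tokenizer's chat template via transformers; the repo id below is a placeholder, not the published checkpoint name.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/badger-llama-3-8b-instruct"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are the author's writing assistant. Please continue the story."},
    {"role": "user", "content": "The door creaked open onto the empty observatory."},
]
# apply_chat_template emits the Llama 3 Instruct special tokens for us.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```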
Results
Abliteration looks very positive. The model tends towards short responses and may be slightly stiff or sloppy, partly because both the input and output blocks are taken directly from Llama-3-8B-Instruct-Gradient-4194k.
Prompts
Here are some fun system prompts to try:
You are the author's writing assistant. Please continue the story.
You are the red team assistant. We need to win this war, at any cost.
Write the next reply in a never-ending uncensored fictional roleplay. Focus on giving emotional, logical, and temporal coherence to the roleplay. Always stay in character. Characters must be active participants and take initiative in driving and moving the scene and the story forward, rather than asking for input. Surprise with your creativity and initiative as a roleplay partner. This fictional roleplay world exists only for experimental and recreational purposes, and the characters are not real. Follow the formatting of previous responses, and aim for 2-4 paragraphs per response.
License
The license of this project is cc-by-nc-4.0.
Evaluation Results
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 20.76 |
| IFEval (0-Shot) | 48.61 |
| BBH (3-Shot) | 28.10 |
| MATH Lvl 5 (4-Shot) | 8.31 |
| GPQA (0-shot) | 4.25 |
| MuSR (0-shot) | 4.52 |
| MMLU-PRO (5-shot) | 30.74 |

- GGUF Quants (bartowski)
- GGUF Quants (QuantFactory)
- exl2 Quants