🚀 QuartetAnemoi-70B-t0.0001
QuartetAnemoi-70B-t0.0001 is a merged model built with a custom merging algorithm (NearSwap). It combines multiple base models and shows distinctive behavior in text-generation tasks.
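A minimal inference sketch using Hugging Face Transformers (the repo id below is an assumption based on the author's namespace, and a 70B model needs substantial GPU memory or quantization):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alchemonaut/QuartetAnemoi-70B-t0.0001"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

prompt = "Tell me a story about a lighthouse keeper."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```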

✨ Features
- Sequential Merge: A sequential merge, using the custom NearSwap algorithm, of the base model (Miqu) and three secondary models (see Technical Details below).
- Storytelling Ability: In testing, this model behaves like a storyteller. Unlike most models, it rarely closes a story with clichés such as "In the end", "And so", or "beacon of hope".
📦 Quants
Most of the popular quant formats are available now, thanks to community efforts.
| Type | Misc | Author |
|------|------|--------|
| GGUF | | alchemonaut |
| GGUF | iMat | Nexesenex |
| GGUF | iMat | mradermacher |
| GGUF | Full Set | mradermacher |
| exl2 | 2.5bpw | llmixer |
| exl2 | 3.75bpw | altomek |
| exl2 | 4.0bpw | llmixer |
| exl2 | 4.6bpw | alchemonaut |
| exl2 | 6.0bpw | llmixer |
| AWQ | | tachyphylaxis |
🔧 Technical Details
NearSwap Algorithm
NearSwap retains most of the weights of the base model (Miqu), but where a base weight and the corresponding secondary-model weight are similar, the result is interpolated toward the secondary model's value. A parameter t specifies the similarity threshold: when the distance between the two values falls below t, the weight from the secondary model is used outright.
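Concretely, the interpolation weight is clip(t / |v0 - v1|, 0, 1), so pairs closer than t are swapped entirely and more distant pairs are only nudged. A quick numeric sketch (the distances below are illustrative, not measured from the checkpoints):

```python
import numpy as np

t = 0.0001
# Hypothetical per-weight distances |v0 - v1| between Miqu and a secondary model
dist = np.array([0.00005, 0.0001, 0.0002, 0.001])
lweight = np.clip(t / dist, 0.0, 1.0)
print(lweight)  # [1.  1.  0.5 0.1] -> full swap, full swap, halfway, a 10% nudge
```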
This version of the model uses t = 0.0001. At this t, about 0.8% of weights are fully switched to the secondary model during each pass. Model quality rapidly degrades above t = 0.0025:
- t = 0.0001 (~0.8% full swap): This model
- t = 0.0003 (~2% full swap)
- t = 0.001 (~10% full swap): BoreanGale-70B
- t = 0.0025 (~18% full swap): Generates one paragraph okay, but then reverts to garbage
- t = 0.005 (~35% full swap): Garbage; semi-related word lists
- t = 0.01 (~55% full swap): Garbage; pseudorandom tokens output
For QuartetAnemoi-70B-t0.0001, the three secondary models were each merged sequentially with t = 0.0001.
NearSwap implementation:

```python
from typing import Union

import numpy as np
import torch


def lerp(t, v0, v1):
    # Elementwise linear interpolation: t = 0 keeps v0, t = 1 takes v1
    return (1 - t) * v0 + t * v1


def nearswap(
    t: Union[float, np.ndarray],
    v0: Union[np.ndarray, torch.Tensor],
    v1: Union[np.ndarray, torch.Tensor],
) -> np.ndarray:
    # Work in numpy; torch tensors are converted up front
    v0 = v0.detach().cpu().numpy() if isinstance(v0, torch.Tensor) else v0
    v1 = v1.detach().cpu().numpy() if isinstance(v1, torch.Tensor) else v1
    # Interpolation weight grows as |v0 - v1| shrinks; values >= 1 mean a full swap
    lweight = np.absolute(v0 - v1)
    lweight = t / lweight
    # Identical values divide by zero; map the resulting nan/inf to a full swap
    lweight = np.nan_to_num(lweight, nan=1.0, posinf=1.0, neginf=1.0)
    np.clip(lweight, a_min=0.0, a_max=1.0, out=lweight)
    return lerp(lweight, v0, v1)
```
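As a usage sketch of the sequential merge described above (the random matrices are placeholders; the real merge applies nearswap per weight tensor, folding in the three secondary models one at a time):

```python
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 4))  # stand-in for one Miqu weight tensor
secondaries = [rng.standard_normal((4, 4)) for _ in range(3)]  # three secondary models

merged = base
for sec in secondaries:
    # Each pass swaps/interpolates only the weights that lie near the current merge
    merged = nearswap(0.0001, merged, sec)
```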
📚 Documentation
Model Index
| Task | Dataset | Metrics | Source |
|------|---------|---------|--------|
| Text Generation | AI2 Reasoning Challenge (25-Shot) (ai2_arc, ARC-Challenge, test, num_few_shot=25) | normalized accuracy: 73.38 | Open LLM Leaderboard |
| Text Generation | HellaSwag (10-Shot) (hellaswag, validation, num_few_shot=10) | normalized accuracy: 88.9 | Open LLM Leaderboard |
| Text Generation | MMLU (5-Shot) (cais/mmlu, all, test, num_few_shot=5) | accuracy: 75.42 | Open LLM Leaderboard |
| Text Generation | TruthfulQA (0-shot) (truthful_qa, multiple_choice, validation, num_few_shot=0) | mc2: 69.53 | Open LLM Leaderboard |
| Text Generation | Winogrande (5-shot) (winogrande, winogrande_xl, validation, num_few_shot=5) | accuracy: 85.32 | Open LLM Leaderboard |
| Text Generation | GSM8k (5-shot) (gsm8k, main, test, num_few_shot=5) | accuracy: 68.61 | Open LLM Leaderboard |
Evaluation Results
Detailed results can be found here
| Metric | Value |
|--------|-------|
| Avg. | 76.86 |
| AI2 Reasoning Challenge (25-Shot) | 73.38 |
| HellaSwag (10-Shot) | 88.9 |
| MMLU (5-Shot) | 75.42 |
| TruthfulQA (0-shot) | 69.53 |
| Winogrande (5-shot) | 85.32 |
| GSM8k (5-shot) | 68.61 |
📄 License
Because the ultimate origin of Miqu is, at this time, unknown beyond speculation, this model is for noncommercial research use only.