BoreanGale-70B
BoreanGale-70B is a merged model built with a custom algorithm (NearSwap) that combines the strengths of 152334H/miqu-1-70b-sf and Sao10K/WinterGoddess-1.4x-70B-L2. It shows promising performance across a range of text-generation tasks.

Quick Start
This section provides an overview of the BoreanGale-70B model, including its composition, available quantizations, algorithm details, license, and evaluation results.
Features
- Custom Merge Algorithm: Utilizes the NearSwap algorithm to combine two base models effectively.
- Multiple Quantizations: Thanks to community efforts, several quantization types are available.
- Good Performance: Performs well across multiple text-generation tasks on the Open LLM Leaderboard.
Installation
No installation steps are provided in the original model card.
Usage Examples
The original model card does not include code examples.
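A minimal sketch of loading the model with the Hugging Face transformers library is shown below; the repository id (alchemonaut/BoreanGale-70B), the prompt, and the generation settings are illustrative assumptions, not taken from the original card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alchemonaut/BoreanGale-70B"  # assumed Hub location of this merge

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard the 70B weights across available GPUs
    torch_dtype="auto",
)

prompt = "Write a short poem about the northern lights."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```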
Documentation
Model Composition
BoreanGale-70B is a merge, using a custom algorithm (NearSwap), of:
- 152334H/miqu-1-70b-sf
- Sao10K/WinterGoddess-1.4x-70B-L2
Available Quants
Several quants are available thanks to community efforts:
| Type | Misc | Author |
|------|------|--------|
| GGUF | iMat Q3 | Nexesenex |
| GGUF | iMat | mradermacher |
| GGUF | Full Set | mradermacher |
| GGUF | Misc | LoneStriker |
| exl2 | 2.4 bpw | LoneStriker |
| exl2 | 3.5 bpw | LoneStriker |
| exl2 | 4.0 bpw | LoneStriker |
| exl2 | 4.65 bpw | LoneStriker |
NearSwap Algorithm
NearSwap retains most of the weights of the base model (Miqu), but when a weight is similar between the two, it is interpolated to the secondary model (WinterGoddess) value. A parameter t specifies the sameness threshold. When the distance between two values is below t, the weight from the secondary model (WinterGoddess) is used.
This version of the model uses t = 0.001. At this t, about 10% of weights are fully switched to WinterGoddess. Model quality rapidly degrades above t = 0.0025:
- t = 0.0001 (~0.8% full swap): QuartetAnemoi-70B-t0.0001
- t = 0.0003 (~2% full swap)
- t = 0.001 (~10% full swap): This model
- t = 0.0025 (~18% full swap): Generates one paragraph okay, but then reverts to garbage
- t = 0.005 (~35% full swap): Garbage; semi-related word lists
- t = 0.01 (~55% full swap): Garbage; pseudorandom tokens output
NearSwap implementation (reconstructed as a runnable sketch; the `nearswap` wrapper, the `lerp` helper, and the imports are assumed, since the original snippet shows only the signature fragment and body):

```python
from typing import Union
import numpy as np
import torch

def lerp(t, v0, v1):  # assumed helper: standard linear interpolation
    return (1.0 - t) * v0 + t * v1

def nearswap(
    t: Union[float, np.ndarray],
    v0: Union[np.ndarray, torch.Tensor],
    v1: Union[np.ndarray, torch.Tensor],
):
    lweight = np.absolute(v0 - v1)   # distance between paired weights
    lweight = t / lweight            # > 1 wherever |v0 - v1| < t
    lweight = np.nan_to_num(lweight, nan=1.0, posinf=1.0, neginf=1.0)
    np.clip(lweight, a_min=0.0, a_max=1.0, out=lweight)  # cap the weight at 1
    return lerp(lweight, v0, v1)     # equals v1 (secondary model) where weights are near-identical
```
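In a full merge, this function would be applied tensor-by-tensor to corresponding weights of the two source models. The loop below is only a sketch under that assumption; the `nearswap_state_dicts` name and the premise that both checkpoints are available as PyTorch state dicts with matching keys are illustrative, not from the original card:

```python
import torch

def nearswap_state_dicts(miqu_sd, wintergoddess_sd, t=0.001):
    # Assumption: both state dicts (e.g. from model.state_dict()) share keys and shapes.
    merged = {}
    for name, w0 in miqu_sd.items():
        w1 = wintergoddess_sd[name]
        out = nearswap(t, w0.float().cpu().numpy(), w1.float().cpu().numpy())
        merged[name] = torch.from_numpy(out).to(w0.dtype)
    return merged
```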
License and Use
Since the ultimate origin of Miqu is at this time unknown beyond speculation, this model is for noncommercial research use only.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here.

| Metric | Value |
|--------|-------|
| Avg. | 76.48 |
| AI2 Reasoning Challenge (25-Shot) | 73.89 |
| HellaSwag (10-Shot) | 89.37 |
| MMLU (5-Shot) | 75.19 |
| TruthfulQA (0-shot) | 68.6 |
| Winogrande (5-shot) | 84.53 |
| GSM8k (5-shot) | 67.32 |
Technical Details
The NearSwap algorithm is a key technical aspect of this model. It carefully controls the weight interpolation between two models based on the similarity of weights. By adjusting the parameter t, the proportion of weights switched from the base model to the secondary model can be controlled. However, the model quality is highly sensitive to the value of t, and performance degrades rapidly when t exceeds 0.0025.
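As a rough way to see how t controls the swapped proportion, the sketch below counts the share of weight entries whose pairwise distance falls below each candidate t. The synthetic tensors are stand-ins for corresponding weights of the two source models, so the printed percentages will not match the figures quoted above:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for corresponding weight tensors from the two source models.
v0 = rng.normal(size=1_000_000).astype(np.float32)
v1 = v0 + rng.normal(scale=0.01, size=v0.shape).astype(np.float32)

for t in (0.0001, 0.0003, 0.001, 0.0025, 0.005, 0.01):
    full_swap = np.mean(np.abs(v0 - v1) < t)  # fraction of entries fully switched to v1
    print(f"t={t}: fully swapped {full_swap:.1%}")
```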
License
This model is for noncommercial research use only due to the uncertain origin of Miqu.