🚀 QuartetAnemoi-70B-t0.0001
QuartetAnemoi-70B-t0.0001 is a merged model built with a custom merging algorithm (NearSwap). It combines multiple base models and shows distinctive behavior in text-generation tasks.
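A minimal inference sketch using Hugging Face Transformers (the repo id below is an assumption based on the author's namespace, and a 70B model needs substantial GPU memory or quantization):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alchemonaut/QuartetAnemoi-70B-t0.0001"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

prompt = "Tell me a story about a lighthouse keeper."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```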

✨ Features
- Sequential Merge: A sequential merge, using the custom NearSwap algorithm, of the base model (Miqu) and three secondary models (see Technical Details below).
- Storytelling Ability: In testing, this model behaves like a storyteller. Unlike most models, it rarely closes a story with clichés such as "In the end", "And so", or "beacon of hope".
📦 Quants
Most of the popular quant formats are available now, thanks to community efforts.
| Type | Misc | Author |
|------|------|--------|
| GGUF | | alchemonaut |
| GGUF | iMat | Nexesenex |
| GGUF | iMat | mradermacher |
| GGUF | Full Set | mradermacher |
| exl2 | 2.5bpw | llmixer |
| exl2 | 3.75bpw | altomek |
| exl2 | 4.0bpw | llmixer |
| exl2 | 4.6bpw | alchemonaut |
| exl2 | 6.0bpw | llmixer |
| AWQ | | tachyphylaxis |
🔧 Technical Details
NearSwap Algorithm
NearSwap retains most of the weights of the base model (Miqu), but where a base weight and the corresponding secondary-model weight are similar, the result is interpolated toward the secondary model's value. A parameter t specifies the similarity threshold: when the distance between the two values falls below t, the weight from the secondary model is used outright.
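Concretely, the interpolation weight is clip(t / |v0 - v1|, 0, 1), so pairs closer than t are swapped entirely and more distant pairs are only nudged. A quick numeric sketch (the distances below are illustrative, not measured from the checkpoints):

```python
import numpy as np

t = 0.0001
# Hypothetical per-weight distances |v0 - v1| between Miqu and a secondary model
dist = np.array([0.00005, 0.0001, 0.0002, 0.001])
lweight = np.clip(t / dist, 0.0, 1.0)
print(lweight)  # [1.  1.  0.5 0.1] -> full swap, full swap, halfway, a 10% nudge
```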
This version of the model uses t = 0.0001. At this t, about 0.8% of weights are fully switched to the secondary model during each pass. Model quality rapidly degrades above t = 0.0025:
- t = 0.0001 (~0.8% full swap): This model
- t = 0.0003 (~2% full swap)
- t = 0.001 (~10% full swap): BoreanGale-70B
- t = 0.0025 (~18% full swap): Generates one paragraph okay, but then reverts to garbage
- t = 0.005 (~35% full swap): Garbage; semi-related word lists
- t = 0.01 (~55% full swap): Garbage; pseudorandom tokens output
For QuartetAnemoi-70B-t0.0001, the three secondary models were each merged sequentially with t = 0.0001.
NearSwap implementation:

```python
from typing import Union

import numpy as np
import torch


def lerp(t, v0, v1):
    # Elementwise linear interpolation: t = 0 keeps v0, t = 1 takes v1
    return (1 - t) * v0 + t * v1


def nearswap(
    t: Union[float, np.ndarray],
    v0: Union[np.ndarray, torch.Tensor],
    v1: Union[np.ndarray, torch.Tensor],
) -> np.ndarray:
    # Work in numpy; torch tensors are converted up front
    v0 = v0.detach().cpu().numpy() if isinstance(v0, torch.Tensor) else v0
    v1 = v1.detach().cpu().numpy() if isinstance(v1, torch.Tensor) else v1
    # Interpolation weight grows as |v0 - v1| shrinks; values >= 1 mean a full swap
    lweight = np.absolute(v0 - v1)
    lweight = t / lweight
    # Identical values divide by zero; map the resulting nan/inf to a full swap
    lweight = np.nan_to_num(lweight, nan=1.0, posinf=1.0, neginf=1.0)
    np.clip(lweight, a_min=0.0, a_max=1.0, out=lweight)
    return lerp(lweight, v0, v1)
```
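As a usage sketch of the sequential merge described above (the random matrices are placeholders; the real merge applies nearswap per weight tensor, folding in the three secondary models one at a time):

```python
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 4))  # stand-in for one Miqu weight tensor
secondaries = [rng.standard_normal((4, 4)) for _ in range(3)]  # three secondary models

merged = base
for sec in secondaries:
    # Each pass swaps/interpolates only the weights that lie near the current merge
    merged = nearswap(0.0001, merged, sec)
```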
📚 Documentation
Model Index
| Task | Dataset | Metrics | Source |
|------|---------|---------|--------|
| Text Generation | AI2 Reasoning Challenge (25-Shot) (ai2_arc, ARC-Challenge, test, num_few_shot=25) | normalized accuracy: 73.38 | Open LLM Leaderboard |
| Text Generation | HellaSwag (10-Shot) (hellaswag, validation, num_few_shot=10) | normalized accuracy: 88.9 | Open LLM Leaderboard |
| Text Generation | MMLU (5-Shot) (cais/mmlu, all, test, num_few_shot=5) | accuracy: 75.42 | Open LLM Leaderboard |
| Text Generation | TruthfulQA (0-shot) (truthful_qa, multiple_choice, validation, num_few_shot=0) | mc2: 69.53 | Open LLM Leaderboard |
| Text Generation | Winogrande (5-shot) (winogrande, winogrande_xl, validation, num_few_shot=5) | accuracy: 85.32 | Open LLM Leaderboard |
| Text Generation | GSM8k (5-shot) (gsm8k, main, test, num_few_shot=5) | accuracy: 68.61 | Open LLM Leaderboard |
Evaluation Results
Detailed results can be found here
| Metric | Value |
|--------|-------|
| Avg. | 76.86 |
| AI2 Reasoning Challenge (25-Shot) | 73.38 |
| HellaSwag (10-Shot) | 88.9 |
| MMLU (5-Shot) | 75.42 |
| TruthfulQA (0-shot) | 69.53 |
| Winogrande (5-shot) | 85.32 |
| GSM8k (5-shot) | 68.61 |
📄 License
Because the ultimate origin of Miqu is, at this time, unknown beyond speculation, this model is for noncommercial research use only.