# Sonya-7B
Sonya-7B is currently the #1 model on the first turn of MT-Bench, outperforming GPT-4, and ranks #2 overall on MT-Bench. It is a versatile model suitable for a range of tasks, such as assistant use and role-playing.
## Quick Start
Based on its parent models, this model is expected to be used with an 8192-token context window. You can experimentally extend this to a 16384-token context by using NTK scaling with an alpha of 2.6.
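As a rough guide to what that alpha does, NTK-aware scaling stretches the RoPE base frequency. The sketch below uses the commonly cited community formula and assumes a Mistral-7B-style head dimension of 128 and a base of 10000; these values and the function name are assumptions, not part of this model card.

```python
def ntk_scaled_rope_base(base: float = 10000.0, alpha: float = 2.6,
                         head_dim: int = 128) -> float:
    """Adjusted RoPE base under NTK-aware scaling: base * alpha^(d / (d - 2))."""
    return base * alpha ** (head_dim / (head_dim - 2))

# alpha = 1.0 leaves the base unchanged; alpha = 2.6 roughly
# stretches positional resolution enough for a ~2x longer context.
```

Inference backends typically expose this as a single knob (e.g. an NTK/RoPE alpha setting) rather than asking for the adjusted base directly.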
## Features

### Outstanding Performance

Sonya-7B significantly outperforms its parent models on MT-Bench. On the MT-Bench average across turns, it scores 8.52, trailing only GPT-4.
| model | score | size |
|---|---|---|
| gpt-4 | 8.99 | - |
| Sonya-7B | 8.52 | 7b |
| xDAN-L1-Chat-RL-v1 | 8.34 | 7b |
| Starling-7B | 8.09 | 7b |
| Claude-2 | 8.06 | - |
| Silicon-Maid | 7.96 | 7b |
| Loyal-Macaroni-Maid | 7.95 | 7b |
| gpt-3.5-turbo | 7.94 | 20b? |
| Claude-1 | 7.90 | - |
| OpenChat-3.5 | 7.81 | - |
| vicuna-33b-v1.3 | 7.12 | 33b |
| wizardlm-30b | 7.01 | 30b |
| Llama-2-70b-chat | 6.86 | 70b |
### Model Composition
It's a merge of [xDAN-AI/xDAN-L1-Chat-RL-v1](https://huggingface.co/xDAN-AI/xDAN-L1-Chat-RL-v1), [Jan-Ai's Stealth v1.2](https://huggingface.co/jan-hq/stealth-v1.2), [chargoddard/piano-medley-7b](https://huggingface.co/chargoddard/piano-medley-7b), [NeverSleep/Noromaid-7B-v0.2](https://huggingface.co/NeverSleep/Noromaid-7b-v0.2), and [athirdpath/NSFW_DPO_vmgb-7b](https://huggingface.co/athirdpath/NSFW_DPO_vmgb-7b).
### Selection Rationale

- MT-Bench Correlation: MT-Bench usually correlates well with real-world model quality, and xDAN performs well on it.
- Prompt Consistency: Most models in the mix use Alpaca prompt formatting, ensuring prompt consistency.
- Magic Ingredient: Stealth v1.2 seems to boost MT-Bench scores.
- RP Enhancement: Adding RP models improves performance on the Writing and Role-play benchmarks.
### Other Benchmark Results

#### First turn
| model | turn | score | size |
|---|---|---|---|
| Sonya-7B | 1 | 9.06875 | 7b |
| gpt-4 | 1 | 8.95625 | - |
| xDAN-L1-Chat-RL-v1 | 1 | 8.87500 | 7b |
| xDAN-L2-Chat-RL-v2 | 1 | 8.78750 | 30b |
| claude-v1 | 1 | 8.15000 | - |
| gpt-3.5-turbo | 1 | 8.07500 | 20b |
| vicuna-33b-v1.3 | 1 | 7.45625 | 33b |
| wizardlm-30b | 1 | 7.13125 | 30b |
| oasst-sft-7-llama-30b | 1 | 7.10625 | 30b |
| Llama-2-70b-chat | 1 | 6.98750 | 70b |
#### Second turn
| model | turn | score | size |
|---|---|---|---|
| gpt-4 | 2 | 9.025000 | - |
| xDAN-L2-Chat-RL-v2 | 2 | 8.087500 | 30b |
| Sonya-7B | 2 | 7.962500 | 7b |
| xDAN-L1-Chat-RL-v1 | 2 | 7.825000 | 7b |
| gpt-3.5-turbo | 2 | 7.812500 | 20b |
| claude-v1 | 2 | 7.650000 | - |
| wizardlm-30b | 2 | 6.887500 | 30b |
| vicuna-33b-v1.3 | 2 | 6.787500 | 33b |
| Llama-2-70b-chat | 2 | 6.725000 | 70b |
## Documentation

### The Sauce
```yaml
models:
  - model: xDAN-AI/xDAN-L1-Chat-RL-v1
    parameters:
      weight: 1
      density: 1
  - model: chargoddard/piano-medley-7b
    parameters:
      weight: 0.3
  - model: jan-hq/stealth-v1.2
    parameters:
      weight: 0.2
  - model: NeverSleep/Noromaid-7b-v0.2
    parameters:
      weight: 0.2
  - model: athirdpath/NSFW_DPO_vmgb-7b
    parameters:
      weight: 0.2
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  density: 0.4
  int8_mask: true
  normalize: true
dtype: bfloat16
```
Note: There was no additional training, finetuning, or DPO. This is a straight merge.
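For intuition, the TIES method named in the config above works in three steps: trim each model's task vector to its largest-magnitude entries (controlled by `density`), elect a per-parameter sign by weighted majority, then average only the contributions that agree with that sign. The sketch below is a minimal NumPy illustration of those three steps under those assumptions; it is not mergekit's actual implementation, which operates on full model tensors with additional normalization options.

```python
import numpy as np

def ties_merge(base, deltas, weights, density=0.4):
    """Toy TIES merge of task vectors (`deltas`) onto a `base` parameter array."""
    # Step 1 (trim): keep only the top-`density` fraction of each delta by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.size))
        thresh = np.sort(np.abs(d).ravel())[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    # Step 2 (elect sign): per-parameter sign of the weighted sum of trimmed deltas.
    stacked = np.stack([w * t for w, t in zip(weights, trimmed)])
    sign = np.sign(stacked.sum(axis=0))
    # Step 3 (disjoint merge): average only contributions agreeing with the elected sign.
    agree = (np.sign(stacked) == sign) & (stacked != 0)
    num = (stacked * agree).sum(axis=0)
    den = np.maximum(agree.sum(axis=0), 1)
    return base + num / den
```

The trimming step is why a low `density` (0.4 here) can help: it discards small, noisy parameter changes so that conflicting models interfere less.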
### Prompt Template (Alpaca)

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
Testing found that this model performs worse with the xDAN prompt format, so despite xDAN's heavy weight in the merge, using that format is not recommended.
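If you are wiring the template up yourself, a minimal helper like the following (the function name is my own, not part of this card) fills the `{prompt}` slot:

```python
def format_alpaca(prompt: str) -> str:
    """Wrap a user prompt in the Alpaca template recommended above."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n"
        f"{prompt}\n\n"
        "### Response:\n"
    )
```

Generation should then continue from the trailing `### Response:` line.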
### Replicating the MT-Bench Run

If you want to replicate the MT-Bench run, make sure to apply the Alpaca prompt template to the model. You can do this by putting "alpaca" in the model path to trigger the AlpacaAdapter.
## License

This project is licensed under the CC-BY-4.0 license.