# Kunoichi-DPO-7B
This repository hosts Kunoichi-DPO-7B, a DPO finetune of Kunoichi-7B trained on Intel's Orca pairs with the Alpaca template. It targets general use with enhanced reasoning and instruction-following capabilities.
## Quick Start
To get started, refer to the information below for details about the model, its performance, and usage.
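As a minimal, illustrative sketch (assuming the repository id is `SanjiWatsuki/Kunoichi-DPO-7B` and that `transformers` and `torch` are installed), the model can be loaded and prompted with the Alpaca format like this:

```python
# Minimal sketch: load the model with 🤗 Transformers and generate from an Alpaca-style prompt.
# The repo id and generation settings below are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SanjiWatsuki/Kunoichi-DPO-7B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision for a single consumer GPU
    device_map="auto",
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what DPO finetuning is in two sentences.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```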
## Features
- Enhanced Capabilities: In testing, Kunoichi-DPO-7B shows stronger reasoning and instruction-following capabilities than Kunoichi-7B.
- Context Window: The model is intended for use with up to an 8k context window. Experimentally, with an NTK RoPE alpha of 2.6, it can be used up to a 16k context window.
## Documentation
### Description
This repository hosts Kunoichi-DPO-7B, a DPO finetune using Intel's Orca pairs with the Alpaca template on Kunoichi-7B. This model is targeted at general use. In my testing, it has stronger reasoning and instruction-following capabilities than Kunoichi-7B, but it may be worse for role-playing purposes due to the alignment from the Orca dataset.
This model is undergoing benchmark testing and I will update the model page with the finalized results.
| Model | MT Bench | EQ Bench | MMLU | Logic Test |
|-------|----------|----------|------|------------|
| GPT-4-Turbo | 9.32 | - | - | - |
| GPT-4 | 8.99 | 62.52 | 86.4 | 0.86 |
| Kunoichi-DPO-7B | 8.29 | 41.60 | - | 0.59 |
| Kunoichi-7B | 8.14 | 44.32 | 64.9 | 0.58 |
| Starling-7B | 8.09 | - | 63.9 | 0.51 |
| Claude-2 | 8.06 | 52.14 | 78.5 | - |
| Silicon-Maid-7B | 7.96 | 40.44 | 64.7 | 0.54 |
| Loyal-Macaroni-Maid-7B | 7.95 | 38.66 | 64.9 | 0.57 |
| GPT-3.5-Turbo | 7.94 | 50.28 | 70 | 0.57 |
| Claude-1 | 7.9 | - | 77 | - |
| Openchat-3.5 | 7.81 | 37.08 | 64.3 | 0.39 |
| Dolphin-2.6-DPO | 7.74 | 42.88 | 61.9 | 0.53 |
| Zephyr-7B-beta | 7.34 | 38.71 | 61.4 | 0.30 |
| Llama-2-70b-chat-hf | 6.86 | 51.56 | 63 | - |
| Neural-chat-7b-v3-1 | 6.84 | 43.61 | 62.4 | 0.30 |
| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|-------|---------|---------|---------|------------|----------|
| Kunoichi-DPO-7B | 58.4 | 45.08 | 74 | 66.99 | 47.52 |
| [Kunoichi-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-7B) | 57.54 | 44.99 | 74.86 | 63.72 | 46.58 |
| [OpenPipe/mistral-ft-optimized-1218](https://huggingface.co/OpenPipe/mistral-ft-optimized-1218) | 56.85 | 44.74 | 75.6 | 59.89 | 47.17 |
| [Silicon-Maid-7B](https://huggingface.co/SanjiWatsuki/Silicon-Maid-7B) | 56.45 | 44.74 | 74.26 | 61.5 | 45.32 |
| [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) | 53.51 | 43.67 | 73.24 | 55.37 | 41.76 |
| [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
| openchat/openchat_3.5 | 51.34 | 42.67 | 72.92 | 47.27 | 42.51 |
| [berkeley-nest/Starling-LM-7B-alpha](https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha) | 51.16 | 42.06 | 72.72 | 47.33 | 42.53 |
| [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 50.99 | 37.33 | 71.83 | 55.1 | 39.7 |
The model is intended to be used with up to an 8k context window. Using an NTK RoPE alpha of 2.6, the model can be used experimentally up to a 16k context window.
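For the experimental 16k context, here is a hedged sketch using `llama-cpp-python` with a local GGUF conversion of the model (the filename is hypothetical). It converts the NTK alpha of 2.6 to a RoPE frequency base with the commonly used `10000 * alpha ** (dim / (dim - 2))` rule; treat the numbers as a starting point rather than an exact recipe:

```python
# Sketch: extend the context window to ~16k with NTK RoPE scaling via llama-cpp-python.
# Assumes a local GGUF conversion of this model exists; the filename is hypothetical,
# and the alpha -> rope_freq_base conversion is the common NTK-aware heuristic.
from llama_cpp import Llama

alpha = 2.6
head_dim = 128  # Mistral-style head dimension
rope_freq_base = 10000 * alpha ** (head_dim / (head_dim - 2))  # roughly 26,400

llm = Llama(
    model_path="kunoichi-dpo-7b.Q5_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=16384,                               # experimental 16k context window
    rope_freq_base=rope_freq_base,
)

out = llm("### Instruction:\nSummarize this document.\n\n### Response:\n", max_tokens=256)
print(out["choices"][0]["text"])
```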
### Prompt template: Custom format, or Alpaca
Alpaca:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
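For illustration, a small helper that fills the template's `{prompt}` placeholder (the blank lines between sections are assumed to match the standard Alpaca layout):

```python
# Illustrative helper that fills the Alpaca template above with a user prompt.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{prompt}\n\n"
    "### Response:\n"
)

def build_alpaca_prompt(prompt: str) -> str:
    """Return the Alpaca-formatted prompt expected by this model."""
    return ALPACA_TEMPLATE.format(prompt=prompt)

print(build_alpaca_prompt("List three uses of DPO finetuning."))
```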
SillyTavern format:
I found the best SillyTavern results from using the Noromaid template.
SillyTavern config files: Context, Instruct.
Additionally, here is my highly recommended [Text Completion preset](https://huggingface.co/SanjiWatsuki/Loyal-Macaroni-Maid-7B/blob/main/Characters/MinP.json). You can tweak this by raising temperature or lowering Min P to boost creativity, or by raising Min P to increase stability. You shouldn't need to touch anything else!
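As a rough illustration of that advice outside SillyTavern (the values below are placeholders, not the preset's exact settings, and `min_p` sampling requires a recent `transformers` release):

```python
# Illustrative sampling configurations mirroring the tuning advice above.
from transformers import GenerationConfig

# Higher Min P keeps only relatively probable tokens -> more stable output.
stable = GenerationConfig(do_sample=True, temperature=1.0, min_p=0.10)

# Hotter temperature and lower Min P admit more candidates -> more creative output.
creative = GenerationConfig(do_sample=True, temperature=1.25, min_p=0.05)

# Pass one of these to model.generate(..., generation_config=creative),
# reusing `model` and `inputs` from the Quick Start sketch above.
```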
## License
This model is licensed under CC-BY-NC-4.0.