Kunoichi DPO 7B

Developed by SanjiWatsuki
Kunoichi-DPO-7B is a fine-tune of Kunoichi-7B, trained with direct preference optimization (DPO) on Intel's Orca preference data using the Alpaca prompt template. It targets general-purpose use and shows stronger reasoning and instruction-following abilities than the base model.
Downloads 748
Release Time: 1/11/2024

Model Overview

This model improves reasoning and instruction following through DPO fine-tuning and is suited to general scenarios; however, its role-playing ability may be reduced by the alignment data. It supports a maximum 8k context window, with experimental support for 16k.
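Because the model was tuned on the Alpaca template, prompts should follow that format at inference time. A minimal sketch of the standard Alpaca wrapper (the helper name is hypothetical; the wrapper text follows the widely used Alpaca convention, not anything stated in this card):

```python
def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Format a request using the standard Alpaca prompt template."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

# Example: the model's completion would be generated after "### Response:".
prompt = build_alpaca_prompt("Summarize the plot of Hamlet in one sentence.")
print(prompt)
```

The completion is expected to start immediately after the trailing `### Response:` marker, so the prompt should end exactly there.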

Model Features

Enhanced Reasoning Ability
Through DPO fine-tuning, the model performs better in reasoning and instruction following.
Large Context Window Support
Supports a maximum 8k context window and experimentally supports a 16k context window.
Suitable for General Scenarios
Handles a wide range of general-purpose tasks, though role-playing ability may decline.
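The experimental 16k window on an 8k-native model is typically reached by scaling RoPE at load time. A hedged sketch of the arithmetic and a possible loading call with Hugging Face transformers (the repo id and linear-scaling approach are assumptions; check the upstream model card before relying on them):

```python
# Linear RoPE scaling factor for stretching an 8k-native context to 16k
# (context sizes taken from the model description above).
native_ctx = 8192
target_ctx = 16384
factor = target_ctx / native_ctx  # 16384 / 8192 = 2.0

# With Hugging Face transformers, the scaled model could then be loaded
# roughly like this (commented out: downloads ~7B weights; repo id assumed):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "SanjiWatsuki/Kunoichi-DPO-7B",  # assumed Hugging Face repo id
#     rope_scaling={"type": "linear", "factor": factor},
# )
print(factor)
```

Linear scaling trades some short-context precision for the longer window, which matches the card's description of 16k support as experimental.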

Model Capabilities

Text Generation
Instruction Following
Reasoning Tasks

Use Cases

General Text Generation
Instruction Response
Generates appropriate responses to user instructions.
Scored 8.29 on MT-Bench, outperforming several comparable models.
Educational Assistance
Logic Test
Used for logical reasoning and problem-solving.
Scored 0.59 on the Logic Test, performing better than several comparable models.
© 2025 AIbase