7B DPO Alpha
A 7B-parameter causal language model trained on multi-source datasets and optimized with DPO, supporting Chinese and English text generation tasks
Downloads: 131
Release date: 11/2/2023
Model Overview
This model is a Direct Preference Optimization (DPO)-enhanced causal language model focused on text generation. Built on the Llama architecture, it is trained on multiple high-quality datasets and outperforms comparable 7B models on the MT-Bench benchmark.
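A minimal generation sketch using Hugging Face transformers is shown below. The repository id `example-org/7b-dpo-alpha` is a placeholder, not this model's actual hub name; substitute the real identifier before running.

```python
# Minimal text-generation sketch with Hugging Face transformers.
# NOTE: the repo id is hypothetical; replace it with the model's actual hub name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "example-org/7b-dpo-alpha"  # placeholder repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```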
Model Features
Multi-source data integration
Trains on more than 20 high-quality datasets, including Guanaco, OpenOrca, and UltraChat, covering diverse domains
DPO optimization
Trained with the Direct Preference Optimization method, aligning outputs more closely with human preferences than the base version (a sketch of the objective follows this list)
Bilingual support
Supports text generation in both English and Chinese, with strong performance on Chinese tasks
Performance optimization
Achieves an MT-Bench score of 7.038, surpassing the average of comparable 7B models
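For reference, the sketch below shows the standard DPO pairwise loss (Rafailov et al., 2023) that the training method named above refers to. It is an illustrative PyTorch implementation under that assumption, not this model's actual training code.

```python
# Sketch of the standard DPO loss (Rafailov et al., 2023).
# Illustrative only; not taken from this model's training code.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss from summed log-probs of chosen/rejected responses."""
    # Log-ratios of the trained policy vs. the frozen reference model.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin): pushes the policy to prefer chosen responses.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```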
Model Capabilities
Text generation
Dialogue systems
Question answering
Content creation
Use Cases
Dialogue systems
Intelligent customer service
Used for building multi-turn customer service dialogue systems (see the sketch after the use-case list)
Content creation
Article generation
Generates coherent text content based on prompts
Educational assistance
Learning assistant
Answers study questions and provides knowledge explanations
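A hedged multi-turn dialogue sketch for the customer-service use case above. It assumes the tokenizer ships a chat template, and it reuses the hypothetical repository id from the earlier example.

```python
# Multi-turn dialogue sketch; assumes the tokenizer defines a chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "example-org/7b-dpo-alpha"  # placeholder repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "My order hasn't arrived. What should I do?"},
    {"role": "assistant", "content": "Sorry to hear that. Could you share your order number?"},
    {"role": "user", "content": "It's 12345."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated assistant turn.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```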