14B DPO Alpha
Developed by CausalLM
CausalLM/14B-DPO-α is a large-scale causal language model for Chinese and English text generation, with strong results on the MT-Bench benchmark.
Downloads: 172
Release Date: 11/2/2023
Model Overview
This model is a 14B-parameter causal language model trained using DPO (Direct Preference Optimization), focusing on high-quality text generation tasks.
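The model can be loaded with the standard Hugging Face transformers causal-LM API. The sketch below is a minimal example only; the repository id "CausalLM/14B-DPO-alpha" and the ChatML-style prompt format are assumptions that should be checked against the official model card.

```python
# Minimal usage sketch (assumptions: repo id "CausalLM/14B-DPO-alpha",
# ChatML-style prompt format; verify both on the official model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/14B-DPO-alpha"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

# ChatML-style prompt (assumed format for this model family)
prompt = (
    "<|im_start|>user\n"
    "Briefly introduce yourself in English and Chinese.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```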
Model Features
High-Performance Text Generation
Scored 7.618868 on MT-Bench, surpassing other models of similar scale
Multilingual Support
Supports Chinese and English text generation tasks
DPO Optimization
Trained with Direct Preference Optimization to improve generation quality (see the loss sketch after this list)
Large-Scale Training Data
Trained on more than 20 high-quality datasets, including Guanaco, OpenOrca, and UltraChat
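For readers unfamiliar with DPO, the core idea is to increase the log-probability margin of preferred over rejected responses relative to a frozen reference model. The snippet below is an illustrative sketch of the standard DPO objective (Rafailov et al., 2023), not the exact training code used for this model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is a tensor of per-sequence log-probabilities (summed over tokens).
    beta controls how strongly the policy is pushed away from the reference model.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    reference_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - reference_margin)).mean()
```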
Model Capabilities
Text Generation
Dialogue Systems
Q&A Systems
Content Creation
Use Cases
Dialogue Systems
Intelligent Customer Service
Used to build multilingual intelligent customer service systems
Delivers smooth and accurate customer service interactions
Content Creation
Article Generation
Assists content creators in generating high-quality articles
Produces fluent and logically coherent content
Education
Learning Assistant
Serves as a study aid to answer student questions
Provides accurate knowledge-based answers