Zephyr 141B-A39B
Zephyr is a series of language models designed to serve as helpful assistants. Zephyr 141B-A39B is the latest model in the series. It is a fine-tuned version of mistral-community/Mixtral-8x22B-v0.1, trained with a novel alignment algorithm, Odds Ratio Preference Optimization (ORPO), on 7k instances for 1.3 hours across 4 nodes of 8 x H100s. ORPO is computationally more efficient than methods like DPO and PPO because it does not require a separate SFT step to achieve strong performance. Training used the argilla/distilabel-capybara-dpo-7k-binarized preference dataset, which contains synthetic, high-quality, multi-turn preferences scored with LLMs.
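For a quick look at the preference data, the dataset can be pulled straight from the Hub with the datasets library. The sketch below is illustrative only and makes no assumptions about split or column names; it simply prints whatever schema the dataset exposes.

```python
from datasets import load_dataset

# Load the ORPO preference dataset referenced above and inspect its schema.
ds = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized")

# Printing the DatasetDict shows the available splits, column names, and row counts.
print(ds)
```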
Important Note
This model was trained collaboratively between Argilla, KAIST, and Hugging Face.
Quick Start
The model was fine-tuned on a blend of chat, code, math, and reasoning data. Here's how you can run the model using the pipeline() function from 🤗 Transformers:
Basic Usage
```python
import torch
from transformers import pipeline

# Load the model in bfloat16 and shard it across the available GPUs.
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "system",
        "content": "You are Zephyr, a helpful assistant.",
    },
    {"role": "user", "content": "Explain how Mixture of Experts work in language a child would understand."},
]

# Generate a response with sampling enabled.
outputs = pipe(
    messages,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(outputs[0]["generated_text"][-1]["content"])
```
Features
- Efficient Training Algorithm: Trained with ORPO, which is computationally more efficient than DPO and PPO.
- High-Quality Dataset: Utilized a preference dataset with synthetic, high-quality, multi-turn preferences.
- Strong Performance: Achieves strong performance on chat benchmarks like MT Bench and IFEval.
Installation
```bash
pip install 'transformers>=4.39.3'
pip install accelerate
```
Documentation
Model Details
Model Description
| Property | Details |
|----------|---------|
| Model Type | A Mixture of Experts (MoE) model with 141B total parameters and 39B active parameters. Fine-tuned on a mix of publicly available, synthetic datasets. |
| Language(s) (NLP) | Primarily English |
| License | Apache 2.0 |
| Finetuned from model | mistral-community/Mixtral-8x22B-v0.1 |
Model Sources
- Repository: https://github.com/huggingface/alignment-handbook
- Dataset: https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized
Performance
Zephyr 141B-A39B was trained to test the effectiveness of ORPO at scale. The underlying dataset covers a mix of general chat capabilities. The model performs well on chat benchmarks like [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and IFEval. The reported scores were obtained with the LightEval evaluation suite, with each prompt formatted according to the model's chat template to simulate real-world usage. This may cause scores to differ from those in technical reports or on the Open LLM Leaderboard.
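For context, prompt formatting with the model's chat template can be reproduced via the standard 🤗 Transformers apply_chat_template API. The snippet below is a minimal illustration of that formatting step, not the actual LightEval harness code, and the example message is hypothetical.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1")

# A hypothetical single-turn conversation.
messages = [{"role": "user", "content": "Write a short poem about the sea."}]

# Render the conversation with the model's chat template and append the
# assistant generation prompt so the model continues as the assistant.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```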
Intended uses & limitations
The model is suitable for general chat, code, math, and reasoning tasks. However, it has not been aligned to human safety preferences with an RLHF phase and lacks in-the-loop response filtering like ChatGPT, so it may produce problematic outputs, especially when prompted to do so.
Bias, Risks, and Limitations
Zephyr 141B-A39B has not been aligned to human safety preferences during an RLHF phase and does not have in-the-loop response filtering, so it can generate problematic outputs. In addition, the size and composition of the corpus used to train the base model (mistral-community/Mixtral-8x22B-v0.1) are unknown, though it likely includes web data and technical sources such as books and code.
Training procedure
Training hyperparameters
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- total_train_batch_size: 32
- total_eval_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
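The actual training recipe lives in the alignment-handbook repository linked above. Purely as an illustrative sketch (assuming TRL's ORPOConfig/ORPOTrainer, which the handbook builds on), the hyperparameters listed above map roughly to a configuration like the following; the output directory is a placeholder, and model/dataset loading and the multi-node launch are omitted.

```python
from trl import ORPOConfig

# Rough mapping of the hyperparameters above onto TRL's ORPOConfig.
# A per-device train batch size of 1 across 32 GPUs gives the reported
# total train batch size of 32; 8 per device gives the eval total of 256.
config = ORPOConfig(
    output_dir="zephyr-orpo-141b",      # placeholder output path
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=100,
    num_train_epochs=3,
    seed=42,
    bf16=True,                          # assumed; precision is not listed above
)

# Training itself would then be launched with ORPOTrainer, e.g.:
# trainer = ORPOTrainer(model=model, args=config, train_dataset=train_ds, tokenizer=tokenizer)
# trainer.train()
```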
Framework versions
- Transformers 4.39.3
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.1
Technical Details
The model was trained with ORPO, a novel alignment algorithm that does not require a separate SFT step, which makes training more computationally efficient than traditional methods like DPO and PPO.
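Conceptually, ORPO adds an odds-ratio penalty on top of the standard language-modeling loss: the odds of the chosen response are pushed up relative to the rejected one, so no reference model or separate SFT stage is needed. The sketch below is a toy PyTorch rendering of that objective under the assumption that per-sequence log-probabilities are length-averaged; it is not the reference implementation, and the weighting lam=0.1 is illustrative.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, nll_chosen, lam=0.1):
    """Toy ORPO objective.

    chosen_logps / rejected_logps: average per-token log-probabilities of the
    chosen and rejected responses under the policy model (shape: [batch]).
    nll_chosen: standard negative log-likelihood (SFT) loss on the chosen
    responses. lam weights the odds-ratio term (value here is illustrative).
    """
    # log odds(y|x) = log p - log(1 - p), computed from the log-probability.
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio term: reward higher odds for the chosen response.
    ratio = F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # Total loss = SFT loss + lam * (-log sigmoid of the log odds ratio).
    return nll_chosen - lam * ratio.mean()
```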
License
This model is licensed under the Apache 2.0 license.
Citation
If you find Zephyr 141B-A39B useful in your work, please cite the ORPO paper:
```bibtex
@misc{hong2024orpo,
    title={ORPO: Monolithic Preference Optimization without Reference Model},
    author={Jiwoo Hong and Noah Lee and James Thorne},
    year={2024},
    eprint={2403.07691},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
You may also wish to cite the creators of this model:
```bibtex
@misc{zephyr_141b,
    author = {Alvaro Bartolome and Jiwoo Hong and Noah Lee and Kashif Rasul and Lewis Tunstall},
    title = {Zephyr 141B A39B},
    year = {2024},
    publisher = {Hugging Face},
    journal = {Hugging Face repository},
    howpublished = {\url{https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1}}
}
```