Zephyr 141B-A39B
Zephyr is a series of language models designed to serve as helpful assistants. Zephyr 141B-A39B is the latest model in the series. It is a fine-tuned version of mistral-community/Mixtral-8x22B-v0.1, trained with a novel alignment algorithm, Odds Ratio Preference Optimization (ORPO), on 7k instances for 1.3 hours across 4 nodes of 8 x H100s. ORPO is computationally more efficient than methods like DPO and PPO because it does not require a separate SFT step to achieve strong performance. Training used the argilla/distilabel-capybara-dpo-7k-binarized preference dataset, which contains synthetic, high-quality, multi-turn preferences scored with LLMs.
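For a quick look at the preference data, the dataset can be pulled straight from the Hub with the datasets library. The sketch below is illustrative only and makes no assumptions about split or column names; it simply prints whatever schema the dataset exposes.

```python
from datasets import load_dataset

# Load the ORPO preference dataset referenced above and inspect its schema.
ds = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized")

# Printing the DatasetDict shows the available splits, column names, and row counts.
print(ds)
```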
Important Note
This model was trained collaboratively between Argilla, KAIST, and Hugging Face.
Quick Start
The model was fine-tuned on a blend of chat, code, math, and reasoning data. Here's how you can run the model using the pipeline() function from 🤗 Transformers:
Basic Usage
```python
import torch
from transformers import pipeline

# Load the model in bfloat16 and shard it across the available GPUs.
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "system",
        "content": "You are Zephyr, a helpful assistant.",
    },
    {"role": "user", "content": "Explain how Mixture of Experts work in language a child would understand."},
]

# Generate a response with sampling enabled.
outputs = pipe(
    messages,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(outputs[0]["generated_text"][-1]["content"])
```
Features
- Efficient Training Algorithm: Trained with ORPO, which is computationally more efficient than DPO and PPO.
- High-Quality Dataset: Utilized a preference dataset with synthetic, high-quality, multi-turn preferences.
- Strong Performance: Achieves strong performance on chat benchmarks like MT Bench and IFEval.
Installation
```bash
pip install 'transformers>=4.39.3'
pip install accelerate
```
Documentation
Model Details
Model Description
| Property | Details |
|----------|---------|
| Model Type | A Mixture of Experts (MoE) model with 141B total parameters and 39B active parameters. Fine-tuned on a mix of publicly available, synthetic datasets. |
| Language(s) (NLP) | Primarily English |
| License | Apache 2.0 |
| Finetuned from model | mistral-community/Mixtral-8x22B-v0.1 |
Model Sources
- Repository: https://github.com/huggingface/alignment-handbook
- Dataset: https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized
Performance
Zephyr 141B-A39B was trained to test the effectiveness of ORPO at scale. The underlying dataset covers a mix of general chat capabilities. The model performs well on chat benchmarks like [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and IFEval. The reported scores were obtained with the LightEval evaluation suite, with each prompt formatted according to the model's chat template to simulate real-world usage. This may cause scores to differ from those in technical reports or on the Open LLM Leaderboard.
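For context, prompt formatting with the model's chat template can be reproduced via the standard 🤗 Transformers apply_chat_template API. The snippet below is a minimal illustration of that formatting step, not the actual LightEval harness code, and the example message is hypothetical.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1")

# A hypothetical single-turn conversation.
messages = [{"role": "user", "content": "Write a short poem about the sea."}]

# Render the conversation with the model's chat template and append the
# assistant generation prompt so the model continues as the assistant.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```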
Intended uses & limitations
The model is suitable for general chat, code, math, and reasoning tasks. However, it has not been aligned to human safety preferences with an RLHF phase and lacks in-the-loop response filtering like ChatGPT, so it may produce problematic outputs, especially when prompted to do so.
Bias, Risks, and Limitations
Zephyr 141B-A39B has not been aligned to human safety preferences during an RLHF phase and does not have in-the-loop response filtering, so it can generate problematic outputs. In addition, the size and composition of the corpus used to train the base model (mistral-community/Mixtral-8x22B-v0.1) are unknown, though it likely includes web data and technical sources such as books and code.
Training procedure
Training hyperparameters
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- total_train_batch_size: 32
- total_eval_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
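The actual training recipe lives in the alignment-handbook repository linked above. Purely as an illustrative sketch (assuming TRL's ORPOConfig/ORPOTrainer, which the handbook builds on), the hyperparameters listed above map roughly to a configuration like the following; the output directory is a placeholder, and model/dataset loading and the multi-node launch are omitted.

```python
from trl import ORPOConfig

# Rough mapping of the hyperparameters above onto TRL's ORPOConfig.
# A per-device train batch size of 1 across 32 GPUs gives the reported
# total train batch size of 32; 8 per device gives the eval total of 256.
config = ORPOConfig(
    output_dir="zephyr-orpo-141b",      # placeholder output path
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=100,
    num_train_epochs=3,
    seed=42,
    bf16=True,                          # assumed; precision is not listed above
)

# Training itself would then be launched with ORPOTrainer, e.g.:
# trainer = ORPOTrainer(model=model, args=config, train_dataset=train_ds, tokenizer=tokenizer)
# trainer.train()
```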
Framework versions
- Transformers 4.39.3
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.1
Technical Details
The model was trained with ORPO, a novel alignment algorithm that does not require a separate SFT step, which makes training more computationally efficient than traditional methods like DPO and PPO.
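Conceptually, ORPO adds an odds-ratio penalty on top of the standard language-modeling loss: the odds of the chosen response are pushed up relative to the rejected one, so no reference model or separate SFT stage is needed. The sketch below is a toy PyTorch rendering of that objective under the assumption that per-sequence log-probabilities are length-averaged; it is not the reference implementation, and the weighting lam=0.1 is illustrative.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, nll_chosen, lam=0.1):
    """Toy ORPO objective.

    chosen_logps / rejected_logps: average per-token log-probabilities of the
    chosen and rejected responses under the policy model (shape: [batch]).
    nll_chosen: standard negative log-likelihood (SFT) loss on the chosen
    responses. lam weights the odds-ratio term (value here is illustrative).
    """
    # log odds(y|x) = log p - log(1 - p), computed from the log-probability.
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio term: reward higher odds for the chosen response.
    ratio = F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # Total loss = SFT loss + lam * (-log sigmoid of the log odds ratio).
    return nll_chosen - lam * ratio.mean()
```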
License
This model is licensed under the Apache 2.0 license.
Citation
If you find Zephyr 141B-A39B useful in your work, please cite the ORPO paper:
```bibtex
@misc{hong2024orpo,
    title={ORPO: Monolithic Preference Optimization without Reference Model},
    author={Jiwoo Hong and Noah Lee and James Thorne},
    year={2024},
    eprint={2403.07691},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
You may also wish to cite the creators of this model:
```bibtex
@misc{zephyr_141b,
    author = {Alvaro Bartolome and Jiwoo Hong and Noah Lee and Kashif Rasul and Lewis Tunstall},
    title = {Zephyr 141B A39B},
    year = {2024},
    publisher = {Hugging Face},
    journal = {Hugging Face repository},
    howpublished = {\url{https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1}}
}
```