Mistral-ORPO-β
Mistral-ORPO-β is a 7B-parameter language model fine-tuned from Mistral-7B with ORPO (Odds Ratio Preference Optimization), a method that learns preferences directly from preference data without a supervised fine-tuning warm-up phase.
Downloads: 18
Release Time: 3/12/2024
Model Overview
This is a 7B-parameter language model optimized with the ORPO method. It focuses on text generation tasks and performs strongly across multiple benchmarks.
Model Features
ORPO Optimization
Uses the Odds Ratio Preference Optimization method to learn preferences directly, with no supervised fine-tuning warm-up phase (see the sketch after this list).
Efficient Fine-Tuning
Achieves strong performance after fine-tuning on just 61k instances of the UltraFeedback dataset.
Multi-Task Performance
Outperforms comparable models on multiple benchmarks, including AlpacaEval and MT-Bench.
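To make the ORPO feature above concrete, here is a minimal PyTorch sketch of the odds-ratio loss term described in the ORPO paper. The function name, the length-normalized log-probability inputs, and the default λ value are illustrative assumptions, not the model's exact training code.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, nll_chosen, lam=0.1):
    """Illustrative ORPO objective (sketch, not the released training code).

    chosen_logps / rejected_logps: length-normalized log-probabilities
        log p(y|x) of the chosen and rejected responses under the policy.
    nll_chosen: standard next-token NLL on the chosen response.
    lam: weight of the odds-ratio term (the value here is an assumption).
    """
    # log odds(y|x) = log p - log(1 - p), computed stably in log space
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # odds-ratio term: push the chosen response's odds above the rejected one's
    ratio_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # total objective: SFT loss on the chosen response plus the weighted penalty
    return nll_chosen + lam * ratio_loss.mean()
```

Because the supervised term and the preference term are combined in a single loss, no separate SFT warm-up stage is needed.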
Model Capabilities
Text Generation
Dialogue Systems
Question Answering
Instruction Following (see the usage sketch below)
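The capabilities above can be exercised through a standard Hugging Face transformers chat workflow. A minimal usage sketch follows; the repository id kaist-ai/mistral-orpo-beta and the generation settings are assumptions, so adjust them to the actual published checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaist-ai/mistral-orpo-beta"  # assumed Hub repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a single-turn chat prompt with the tokenizer's chat template
messages = [{"role": "user",
             "content": "Explain preference optimization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens
output = model.generate(inputs, max_new_tokens=128,
                        do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```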
Use Cases
Dialogue Systems
Intelligent Assistant
Can be used to build intelligent dialogue assistants.
Achieves a 91.16% win rate on AlpacaEval 1.0.
Educational Applications
Educational Q&A
Can be used for question-answering systems in the education field.
Achieves 63.26% accuracy on the MMLU test.