# Llama-3-OffsetBias-RM-8B
A reward model trained on the OffsetBias dataset, designed to be more robust against various evaluation biases.
## 🚀 Quick Start
Llama-3-OffsetBias-RM-8B is a reward model trained on the OffsetBias dataset. It aims to be more robust against various evaluation biases commonly found in evaluation models. The model is introduced in the paper "OffsetBias: Leveraging Debiased Data for Tuning Evaluators".
## ✨ Features
- Trained on the OffsetBias dataset to enhance robustness against evaluation biases.
- Built upon Meta Llama 3 architecture.
## 📦 Installation
No dedicated installation is required; the model can be loaded directly with the Hugging Face `transformers` library (with `torch` installed), as shown in the usage example below.
## 💻 Usage Examples
### Basic Usage
```python
from transformers import AutoTokenizer, pipeline
import torch

model_name = "NCSOFT/Llama-3-OffsetBias-RM-8B"
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)

# The reward model is a sequence-classification model with a single output
# logit, so it can be served through the "sentiment-analysis" pipeline;
# the returned "score" is a scalar reward, not a sentiment label.
rm_pipe = pipeline(
    "sentiment-analysis",
    model=model_name,
    device_map="auto",
    tokenizer=rm_tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

pipe_kwargs = {
    "return_all_scores": True,    # return the score for every label (here: one)
    "function_to_apply": "none",  # keep the raw logit; no softmax/sigmoid
    "batch_size": 1,
}

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

# Render the conversation with the chat template and drop the BOS token,
# since the pipeline's tokenizer adds it again.
test_texts = [
    rm_tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=False
    ).replace(rm_tokenizer.bos_token, "")
]

pipe_outputs = rm_pipe(test_texts, **pipe_kwargs)
rewards = [output[0]["score"] for output in pipe_outputs]
```
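The scalar reward is most useful for comparing candidate responses to the same prompt. Below is a minimal sketch that reuses `rm_pipe`, `rm_tokenizer`, and `pipe_kwargs` from the snippet above; the prompt and the two candidate replies are made up for illustration.

```python
# Compare two hypothetical candidate replies to the same user prompt.
# The reply with the higher reward is the one the model prefers.
prompt = [{"role": "user", "content": "Explain what a reward model does in one sentence."}]
candidates = [
    "A reward model assigns a scalar score to a response, estimating how well it satisfies the prompt.",
    "It's a model.",
]

texts = [
    rm_tokenizer.apply_chat_template(
        prompt + [{"role": "assistant", "content": c}],
        tokenize=False,
        add_generation_prompt=False,
    ).replace(rm_tokenizer.bos_token, "")
    for c in candidates
]

outputs = rm_pipe(texts, **pipe_kwargs)
scores = [o[0]["score"] for o in outputs]
best = candidates[scores.index(max(scores))]
print(scores, "->", best)
```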
## 📚 Documentation
### Model Details
#### Model Description
Llama-3-OffsetBias-RM-8B uses sfairXC/FsfairX-LLaMA3-RM-v0.1 as the base model, which is built with Meta Llama 3. An intermediate reward model is trained from Llama-3-8B-Instruct using a subset of the dataset used in the training of the FsfairX-LLaMA3-RM model, combined with the NCSOFT/offsetbias dataset. The intermediate model is then merged with the FsfairX-LLaMA3-RM model to create Llama-3-OffsetBias-RM-8B.
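The card does not describe how the merge is performed; one common approach is a plain linear interpolation of the two models' weights. The sketch below illustrates that idea only; the intermediate checkpoint path and the mixing coefficient `alpha` are placeholders, not the documented recipe.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Hypothetical sketch: merge an intermediate reward model with the
# FsfairX-LLaMA3-RM model by linear weight interpolation. The actual merging
# method and coefficient used for Llama-3-OffsetBias-RM-8B are not documented here.
base = AutoModelForSequenceClassification.from_pretrained(
    "sfairXC/FsfairX-LLaMA3-RM-v0.1", torch_dtype=torch.bfloat16
)
intermediate = AutoModelForSequenceClassification.from_pretrained(
    "path/to/intermediate-offsetbias-rm",  # placeholder path, not a released checkpoint
    torch_dtype=torch.bfloat16,
)

alpha = 0.5  # assumed mixing coefficient
base_sd = base.state_dict()
merged_sd = {
    name: alpha * param + (1.0 - alpha) * base_sd[name]
    for name, param in intermediate.state_dict().items()
}
base.load_state_dict(merged_sd)
base.save_pretrained("Llama-3-OffsetBias-RM-8B-merged")
```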
| Property | Details |
| --- | --- |
| Developed by | NC Research |
| Language(s) (NLP) | English |
| License | META LLAMA 3 COMMUNITY LICENSE AGREEMENT |
| Finetuned from model | sfairXC/FsfairX-LLaMA3-RM-v0.1 |
#### Model Sources
- Paper: [OffsetBias: Leveraging Debiased Data for Tuning Evaluators](https://arxiv.org/abs/2407.06551)
### Uses
The model is used as a reward model: given a prompt and a candidate response formatted with the chat template, it outputs a scalar score indicating how preferable the response is. The `sentiment-analysis` pipeline in the example above is only a convenient wrapper around the sequence-classification head; the returned score is a reward, not a sentiment label.
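If you prefer not to go through the pipeline wrapper, the reward can also be read directly from the classification head. A minimal sketch, assuming the model loads with the standard `AutoModelForSequenceClassification` interface (which the pipeline usage above implies):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "NCSOFT/Llama-3-OffsetBias-RM-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
]
# Render with the chat template and drop BOS, since the tokenizer adds it again.
text = tokenizer.apply_chat_template(
    chat, tokenize=False, add_generation_prompt=False
).replace(tokenizer.bos_token, "")
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # The single logit of the classification head is the scalar reward.
    reward = model(**inputs).logits[0].item()
print(reward)
```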
### Evaluation
#### RewardBench Result
| Metric | Score |
| --- | --- |
| Chat | 97.21 |
| Chat Hard | 80.70 |
| Safety | 89.01 |
| Reasoning | 90.60 |
#### EvalBiasBench Result
| Metric | Score |
| --- | --- |
| Length | 82.4 |
| Concreteness | 92.9 |
| Empty Reference | 46.2 |
| Content Continuation | 100.0 |
| Nested Instruction | 83.3 |
| Familiar Knowledge | 58.3 |
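Both benchmarks report pairwise accuracies, to my understanding: the fraction of test cases in which the better (chosen) response receives a higher reward than the worse (rejected) one. A minimal sketch of that computation, reusing `rm_pipe`, `rm_tokenizer`, and `pipe_kwargs` from the usage example above; the `pairs` data here is illustrative only, not taken from either benchmark.

```python
def score(prompt: str, response: str) -> float:
    """Scalar reward for an assistant `response` to a user `prompt`,
    computed with rm_pipe / rm_tokenizer / pipe_kwargs from the usage example."""
    chat = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]
    text = rm_tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=False
    ).replace(rm_tokenizer.bos_token, "")
    return rm_pipe([text], **pipe_kwargs)[0][0]["score"]


def pairwise_accuracy(pairs) -> float:
    """Fraction of (prompt, chosen, rejected) triples where the chosen
    response gets the higher reward."""
    wins = sum(
        1
        for prompt, chosen, rejected in pairs
        if score(prompt, chosen) > score(prompt, rejected)
    )
    return wins / len(pairs)


# Illustrative data only; the real benchmarks use their own curated pairs.
pairs = [
    ("What is 2 + 2?", "2 + 2 = 4.", "2 + 2 = 5."),
]
print(pairwise_accuracy(pairs))
```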
## 📄 License
The model is released under the META LLAMA 3 COMMUNITY LICENSE AGREEMENT.
## 📝 Citation
```bibtex
@misc{park2024offsetbias,
      title={OffsetBias: Leveraging Debiased Data for Tuning Evaluators},
      author={Junsoo Park and Seungyeon Jwa and Meiying Ren and Daeyoung Kim and Sanghyuk Choi},
      year={2024},
      eprint={2407.06551},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```