
Gemma 2 9b It WPO HB

Developed by wzhouad
A large language model fine-tuned from gemma-2-9b-it using Weighted Preference Optimization (WPO), a method that improves the effectiveness of off-policy preference optimization.
Downloads: 15
Release Time: 8/8/2024

Model Overview

This model employs the WPO method, which narrows the distribution gap between offline and online data by reweighting preference pairs during training. It is primarily intended for text generation and dialogue tasks.
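
As a rough picture of the method, the sketch below weights a DPO-style loss by the current policy's probability of each preference pair. This is a minimal illustration, not the authors' code: the function name, the length-normalized form of the weight, and beta=0.1 are assumptions.

```python
import torch
import torch.nn.functional as F

def wpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             chosen_lens, rejected_lens, beta=0.1):
    """Weighted DPO-style loss. All arguments are 1-D tensors over a batch:
    *_logps are summed token log-probs of each response under the policy or
    reference model, *_lens are response lengths in tokens."""
    # Standard DPO preference logits from policy/reference log-ratios.
    logits = (policy_chosen_logps - ref_chosen_logps) - \
             (policy_rejected_logps - ref_rejected_logps)
    per_pair_loss = -F.logsigmoid(beta * logits)

    # WPO idea: weight each offline pair by how likely the *current* policy
    # is to generate it, so offline data better mimics on-policy sampling.
    # Assumed form: length-normalized sequence probability (geometric mean
    # of token probs), gradients detached; the paper's exact weighting may
    # differ.
    with torch.no_grad():
        w_chosen = torch.exp(policy_chosen_logps / chosen_lens)
        w_rejected = torch.exp(policy_rejected_logps / rejected_lens)
        weight = w_chosen * w_rejected

    return (weight * per_pair_loss).mean()
```

Because the weight is detached, it does not create new gradient paths; it only scales down the contribution of pairs the current policy would be unlikely to produce.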

Model Features

Weighted Preference Optimization (WPO)
Reweights preference pairs according to their probability under the current policy, bringing offline data closer to the on-policy distribution and mitigating the distribution gap.
Hybrid data training
Combines online sampled outputs from the gemma model with outputs from GPT-4-turbo, scored and selected using the ArmoRM-Llama3-8B-v0.1 reward model (see the scoring sketch after this list).
Efficient training
The weights are computed from quantities already available during training, so performance improves without extra sampling or annotation cost.
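
As a rough illustration of the scoring-and-selection step, the sketch below ranks candidate responses with a reward model and keeps the highest- and lowest-scoring ones as a preference pair. The RLHFlow/ArmoRM-Llama3-8B-v0.1 repo id and the `.score` output field reflect common ArmoRM usage but are assumptions here, as are the placeholder prompt and candidates.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_id = "RLHFlow/ArmoRM-Llama3-8B-v0.1"  # assumed repo id
rm = AutoModelForSequenceClassification.from_pretrained(
    rm_id, trust_remote_code=True, torch_dtype=torch.bfloat16
)
rm_tokenizer = AutoTokenizer.from_pretrained(rm_id)

def score(prompt: str, response: str) -> float:
    """Score one (prompt, response) pair with the reward model."""
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]
    input_ids = rm_tokenizer.apply_chat_template(messages, return_tensors="pt")
    with torch.no_grad():
        out = rm(input_ids)
    # ArmoRM's remote code exposes a scalar preference score; other reward
    # models may return it via out.logits instead.
    return out.score.item()

# Keep the best and worst candidates as the (chosen, rejected) pair.
candidates = ["response A", "response B", "response C"]  # placeholders
ranked = sorted(candidates, key=lambda r: score("your prompt", r))
rejected, chosen = ranked[0], ranked[-1]
```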

Model Capabilities

Text generation
Dialogue systems
Preference learning
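
For the text generation and dialogue capabilities above, the model should load with the standard Hugging Face transformers stack. A minimal sketch, assuming the checkpoint is published as wzhouad/gemma-2-9b-it-WPO-HB and uses the gemma-2-it chat template:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wzhouad/gemma-2-9b-it-WPO-HB"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Explain preference optimization in one paragraph."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```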

Use Cases

Dialogue systems
Intelligent assistant
Can be used to build high-quality dialogue assistants
Achieved a 76.73% length-controlled (LC) win rate on the AlpacaEval 2 benchmark
Educational research
Preference learning research
Can be used to study off-policy preference optimization methods