# Off-policy optimization

Gemma 2 9b It WPO HB

A large language model fine-tuned from gemma-2-9b-it with Weighted Preference Optimization (WPO), a method designed to make off-policy preference optimization more effective.

Large Language Model
