# Off-policy optimization
Gemma 2 9b It WPO HB
A large language model fine-tuned from gemma-2-9b-it with Weighted Preference Optimization (WPO), a method intended to make off-policy preference optimization more effective.
Large Language Model
Transformers
wzhouad
15
36
© 2025
AIbase