# RLHF: Reinforcement Learning from Human Feedback
## NeuralBeagle14-7B-8.0bpw-h8-exl2

**Maintainer:** LoneStriker · **License:** Apache-2.0 · **Tags:** Large Language Model, Transformers

NeuralBeagle14-7B is a 7B-parameter large language model fine-tuned with Direct Preference Optimization (DPO) on top of Beagle14-7B, and is among the strongest models in the 7B class. This repository provides an 8.0 bits-per-weight (head bits 8) ExLlamaV2 quantization of the model.
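DPO, the method named above, trains directly on preference pairs instead of fitting a separate reward model: it pushes the policy to prefer the chosen response over the rejected one more strongly than a frozen reference model does. A minimal per-pair sketch of the DPO loss in pure Python; `beta` and the log-probability values are illustrative assumptions, not NeuralBeagle14's actual training configuration:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is a summed sequence log-probability of the chosen or
    rejected response under the trained policy or the frozen reference.
    """
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Hypothetical log-probs: the policy already prefers the chosen response
# more strongly than the reference does, so the loss is moderate and
# shrinks further as that preference margin grows.
loss = dpo_loss(-12.0, -20.0, -15.0, -18.0, beta=0.1)
print(round(loss, 4))  # → 0.4741
```

Because the reference log-probs appear only inside the margin, the loss penalizes drifting from the reference model implicitly, which is why DPO needs no explicit KL term or reward-model rollout loop.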
## japanese-gpt-neox-3.6b-instruction-ppo

**Maintainer:** rinna · **License:** MIT · **Tags:** Large Language Model, Transformers, Supports Multiple Languages

A 3.6-billion-parameter Japanese GPT-NeoX model fine-tuned with Reinforcement Learning from Human Feedback (RLHF) via PPO, enabling it to follow conversational instructions more faithfully.
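The PPO stage in an RLHF pipeline like the one above maximizes a clipped surrogate objective, so a single update cannot move the policy too far from the policy that collected the samples. A minimal single-sample sketch in pure Python; the probability ratios and advantage values are illustrative assumptions, not rinna's training data:

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped PPO surrogate for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A).

    `ratio` is pi_new(a|s) / pi_old(a|s); `advantage` comes from the
    reward model (plus a value baseline) in an RLHF setup.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

# With a positive advantage, gains from pushing the ratio past 1+eps
# are clipped: min(1.5 * 2.0, 1.2 * 2.0) = 2.4.
print(ppo_clip_objective(1.5, 2.0))   # → 2.4
# With a negative advantage, the min keeps the unclipped (worse) term,
# so large harmful moves are penalized in full: 1.5 * -2.0 = -3.0.
print(ppo_clip_objective(1.5, -2.0))  # → -3.0
```

Taking the minimum of the clipped and unclipped terms is what makes the objective pessimistic: improvements beyond the trust region are ignored, while degradations are never hidden by the clip.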