# RLHF: Reinforcement Learning from Human Feedback
## NeuralBeagle14-7B-8.0bpw-h8-exl2

**Maintainer:** LoneStriker · **License:** Apache-2.0 · **Tags:** Large Language Model, Transformers

NeuralBeagle14-7B is a 7B-parameter large language model fine-tuned with Direct Preference Optimization (DPO) on top of Beagle14-7B, and is among the strongest models in the 7B class. This repository provides an 8.0 bits-per-weight (head bits 8) ExLlamaV2 quantization of the model.
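DPO, the method named above, trains directly on preference pairs instead of fitting a separate reward model: it pushes the policy to prefer the chosen response over the rejected one more strongly than a frozen reference model does. A minimal per-pair sketch of the DPO loss in pure Python; `beta` and the log-probability values are illustrative assumptions, not NeuralBeagle14's actual training configuration:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is a summed sequence log-probability of the chosen or
    rejected response under the trained policy or the frozen reference.
    """
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Hypothetical log-probs: the policy already prefers the chosen response
# more strongly than the reference does, so the loss is moderate and
# shrinks further as that preference margin grows.
loss = dpo_loss(-12.0, -20.0, -15.0, -18.0, beta=0.1)
print(round(loss, 4))  # → 0.4741
```

Because the reference log-probs appear only inside the margin, the loss penalizes drifting from the reference model implicitly, which is why DPO needs no explicit KL term or reward-model rollout loop.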
## japanese-gpt-neox-3.6b-instruction-ppo

**Maintainer:** rinna · **License:** MIT · **Tags:** Large Language Model, Transformers, Supports Multiple Languages

A 3.6-billion-parameter Japanese GPT-NeoX model fine-tuned with Reinforcement Learning from Human Feedback (RLHF) via PPO, enabling it to follow conversational instructions more faithfully.
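The PPO stage in an RLHF pipeline like the one above maximizes a clipped surrogate objective, so a single update cannot move the policy too far from the policy that collected the samples. A minimal single-sample sketch in pure Python; the probability ratios and advantage values are illustrative assumptions, not rinna's training data:

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped PPO surrogate for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A).

    `ratio` is pi_new(a|s) / pi_old(a|s); `advantage` comes from the
    reward model (plus a value baseline) in an RLHF setup.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

# With a positive advantage, gains from pushing the ratio past 1+eps
# are clipped: min(1.5 * 2.0, 1.2 * 2.0) = 2.4.
print(ppo_clip_objective(1.5, 2.0))   # → 2.4
# With a negative advantage, the min keeps the unclipped (worse) term,
# so large harmful moves are penalized in full: 1.5 * -2.0 = -3.0.
print(ppo_clip_objective(1.5, -2.0))  # → -3.0
```

Taking the minimum of the clipped and unclipped terms is what makes the objective pessimistic: improvements beyond the trust region are ignored, while degradations are never hidden by the clip.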