Rhea-72b-v0.5
The Rhea project researches various learning methods to improve the performance of LLMs. This fine-tuned model ranks first on HuggingFace's Open LLM Leaderboard.

The Rhea project conducts research on diverse learning methods to improve the performance of LLMs. We fine-tuned the base model using the nox framework, built an SFT dataset from currently open datasets, and created a DPO dataset using SGD (a Self-Generated Dataset creation method for DPO learning).
Our model ranked first on HuggingFace's Open LLM Leaderboard.
🚀 Quick Start
This card does not include dedicated quick-start steps; refer to the official repository at https://github.com/davidkim205/nox for usage details.
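As a minimal usage sketch (not from the original card), the model can presumably be loaded with the Hugging Face transformers library; the Hub id davidkim205/Rhea-72b-v0.5 is inferred from the leaderboard link below, and the dtype and generation settings are illustrative assumptions.

```python
# Minimal sketch: load and query Rhea-72b-v0.5 with Hugging Face transformers.
# The Hub id, dtype, and generation settings are assumptions, not settings
# documented in this card. A 72B model requires multiple high-memory GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davidkim205/Rhea-72b-v0.5"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain the difference between SFT and DPO in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```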
✨ Features
SGD: A Study on a Self-Generated Dataset creation method for DPO Learning
This method proposes a novel approach to generating datasets for DPO (Direct Preference Optimization) training. Sentences generated by the model are compared with the correct answers from an existing dataset, and the cases where the generated result does not match the correct answer are added to the dataset. This enables the model to autonomously create its own training data, thereby improving the performance of DPO models.
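A minimal sketch of this idea is shown below, assuming each source example carries a prompt and a reference answer; the helper generate_answer and the exact-match comparison are hypothetical stand-ins, since the card does not specify how generated sentences are compared with the correct answers.

```python
# Sketch of the SGD idea: keep the cases where the model's generation
# disagrees with the reference answer and turn them into DPO preference pairs.
# `generate_answer` and the exact-match check are hypothetical stand-ins;
# the card does not specify the actual comparison criterion.
from typing import Callable, Dict, Iterable, List


def build_dpo_pairs(
    dataset: Iterable[Dict[str, str]],
    generate_answer: Callable[[str], str],
) -> List[Dict[str, str]]:
    pairs = []
    for example in dataset:  # each example: {"prompt": ..., "reference": ...}
        generated = generate_answer(example["prompt"])
        if generated.strip() != example["reference"].strip():
            # Mismatch: the reference becomes the preferred ("chosen") answer,
            # the model's own output becomes the "rejected" answer.
            pairs.append(
                {
                    "prompt": example["prompt"],
                    "chosen": example["reference"],
                    "rejected": generated,
                }
            )
    return pairs
```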
📚 Documentation
Model Details

| Property | Details |
| --- | --- |
| Model Developers | davidkim (changyeon kim) |
| Repository | https://github.com/davidkim205/nox |
| Base Model | abacusai/Smaug-72B-v0.1 |
| SFT Dataset | datasets_enconv_4m |
| DPO Dataset | datasets_encomp_151k |
SFT dataset info: datasets_enconv_4m
100k random shuffle datasets
- stack-exchange-preferences
- SlimOrca
- alpaca-gpt4
- SHP
- HC3
- databricks-dolly-15k
- orca-dpo-pairs
- us-stockname
- OpenHermes2.5-dpo-binarized-alpha
- distilabel-math-preference-dpo
- Neural-DPO
- truthy-dpo-v0.1
- distilabel-capybara-dpo-7k-binarized
- us-sentiment
- contextual-dpo-v0.1
1k random shuffle datasets
- bigbench
- glue_mnli
- glue_qqp
- xnli
- codexglue_code2text_go
- trivia_qa
- medmcqa
- hendrycks_ethics
- super_glue_record
- glue_qnli
- anli_r3
- swag
- squad_v2
- nq_open
- drop
- glue_sst2
- blimp
- paws-x
- unscramble
- anli_r2
- babi
- math_qa
- social_i_qa
- piqa
- arithmetic
- anli_r1
- prost
- sciq
- mc_taco
- medqa
- super_glue_boolq
- hendrycks_math
- lambada
- toxigen-data
- glue_cola
- pubmed_qa
- logiqa
- mutual
- headqa
- bbh
- super_glue_wic
- openbookqa
- glue_mrpc
- web_questions
- qasper
- super_glue_multirc
- story_cloze
- super_glue_rte
- glue_rte
- race
- xwinograd
- asdiv
- xstory_cloze
- crows_pairs_multilingual
- belebele
- glue_wnli
- super_glue_wsc
- coqa
- super_glue_copa
- super_glue_cb
- winograd_wsc
- mgsm
- scrolls_contract_nli
⚠️ Important Note
If a dataset listed above cannot be found publicly, it is internal company data and cannot be made public.
DPO dataset info: datasets_encomp_151k
We randomly selected data from each category of the training dataset and constructed a DPO (Direct Preference Optimization) dataset from the model-generated sentences whose logits were lower than the mean.
⚠️ Important Note
Unfortunately, this dataset cannot be released publicly.
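As a rough illustration of the "logits lower than the mean" selection described above, the sketch below scores each model-generated sentence by its mean token log-probability and keeps those that fall below the batch average; the scoring function and threshold are assumptions, since the card does not define how the logits are aggregated.

```python
# Rough sketch: select model-generated sentences whose score falls below the
# mean across the batch. Scoring by mean token log-probability is an
# assumption; the card does not define exactly how the logits are aggregated.
import torch


def sentence_score(model, tokenizer, text: str) -> float:
    """Mean token log-probability of `text` under the model (assumed metric)."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_logps = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logps.mean().item()


def select_below_mean(model, tokenizer, generated_sentences):
    scores = [sentence_score(model, tokenizer, s) for s in generated_sentences]
    mean_score = sum(scores) / len(scores)
    # Sentences below the mean become "rejected" candidates for DPO pairs.
    return [s for s, sc in zip(generated_sentences, scores) if sc < mean_score]
```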
📄 License
The model is licensed under the Apache 2.0 license.
🔧 Technical Details
The model achieved excellent results on HuggingFace's Open LLM leaderboard. Here are the detailed evaluation results:
| Metric | Value |
| --- | --- |
| Avg. | 81.22 |
| AI2 Reasoning Challenge (25-Shot) | 79.78 |
| HellaSwag (10-Shot) | 91.15 |
| MMLU (5-Shot) | 77.95 |
| TruthfulQA (0-shot) | 74.50 |
| Winogrande (5-shot) | 87.85 |
| GSM8k (5-shot) | 76.12 |
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_davidkim205__Rhea-72b-v0.5).