Rhea-72b-v0.5
The Rhea project researches various learning methods to improve the performance of LLMs. This fine-tuned model ranks first on HuggingFace's Open LLM Leaderboard.

The Rhea project conducts research on diverse learning methods to improve the performance of LLMs. We fine-tuned the base model using the nox framework, built an SFT dataset from currently open datasets, and created a DPO dataset using SGD (a Self-Generated Dataset creation method for DPO learning).
Our model ranked first on HuggingFace's Open LLM Leaderboard.
🚀 Quick Start
This card does not include dedicated quick-start steps; refer to the official repository at https://github.com/davidkim205/nox for usage details.
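As a minimal usage sketch (not from the original card), the model can presumably be loaded with the Hugging Face transformers library; the Hub id davidkim205/Rhea-72b-v0.5 is inferred from the leaderboard link below, and the dtype and generation settings are illustrative assumptions.

```python
# Minimal sketch: load and query Rhea-72b-v0.5 with Hugging Face transformers.
# The Hub id, dtype, and generation settings are assumptions, not settings
# documented in this card. A 72B model requires multiple high-memory GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davidkim205/Rhea-72b-v0.5"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain the difference between SFT and DPO in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```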
✨ Features
SGD: A Study on a Self-Generated Dataset creation method for DPO Learning
This method proposes a novel approach to generating datasets for DPO (Direct Preference Optimization) training. Sentences generated by the model are compared with the correct answers from an existing dataset, and the cases where the generated result does not match the correct answer are added to the dataset. This enables the model to autonomously create its own training data, thereby improving the performance of DPO models.
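A minimal sketch of this idea is shown below, assuming each source example carries a prompt and a reference answer; the helper generate_answer and the exact-match comparison are hypothetical stand-ins, since the card does not specify how generated sentences are compared with the correct answers.

```python
# Sketch of the SGD idea: keep the cases where the model's generation
# disagrees with the reference answer and turn them into DPO preference pairs.
# `generate_answer` and the exact-match check are hypothetical stand-ins;
# the card does not specify the actual comparison criterion.
from typing import Callable, Dict, Iterable, List


def build_dpo_pairs(
    dataset: Iterable[Dict[str, str]],
    generate_answer: Callable[[str], str],
) -> List[Dict[str, str]]:
    pairs = []
    for example in dataset:  # each example: {"prompt": ..., "reference": ...}
        generated = generate_answer(example["prompt"])
        if generated.strip() != example["reference"].strip():
            # Mismatch: the reference becomes the preferred ("chosen") answer,
            # the model's own output becomes the "rejected" answer.
            pairs.append(
                {
                    "prompt": example["prompt"],
                    "chosen": example["reference"],
                    "rejected": generated,
                }
            )
    return pairs
```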
📚 Documentation
Model Details

| Property | Details |
| --- | --- |
| Model Developers | davidkim (changyeon kim) |
| Repository | https://github.com/davidkim205/nox |
| Base Model | abacusai/Smaug-72B-v0.1 |
| SFT Dataset | datasets_enconv_4m |
| DPO Dataset | datasets_encomp_151k |
SFT dataset info: datasets_enconv_4m
100k random shuffle datasets
- stack-exchange-preferences
- SlimOrca
- alpaca-gpt4
- SHP
- HC3
- databricks-dolly-15k
- orca-dpo-pairs
- us-stockname
- OpenHermes2.5-dpo-binarized-alpha
- distilabel-math-preference-dpo
- Neural-DPO
- truthy-dpo-v0.1
- distilabel-capybara-dpo-7k-binarized
- us-sentiment
- contextual-dpo-v0.1
1k random shuffle datasets
- bigbench
- glue_mnli
- glue_qqp
- xnli
- codexglue_code2text_go
- trivia_qa
- medmcqa
- hendrycks_ethics
- super_glue_record
- glue_qnli
- anli_r3
- swag
- squad_v2
- nq_open
- drop
- glue_sst2
- blimp
- paws-x
- unscramble
- anli_r2
- babi
- math_qa
- social_i_qa
- piqa
- arithmetic
- anli_r1
- prost
- sciq
- mc_taco
- medqa
- super_glue_boolq
- hendrycks_math
- lambada
- toxigen-data
- glue_cola
- pubmed_qa
- logiqa
- mutual
- headqa
- bbh
- super_glue_wic
- openbookqa
- glue_mrpc
- web_questions
- qasper
- super_glue_multirc
- story_cloze
- super_glue_rte
- glue_rte
- race
- xwinograd
- asdiv
- xstory_cloze
- crows_pairs_multilingual
- belebele
- glue_wnli
- super_glue_wsc
- coqa
- super_glue_copa
- super_glue_cb
- winograd_wsc
- mgsm
- scrolls_contract_nli
⚠️ Important Note
If a dataset listed above cannot be found publicly, it is internal company data and cannot be made public.
DPO dataset info: datasets_encomp_151k
We randomly selected data from each category of the training dataset and constructed a DPO (Direct Preference Optimization) dataset from the model-generated sentences whose logits were lower than the mean.
⚠️ Important Note
Unfortunately, this dataset cannot be released publicly.
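As a rough illustration of the "logits lower than the mean" selection described above, the sketch below scores each model-generated sentence by its mean token log-probability and keeps those that fall below the batch average; the scoring function and threshold are assumptions, since the card does not define how the logits are aggregated.

```python
# Rough sketch: select model-generated sentences whose score falls below the
# mean across the batch. Scoring by mean token log-probability is an
# assumption; the card does not define exactly how the logits are aggregated.
import torch


def sentence_score(model, tokenizer, text: str) -> float:
    """Mean token log-probability of `text` under the model (assumed metric)."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_logps = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logps.mean().item()


def select_below_mean(model, tokenizer, generated_sentences):
    scores = [sentence_score(model, tokenizer, s) for s in generated_sentences]
    mean_score = sum(scores) / len(scores)
    # Sentences below the mean become "rejected" candidates for DPO pairs.
    return [s for s, sc in zip(generated_sentences, scores) if sc < mean_score]
```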
📄 License
The model is licensed under the Apache 2.0 license.
🔧 Technical Details
The model achieved excellent results on HuggingFace's Open LLM leaderboard. Here are the detailed evaluation results:
| Metric | Value |
| --- | --- |
| Avg. | 81.22 |
| AI2 Reasoning Challenge (25-Shot) | 79.78 |
| HellaSwag (10-Shot) | 91.15 |
| MMLU (5-Shot) | 77.95 |
| TruthfulQA (0-shot) | 74.50 |
| Winogrande (5-shot) | 87.85 |
| GSM8k (5-shot) | 76.12 |
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_davidkim205__Rhea-72b-v0.5).