UNA-TheBeagle-7b-v1 Open Source AI Model - Exceptional Performance in Multiple Tasks, Available for Free

UNA TheBeagle 7b V1

Developed by fblgit

TheBeagle is a 7-billion-parameter model trained on The Bagel dataset, optimized with DPO (Direct Preference Optimization) and UNA (Unified Neural Architecture) techniques, demonstrating excellent performance in multi-task scenarios.

Large Language Model

Transformers

#DPO Optimization #Multi-task Generalization #Academic Research Specialized

Downloads 88

Release Time : 1/9/2024

Model Overview

This model is a 7-billion-parameter large language model optimized with a carefully selected DPO paired dataset, based on Intel's neural-chat model, and has shown outstanding performance in multiple benchmark tests.

Model Features

DPO Optimization

Trained with Direct Preference Optimization techniques on a carefully selected DPO paired dataset

UNA Architecture

Optimizes perceptron layers using Unified Neural Architecture, with a learning rate set to 3.5e-7

High Performance

Achieves excellent results in multiple benchmarks including ARC, GSM8K, and HellaSwag

Data Decontamination

The dataset undergoes rigorous decontamination to ensure training quality

Model Capabilities

Text generation

Question answering

Mathematical reasoning

Commonsense reasoning

Logical reasoning

Use Cases

Academic Research

Natural Language Processing Research

Can be used for language model performance comparison and new technology validation

Performs excellently in multiple benchmark tests

Educational Applications

Mathematical Problem Solving

Solves mathematical problems such as those in GSM8K

Achieves an exact match rate of 72.1%

🚀 UNA-TheBeagle-7b-v1

TheBeagle is a 7B-parameter model trained on The Bagel dataset. DPO and UNA are applied to a set of curated DPO Pairs. It ranks #1 on the HF Leaderboard with remarkable scores, achieving 73 on ARC and showing a well-balanced performance.

🚀 Quick Start

TheBeagle offers excellent performance across various tasks. We encourage you to explore its capabilities firsthand.

✨ Features

High Ranking: Scored #1 on the HF Leaderboard with 73 ARC and well - balanced results.
Trained on Quality Data: The dataset is generated using the original bagel code, including a decontamination step.
Based on Strong Base Model: Utilizes Intel's latest neural - chat model as the base.

📦 Installation

No installation steps are provided in the original document.

💻 Usage Examples

No code examples are provided in the original document.

📚 Documentation

Evaluations

The evaluations were run with VLLM. Note that the results may not be exactly the same as those shown on the leaderboard, but they are close.

vllm (pretrained=fblgit/UNA-TheBeagle-7b-v1,dtype=auto,tensor_parallel_size=1,gpu_memory_utilization=0.8,data_parallel_size=8,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 32
|    Tasks     |Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|--------------|-------|----------|-----:|-----------|-----:|---|-----:|
|arc_challenge |Yaml   |none      |    25|acc        |0.7090|±  |0.0133|
|              |       |none      |    25|acc_norm   |0.7329|±  |0.0129|
|gsm8k         |Yaml   |get-answer|     5|exact_match|0.7210|±  |0.0124|
|hellaswag     |Yaml   |none      |    10|acc        |0.7202|±  |0.0045|
|              |       |none      |    10|acc_norm   |0.8792|±  |0.0033|
|truthfulqa_mc2|Yaml   |none      |     0|acc        |0.7062|±  |0.0151|
|winogrande    |Yaml   |none      |     5|acc        |0.8366|±  |0.0104|

UNA Details

For this release, UNA was only applied through the perceptrons at a speed of 3.5e - 7. The training loop code is from the original bagel and transformers - 4.35.2 - UNA.

Prompt

We used the vanilla version of the bagel training code, so we're not entirely certain about the prompt. However, a good model should be able to generalize with different prompt formats. Feel free to experiment.

Citations

If you use UNA's models, remember to cite it in your model card.

Limitations

This model is not for commercial use and is intended only for academic and research purposes.

🔧 Technical Details

The dataset was generated using the original bagel code, including the decontamination step. As a base model, the latest Intel's neural - chat model was used. For this release, UNA was applied through the perceptrons at a speed of 3.5e - 7, and the training loop code is from the original bagel and transformers - 4.35.2 - UNA.

📄 License

This model is licensed under cc - by - nc - nd - 4.0.

TheBeagle

-- In the Love Memory of my "LoLa" --

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご