Baby Llama 58M
The Baby Llama model is a 58-million-parameter language model distilled from LLaMA and GPT-2 teachers, designed specifically for the BabyLM small-language-model challenge.
Downloads: 442
Release date: 7/29/2023
Model Overview
The Baby Llama model is a small language model trained on the babylm_10M dataset by distilling the LLaMA and GPT-2 models; it is suitable for a range of natural language processing tasks.
Model Features
Efficient distillation
Distilled from two larger teacher models, LLaMA and GPT-2, substantially reducing the parameter count while retaining much of their performance.
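The two-teacher distillation described above can be sketched as a loss function: soften both teachers' logits with a temperature, average their distributions, and penalize the student's divergence from that average. This is a minimal NumPy illustration of the general technique, not the authors' training code; the function names and the temperature value are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits_list, T=2.0):
    """KL divergence from the student's softened distribution to the
    average of the softened teacher distributions (here two teachers,
    standing in for LLaMA and GPT-2). Scaled by T^2, as is conventional
    so gradients keep a similar magnitude across temperatures."""
    teacher_probs = np.mean(
        [softmax(t / T) for t in teacher_logits_list], axis=0
    )
    student_log_probs = np.log(softmax(student_logits / T))
    kl = np.sum(
        teacher_probs * (np.log(teacher_probs) - student_log_probs),
        axis=-1,
    )
    return (T ** 2) * kl.mean()
```

When the student's logits match both teachers', the loss is zero; any mismatch yields a positive penalty, pushing the small student toward the teachers' combined predictive distribution.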
Small-scale optimization
Designed specifically for the BabyLM small-language-model challenge, extracting strong performance from a limited parameter budget.
Task adaptability
Provides detailed fine-tuning hyperparameter settings for different NLP tasks to help avoid overfitting.
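Small models overfit quickly during task fine-tuning, so the settings matter. The sketch below shows the kind of conservative hyperparameter configuration that feature refers to; the specific values are illustrative assumptions, not the published per-task settings.

```python
# Illustrative fine-tuning configuration for a ~58M-parameter model on a
# text-classification task. All values are assumptions for illustration,
# not the model card's published numbers.
finetune_config = {
    "learning_rate": 3e-5,            # small LR to avoid destroying pretrained weights
    "num_train_epochs": 3,            # few epochs: small models overfit fast
    "per_device_train_batch_size": 32,
    "weight_decay": 0.01,             # mild regularization
    "warmup_ratio": 0.1,              # LR warmup over the first 10% of steps
    "early_stopping_patience": 2,     # stop once the dev metric stalls
}
```

In practice one would pass such values to a trainer and monitor a held-out development set, lowering epochs or the learning rate further for the smallest datasets.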
Model Capabilities
Text classification
Question answering
Language understanding
Text matching
Use Cases
Academic research
Small language model research
Used to explore the capability limits and optimization methods of small-scale language models
Achieved competitive performance in the BabyLM Challenge
Educational applications
Language learning assistance
Can be used to develop lightweight language learning tools