Llama-3-6B Model
The world's first Llama-3 base model with 6B parameters, offering high performance in various NLP tasks.
Quick Start
Use the code below to get started with the model:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the model and tokenizer
model_name = "prince-canuma/Llama-3-6B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenize the prompt
inputs = tokenizer(["Who created Python?"], return_tensors="pt")

# Stream generated tokens to stdout as they are produced
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=200)
Output:
<|begin_of_text|>Who created Python? What is Python used for? What is the difference between Python 2 and Python 3? What is the difference between Python and Python 3?
Python is a programming language that was created by Guido van Rossum in 1991. It is a widely used language for web development, data science, and machine learning. Python is also used for creating software applications and games.
Python is a powerful language that is easy to learn and use. It has a large library of built-in functions and packages that make it easy to write code. Python is also a very popular language for web development, with many popular web frameworks such as Django and Flask being written in Python.
Python is also used for data science and machine learning. It has a large library of packages for data analysis, machine learning, and artificial intelligence. Python is also used for creating software applications and games.
Python 2 and Python 3 are two different versions of the Python language. Python 2 was the original version of the
Features
- Base Model Innovation: The world's first Llama-3 base model with 6B parameters, created from Meta-Llama-3-8B using the downcycling technique.
- Diverse Use Cases: Can be used to create instruct and chat versions for scenarios such as coding assistants, RAG, and function calling.
- Good Performance: After continued pretraining on 1 billion tokens of English-only text from FineWeb, it shows competitive performance on multiple benchmarks.
Documentation
Model Description
This is the model card of a 🤗 Transformers model that has been pushed to the Hub. This model card has been automatically generated.
- Developed by: Prince Canuma
- Sponsored by: General Catalyst
- Model type: Llama
- License: Llama-3
- Pretrained from model: prince-canuma/Llama-3-6B-v0
Model Sources
- Repository: https://github.com/Blaizzy/Coding-LLMs-from-scratch/tree/main/Llama-3
- Video: https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy&si=5Y4cm-6wrMOD1Abr
Uses
You can use this model to create instruct and chat versions for various use cases, such as a coding assistant, RAG, and function calling.
Limitations
This model inherits some of the base model's limitations and some additional ones from its creation process, such as:
- Limited scope for coding and math: According to benchmarks, this model needs more pretraining/finetuning on code and math data to excel at reasoning tasks.
- Language Limitations: This model was continually pretrained on English-only data. If you are planning to use it for multilingual use cases, I recommend fine-tuning or continued pretraining.
Training Details
Downcycling
Downcycling is a technique that lets you create new LLMs of diverse sizes from checkpoints of large pretrained models. You take a reference model (e.g., Llama-3-8B) and copy the weights of 24 of its 32 layers, alongside the embedding and prediction heads. You then initialize a smaller target model with 24 layers and load those pretrained weights. The new model will most likely still produce legible output, but it needs continued pretraining to perform well.
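As a rough illustration, here is a minimal sketch of that initialization step with 🤗 Transformers. The choice of which 24 layers to copy (the first 24 below), the reference model ID, and the output path are assumptions for illustration only; the exact recipe may differ.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

ref_name = "meta-llama/Meta-Llama-3-8B"  # assumed reference checkpoint

# Load the 32-layer reference model.
ref_model = AutoModelForCausalLM.from_pretrained(ref_name, torch_dtype=torch.bfloat16)

# Build a config for the smaller target: same architecture, 24 hidden layers.
config = AutoConfig.from_pretrained(ref_name)
config.num_hidden_layers = 24
target_model = AutoModelForCausalLM.from_config(config).to(torch.bfloat16)

# Copy the embedding, final norm, and prediction head from the reference model.
target_model.model.embed_tokens.load_state_dict(ref_model.model.embed_tokens.state_dict())
target_model.model.norm.load_state_dict(ref_model.model.norm.state_dict())
target_model.lm_head.load_state_dict(ref_model.lm_head.state_dict())

# Copy 24 of the 32 decoder layers (here: the first 24, purely as an illustration).
for i in range(config.num_hidden_layers):
    target_model.model.layers[i].load_state_dict(ref_model.model.layers[i].state_dict())

# Save the downcycled initialization as the starting point for continued pretraining.
target_model.save_pretrained("llama-3-6b-init")
```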
Training Data
For continued pretraining, 1B tokens were extracted from Hugging Face's FineWeb CC-MAIN-2024-10 slice.
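For reference, a minimal sketch of streaming that slice with the 🤗 Datasets library is shown below; the token-budget loop is an assumption about how the 1B tokens could be collected, not the actual extraction script.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("prince-canuma/Llama-3-6B-v0.1")

# Stream the CC-MAIN-2024-10 slice of FineWeb instead of downloading it in full.
fineweb = load_dataset(
    "HuggingFaceFW/fineweb",
    name="CC-MAIN-2024-10",
    split="train",
    streaming=True,
)

# Collect documents until roughly 1B tokens have been gathered.
token_budget, collected, docs = 1_000_000_000, 0, []
for example in fineweb:
    docs.append(example["text"])
    collected += len(tokenizer(example["text"])["input_ids"])
    if collected >= token_budget:
        break

print(f"Collected {len(docs)} documents (~{collected} tokens)")
```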
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 2
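Expressed as 🤗 Transformers TrainingArguments, these settings would look roughly like the sketch below; the output directory and the bf16 flag are assumptions, and the actual training script is not reproduced here.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above: per-device batch size 2 on 4 GPUs
# with 8 gradient-accumulation steps gives the total train batch size of 64.
training_args = TrainingArguments(
    output_dir="llama-3-6b-continued-pretraining",  # assumed output path
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption: the mixed-precision setting is not listed above
)
```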
Training results
There were 3 distinct experiments. In these experiments, QLoRA was used instead of full fine-tuning due to budget constraints.
- v0: A test run for 1K steps to check whether the model would improve with the QLoRA parameters.
- v1: The QLoRA parameters (rank and alpha) were tweaked.
- v2: The main experiment, run for 2 epochs on 1B tokens from FineWeb.
All details can be found on my Wandb dashboard: https://wandb.ai/prince-canuma/llama-3-6b?nw=nwuserprincecanuma
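A minimal sketch of such a QLoRA setup with 🤗 PEFT and bitsandbytes follows; the rank, alpha, dropout, and target modules are illustrative placeholders rather than the values actually tuned in v1, and the released model ID stands in for the downcycled initialization checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the downcycled base model in 4-bit (NF4) so only the LoRA adapters are trained.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "prince-canuma/Llama-3-6B-v0.1", quantization_config=bnb_config
)

# Attach LoRA adapters; rank/alpha below are placeholders, not the tuned values.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```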
Overall metrics:

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.1562 | 0.0 | 1 | 7.1806 |
| 2.7339 | 0.25 | 5867 | 2.6266 |
| 2.6905 | 0.5 | 11734 | 2.5872 |
| 2.6134 | 0.75 | 17601 | 2.5549 |
| 2.532 | 1.0 | 23468 | 2.5235 |
| 2.5319 | 1.25 | 29335 | 2.5067 |
| 2.3336 | 1.5 | 35202 | 2.4968 |
| 2.3486 | 1.75 | 41069 | 2.4942 |
Framework versions
- PEFT 0.10.0
- Transformers 4.40.0.dev0
- PyTorch 2.2.0+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0
Hardware
- 4x RTX 6000 GPUs on JarvisLabs (sponsored by General Catalyst, thanks to Viet)
Evaluation
Benchmarks
- Hellaswag: a dataset for studying grounded commonsense inference.
- ARC: a multiple-choice question-answering dataset drawn from science exams spanning grades 3 through 9.
- MMLU: a test with 57 tasks that measures a text model's multitask accuracy.
- TruthfulQA: a test that measures a model's propensity to reproduce falsehoods commonly found online.
- Winogrande: a benchmark for commonsense reasoning.
- GSM8k: diverse grade-school math word problems that measure a model's ability to solve multi-step mathematical reasoning problems.
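These benchmarks can be run with, for example, EleutherAI's lm-evaluation-harness. The sketch below assumes its Python API (version 0.4+); the exact task variants and batch size behind the reported numbers are not specified here.

```python
import lm_eval

# Evaluate the released checkpoint on the six benchmarks listed above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=prince-canuma/Llama-3-6B-v0.1,dtype=bfloat16",
    tasks=["hellaswag", "arc_challenge", "mmlu", "truthfulqa_mc2", "winogrande", "gsm8k"],
    batch_size=8,
)
print(results["results"])
```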
Results
Pretraining for 2 epochs on 1B tokens had a positive effect across the board. The new base model performs competitively with its reference model (Llama-3-8B) whilst being 1.3x smaller, and it is competitive with models in its own size category as well as models up to 2x larger, across 6 diverse benchmarks.
Summary and future directions
This experiment was a success! Using this technique, many variants can be built. This is the first of many new base models I intend to create. Next, I plan to explore different data mixtures and perform full fine-tuning, which will contribute to developing other small models as well as larger and more robust ones.
Technical Details
The downcycling technique used in model creation is described in arxiv.org/abs/2404.08634. It involves copying weights from a large reference model to initialize a smaller target model and then continuing pretraining.
License
The model is licensed under Llama-3.
Citation
BibTeX:
@misc{prince2024downcycling,
title={Efficient LLM Downcycling: Generating Diverse Model Sizes from Pretrained Giants},
author={Prince Canuma},
year={2024},
}
References:
@misc{komatsuzaki2023sparse,
title={Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints},
author={Aran Komatsuzaki and Joan Puigcerver and James Lee-Thorp and Carlos Riquelme Ruiz and Basil Mustafa and Joshua Ainslie and Yi Tay and Mostafa Dehghani and Neil Houlsby},
year={2023},
eprint={2212.05055},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{sanyal2024pretraining,
title={Pre-training Small Base LMs with Fewer Tokens},
author={Sunny Sanyal and Sujay Sanghavi and Alexandros G. Dimakis},
year={2024},
eprint={2404.08634},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Thank You!
I want to extend my heartfelt thanks to the community for the invaluable expertise and unwavering support.
Additionally, I would like to thank Viet from General Catalyst (GC) for providing me with the much-needed compute.
This is my most ambitious project yet, and it wouldn't have been possible without the incredible open-source ML community!
Developers, I am eager to see and hear about the innovative fine-tunes and applications you create.
Users, I am excited to learn about your experiences and use cases.
Thank you for your interest and support!

