🚀 LiteLlama: Reduced-Scale Llama
We present an open-source reproduction of Meta AI's LLaMa 2 with significantly reduced model sizes. For instance, LiteLlama-460M-1T has 460M parameters and is trained on roughly 1T tokens.
✨ Features
- Reduced Scale: Significantly smaller model sizes compared to the original LLaMa 2.
- Open-Source: An open-source reproduction for community use.
📦 Installation
No separate installation step is needed; the usage example below only requires PyTorch and Hugging Face Transformers (`pip install torch transformers`).
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'ahxt/LiteLlama-460M-1T'

# Load the checkpoint and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()

prompt = 'Q: What is the largest bird?\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy decoding, capped at 20 tokens including the prompt.
with torch.no_grad():
    tokens = model.generate(input_ids, max_length=20)
print(tokenizer.decode(tokens[0].tolist(), skip_special_tokens=True))
```
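The same generation can also be run through the `pipeline` API; a minimal alternative sketch (the prompt and length settings simply mirror the example above):

```python
from transformers import pipeline

# One-liner equivalent using the text-generation pipeline.
generator = pipeline('text-generation', model='ahxt/LiteLlama-460M-1T')
print(generator('Q: What is the largest bird?\nA:', max_length=20)[0]['generated_text'])
```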
📚 Documentation
Dataset and Tokenization
We train our models on part of the RedPajama dataset and use the GPT2Tokenizer to tokenize the text.
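As an illustration, the same tokenizer can be loaded directly; a minimal sketch (the sample sentence is ours, not from the training data):

```python
from transformers import GPT2Tokenizer

# GPT-2's byte-level BPE tokenizer, as used to tokenize the training text.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
ids = tokenizer("LiteLlama is a reduced-scale LLaMa.")["input_ids"]
print(len(ids), ids[:5])
```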
Training Details
The model was trained with ~1T tokens (0.98T): number of tokens = steps × sequence length × batch size = 499,679 × 1,024 × 192 = 98,240,888,832 ≈ 0.98T.
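The arithmetic can be checked directly (values copied from the formula above):

```python
# Worked check of the token-count formula above.
steps, seq_length, batch_size = 499_679, 1_024, 192
total_tokens = steps * seq_length * batch_size
print(f"{total_tokens:,}")  # 98,240,888,832
```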
The training curve is available in this WandB project.
Evaluation
MMLU Task Evaluation
| Models | #parameters | zero-shot | 5-shot |
| --- | --- | --- | --- |
| llama | 7B | 28.46 | 35.05 |
| openllama | 3B | 24.90 | 26.71 |
| TinyLlama-1.1B-step-50K-105b | 1.1B | 19.00 | 26.53 |
| LiteLlama-460M-1T | 0.46B | 21.13 | 26.39 |
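The README does not state how these scores were produced; a common way to reproduce MMLU numbers for Hub checkpoints is EleutherAI's lm-evaluation-harness. The sketch below is an assumption, not the authors' documented pipeline:

```python
import lm_eval  # pip install lm-eval

# Hypothetical reproduction sketch: 5-shot MMLU on the LiteLlama checkpoint.
# Task naming and harness version are assumptions, not from the README.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ahxt/LiteLlama-460M-1T",
    tasks=["mmlu"],
    num_fewshot=5,
)
print(results["results"])
```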
Detailed results can be found here.
| Metric | Value |
| --- | --- |
| Avg. | 26.65 |
| ARC (25-shot) | 24.91 |
| HellaSwag (10-shot) | 38.47 |
| MMLU (5-shot) | 26.17 |
| TruthfulQA (0-shot) | 41.59 |
| Winogrande (5-shot) | 49.88 |
| GSM8K (5-shot) | 0.0 |
| DROP (3-shot) | 5.51 |
📄 License
This model is released under the MIT License.
📞 Contact
This model was developed by Xiaotian Han of the DATA Lab at Texas A&M University, under the supervision of Prof. Xia "Ben" Hu.