🚀 MicroLlama
MicroLlama is a 300M-parameter Llama model developed on a budget of less than $500. It is based on the TinyLlama project and is pretrained on the SlimPajama dataset. The model provides a cost-effective option for natural language processing tasks and can serve as a starting point for fine-tuning.
✨ Features
- Budget-Friendly: Developed with a budget of less than $500, making it accessible for individual developers.
- Open-Source: Based on fully open-source datasets and models, ensuring transparency and reproducibility.
- Customized Pretraining: Pretrained on the SlimPajama dataset, with modifications that focus training on this dataset and improve efficiency.
📦 Installation
To use MicroLlama, you need to install the necessary dependencies:
pip install transformers
pip install torch
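Note: the usage example below passes device_map="auto" to the transformers pipeline, which relies on the accelerate package at runtime. If it is not already installed:
pip install accelerate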
💻 Usage Examples
Basic Usage
import torch
import transformers
from transformers import AutoTokenizer, LlamaForCausalLM

def generate_text(prompt, model, tokenizer):
    text_generator = transformers.pipeline(
        "text-generation",
        model=model,
        torch_dtype=torch.float16,
        device_map="auto",
        tokenizer=tokenizer
    )

    formatted_prompt = f"Question: {prompt} Answer:"

    sequences = text_generator(
        formatted_prompt,
        do_sample=True,
        top_k=5,
        top_p=0.9,
        num_return_sequences=1,
        repetition_penalty=1.5,
        max_new_tokens=128,
    )

    for seq in sequences:
        print(f"Result: {seq['generated_text']}")

# Use the same tokenizer as TinyLlama
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-step-50K-105b")

# Load the model from the Hugging Face Hub
model = LlamaForCausalLM.from_pretrained("keeeeenw/MicroLlama")

# Example question from https://www.reddit.com/r/LocalLLaMA/comments/13zz8y5/what_questions_do_you_ask_llms_to_check_their/
generate_text("Please provide me instructions on how to steal an egg from my chicken.", model, tokenizer)
📚 Documentation
Model Details
This project is heavily based on TinyLlama. After 4 days of training the 300M Llama model on 50B tokens, $280 was spent on compute (4 x Nvidia RTX 4090 GPUs on Vast.ai) and $3 on AWS S3 storage.
Modifications to TinyLlama include:
- Pretraining a 300M model on SlimPajama.
- Removing Starcoderdata to focus on SlimPajama.
- Adding the ability to process and tokenize SlimPajama while downloading the data (see the sketch after this list).
- Providing various helper scripts and Python code.
- Bug fixes.
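The "tokenize while downloading" change can be illustrated with Hugging Face datasets streaming. The sketch below is a conceptual stand-in rather than the project's actual lit-gpt-based preprocessing script; the dataset ID (cerebras/SlimPajama-627B), the text field name, and the packing logic are assumptions.

from itertools import islice

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-step-50K-105b")

# streaming=True yields documents as they download, so the full corpus never
# has to be materialized on local disk before tokenization starts.
stream = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)

block_size = 2048                      # matches the model configuration below
buffer, packed_blocks = [], []

for doc in islice(stream, 1000):       # first 1,000 documents only, for the sketch
    buffer.extend(tokenizer(doc["text"])["input_ids"] + [tokenizer.eos_token_id])
    while len(buffer) >= block_size:
        packed_blocks.append(buffer[:block_size])   # one fixed-length pretraining sample
        buffer = buffer[block_size:]

print(f"packed {len(packed_blocks)} blocks of {block_size} tokens")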
Model Configuration
block_size=2048,
vocab_size=32000,
padding_multiple=64,
n_layer=12,
n_head=16,
n_embd=1024,
rotary_percentage=1.0,
parallel_residual=False,
bias=False,
_norm_class="FusedRMSNorm",
norm_eps=1e-5,  # Llama 2 uses 1e-5; Llama 1 uses 1e-6
_mlp_class="LLaMAMLP",
intermediate_size=5632,
n_query_groups=4,
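As a sanity check, the ~300M figure can be recovered from the configuration above. The sketch below assumes untied input/output embeddings and no bias terms (consistent with bias=False), which are assumptions about the lit-gpt implementation; it lands at roughly 305M parameters.

# Rough parameter count implied by the configuration above (assumes untied
# input/output embeddings and bias=False; exact totals depend on the
# lit-gpt implementation details).
vocab_size, n_layer, n_head, n_embd = 32000, 12, 16, 1024
n_query_groups, intermediate_size = 4, 5632
head_dim = n_embd // n_head                     # 64

attn = (
    n_embd * n_embd                             # query projection
    + 2 * n_embd * (n_query_groups * head_dim)  # key/value projections (grouped-query attention)
    + n_embd * n_embd                           # output projection
)
mlp = 3 * n_embd * intermediate_size            # LLaMAMLP: gate, up, and down projections
norms = 2 * n_embd                              # two RMSNorms per transformer block
per_layer = attn + mlp + norms

embeddings = vocab_size * n_embd                # token embedding table
lm_head = vocab_size * n_embd                   # output head (assumed untied)
total = n_layer * per_layer + embeddings + lm_head + n_embd  # + final norm

print(f"{total / 1e6:.1f}M parameters")         # ~304.6M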
Model Description
- Developed by: keeeeenw
- Funded by: myself for <$500
- Model type: 300M Llama model
- Language(s) (NLP): EN
- License: Apache License 2.0
Model Sources
- Repository: https://github.com/keeeeenw/MicroLlama
🔧 Technical Details
The evaluation was performed using the standard [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) setup. acc_norm was used for all datasets except winogrande and boolq, which use acc as the metric.
Model | Pretrain Tokens | HellaSwag | Obqa | WinoGrande | ARC_c | ARC_e | boolq | piqa | avg |
---|---|---|---|---|---|---|---|---|---|
keeeeenw/MicroLlama | 50B | 34.30 | 30.60 | 51.54 | 23.29 | 39.06 | 53.15 | 64.58 | 42.36 |
google-bert/bert-large-uncased | N/A | 24.53 | 26.20 | 49.80 | 25.68 | 25.08 | 40.86 | 47.66 | 34.26 |
PY007/TinyLlama-1.1B-Chat-v0.1 | 503B | 53.81 | 32.20 | 55.01 | 28.67 | 49.62 | 58.04 | 69.64 | 49.57 |
TinyLlama-1.1B-intermediate-step-1431k-3T | 3T | 59.20 | 36.00 | 59.12 | 30.12 | 55.25 | 57.83 | 73.29 | 52.99 |
To reproduce the evaluation results, install [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and run the following command:
lm_eval \
--model hf \
--model_args pretrained=keeeeenw/MicroLlama,dtype="float",tokenizer=TinyLlama/TinyLlama-1.1B-step-50K-105b \
--tasks hellaswag,openbookqa,winogrande,arc_easy,arc_challenge,boolq,piqa \
--device cuda:0 \
--batch_size 64
Observations
- Although keeeeenw/MicroLlama is much smaller than TinyLlama, the evaluation results are closer than expected.
- The model outperforms [google-bert/bert-large-uncased](https://huggingface.co/google-bert/bert-large-uncased) on most datasets, except ARC_c (arc_challenge).
📄 License
This model is licensed under the Apache License 2.0.
Citation
This repository is built upon TinyLlama, which is based on [lit-gpt](https://github.com/Lightning-AI/lit-gpt) and [flash-attention](https://github.com/Dao-AILab/flash-attention).
@misc{zhang2024tinyllama,
      title={TinyLlama: An Open-Source Small Language Model},
      author={Peiyuan Zhang and Guangtao Zeng and Tianduo Wang and Wei Lu},
      year={2024},
      eprint={2401.02385},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@online{lit-gpt,
      author = {Lightning AI},
      title = {Lit-GPT},
      url = {https://github.com/Lightning-AI/lit-gpt},
      year = {2023},
}
@article{dao2023flashattention2,
      title = {Flash{A}ttention-2: Faster Attention with Better Parallelism and Work Partitioning},
      author = {Dao, Tri},
      year = {2023}
}
[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_keeeeenw__MicroLlama).
Metric | Value |
---|---|
Avg. | 5.08 |
IFEval (0-Shot) | 19.85 |
BBH (3-Shot) | 2.83 |
MATH Lvl 5 (4-Shot) | 0.00 |
GPQA (0-Shot) | 1.45 |
MuSR (0-Shot) | 4.79 |
MMLU-PRO (5-Shot) | 1.53 |

