🚀 MobiLlama-05B
MobiLlama-05B is a Small Language Model (SLM) with 0.5 billion parameters. It is designed for resource-constrained devices, offering strong performance with a reduced compute and memory footprint.
🚀 Quick Start
To use MobiLlama-05B, you can follow the code example below:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (trust_remote_code is required for the custom model class)
model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-05B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("MBZUAI/MobiLlama-05B", trust_remote_code=True)
model.to('cuda')

# Tokenize a prompt and generate a continuation
text = "I was walking towards the river when "
input_ids = tokenizer(text, return_tensors="pt").to('cuda').input_ids
outputs = model.generate(input_ids, max_length=1000, repetition_penalty=1.2, pad_token_id=tokenizer.eos_token_id)

# Decode only the newly generated tokens (excluding the prompt)
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
```
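The snippet above assumes a CUDA GPU is available. The same API also runs on CPU (more slowly); below is a minimal device-agnostic sketch, assuming only `torch` and `transformers` are installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick the GPU if present, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-05B", trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained("MBZUAI/MobiLlama-05B", trust_remote_code=True)

inputs = tokenizer("I was walking towards the river when ", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_length=1000, repetition_penalty=1.2,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```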
✨ Features
- Resource-efficient: Designed for resource-constrained devices, reducing pre-training and deployment costs.
- Transparent: The complete training data pipeline, training code, model weights, over 300 checkpoints, and evaluation code are available on GitHub.
- Based on LLaMA-7B: Built on the architecture design of LLaMA-7B.
📚 Documentation
Model Summary
In the development of Large Language Models (LLMs), the prevailing trend has been "the bigger the better". Large models, however, are poorly suited to on-device use, where energy efficiency, a low memory footprint, and fast responses are essential. MobiLlama explores the 'less is more' paradigm by designing an accurate and fully transparent, open-source 0.5 billion (0.5B) parameter Small Language Model (SLM). MobiLlama starts from a larger model design and applies a careful parameter-sharing scheme to reduce both pre-training and deployment cost. All related resources are available on GitHub.
arXiv paper: https://arxiv.org/abs/2402.16840
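The parameter-sharing scheme mentioned in the summary reuses a single feed-forward (FFN) block across all transformer layers instead of giving each layer its own FFN. The sketch below is a minimal, illustrative re-creation of that idea in plain PyTorch, not the released training code; the `SharedFFNBlock` class, the `LayerNorm` layers, and the plain `nn.Sequential` FFN are simplifications of the actual LLaMA-style blocks.

```python
import torch.nn as nn

class SharedFFNBlock(nn.Module):
    """Transformer block that reuses one shared feed-forward network (FFN)."""
    def __init__(self, hidden_size, num_heads, shared_ffn):
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden_size)   # illustrative; MobiLlama uses RMSNorm
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(hidden_size)
        self.ffn = shared_ffn                    # the same module object in every block

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.ffn(self.norm2(x))

hidden, intermediate, heads, layers = 2048, 5632, 32, 22
shared_ffn = nn.Sequential(                      # one FFN, reused by all 22 blocks
    nn.Linear(hidden, intermediate), nn.SiLU(), nn.Linear(intermediate, hidden)
)
blocks = nn.ModuleList(SharedFFNBlock(hidden, heads, shared_ffn) for _ in range(layers))
```

Because every block points at the same FFN module, the FFN weights are stored (and updated) only once, which is where the bulk of the parameter savings comes from.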
Model Description
Training Data Mix

| Subset | Tokens (Billion) |
|---|---|
| Arxiv | 30.00 |
| Book | 28.86 |
| C4 | 197.67 |
| RefinedWeb | 665.01 |
| StarCoder | 291.92 |
| StackExchange | 21.75 |
| Wikipedia | 23.90 |
| Total | 1259.13 |
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Total Parameters | 0.52B |
| Hidden Size | 2048 |
| Intermediate Size (MLPs) | 5632 |
| Number of Attention Heads | 32 |
| Number of Hidden Layers | 22 |
| RMSNorm ε | 1e-5 |
| Max Seq Length | 2048 |
| Vocab Size | 32000 |
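For orientation, these hyperparameters map directly onto a LLaMA-style `transformers` config. The sketch below is illustrative only: the released checkpoint ships its own config class via `trust_remote_code`, so you do not need to build a config by hand to load the model.

```python
from transformers import LlamaConfig

# Illustrative mapping of the table above onto a LLaMA-style config.
config = LlamaConfig(
    hidden_size=2048,               # Hidden Size
    intermediate_size=5632,         # Intermediate Size (MLPs)
    num_attention_heads=32,         # Number of Attention Heads
    num_hidden_layers=22,           # Number of Hidden Layers
    rms_norm_eps=1e-5,              # RMSNorm ε
    max_position_embeddings=2048,   # Max Seq Length
    vocab_size=32000,               # Vocab Size
)
print(config)
```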
Evaluation
| Evaluation Benchmark | MobiLlama-0.5B | MobiLlama-0.8B | MobiLlama-1.2B |
|---|---|---|---|
| HellaSwag | 52.52 | 54.09 | 62.99 |
| MMLU | 26.45 | 26.92 | 24.23 |
| Arc Challenge | 29.52 | 30.20 | 34.55 |
| TruthfulQA | 38.05 | 38.48 | 35.57 |
| CrowsPairs | 64.03 | 64.82 | 68.12 |
| PIQA | 72.03 | 73.17 | 75.29 |
| Race | 33.68 | 33.37 | 35.31 |
| SIQA | 40.22 | 41.60 | 41.96 |
| Winogrande | 57.53 | 57.45 | 61.08 |
Citation
BibTeX:

```bibtex
@misc{thawakar2024mobillama,
      title={MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT},
      author={Omkar Thawakar and Ashmal Vayani and Salman Khan and Hisham Cholakkal and Rao Muhammad Anwer and Michael Felsberg and Timothy Baldwin and Eric P. Xing and Fahad Shahbaz Khan},
      year={2024},
      eprint={2402.16840},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
📄 License
The model is licensed under Apache 2.0.