# Llama-3-6B Model
Introducing the world's first Llama-3 base model with 6B parameters, created from Meta-Llama-3-8B using downcycling.
## Quick Start
You can find the trained version of this model here:
Llama-3-6B-v0.1
## Features
- It's the world's first Llama-3 base model with 6B parameters.
- Created from Meta-Llama-3-8B using the downcycling technique.
## Documentation

### Model Summary
This is the world's first Llama-3 base model with 6B parameters. It is an untrained model created from Meta-Llama-3-8B using a technique called downcycling.
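Downcycling derives a smaller model from a larger pretrained checkpoint, typically by keeping only a subset of its transformer decoder layers (while reusing the embeddings) and then continuing training. The exact layer-selection recipe is not specified in this card; the sketch below illustrates one plausible variant that keeps evenly spaced layers, with layers represented as plain strings for clarity. The layer counts are an assumption based on Llama-3-8B's 32 decoder layers.

```python
def downcycle(layers, keep):
    """Pick `keep` evenly spaced layers from a pretrained layer stack.

    This is an illustrative layer-selection heuristic, not the exact
    recipe used for Llama-3-6B; keeping only the first `keep` layers
    is another common variant.
    """
    n = len(layers)
    if keep >= n:
        return list(layers)
    # Evenly spaced indices that always include the first and last layer.
    idx = [round(i * (n - 1) / (keep - 1)) for i in range(keep)]
    return [layers[i] for i in idx]

# Llama-3-8B has 32 decoder layers; a ~6B variant would keep roughly 24.
source = [f"layer_{i}" for i in range(32)]
smaller = downcycle(source, 24)
print(len(smaller))              # 24
print(smaller[0], smaller[-1])   # layer_0 layer_31
```

In a real implementation the selected layers' weights would be copied into a freshly configured smaller model before continued pretraining; the string list here only stands in for that weight-copying step.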
### Model Description

| Property   | Details |
|------------|---------|
| Model Type | Llama   |
| License    | Llama-3 |
### Model Sources
### Citation

BibTeX:

```bibtex
@misc{prince2024downcycling,
  title={Efficient LLM Downcycling: Generating Diverse Model Sizes from Pretrained Giants},
  author={Prince Canuma},
  year={2024},
}
```
### References

```bibtex
@misc{komatsuzaki2023sparse,
  title={Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints},
  author={Aran Komatsuzaki and Joan Puigcerver and James Lee-Thorp and Carlos Riquelme Ruiz and Basil Mustafa and Joshua Ainslie and Yi Tay and Mostafa Dehghani and Neil Houlsby},
  year={2023},
  eprint={2212.05055},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@misc{sanyal2024pretraining,
  title={Pre-training Small Base LMs with Fewer Tokens},
  author={Sunny Sanyal and Sujay Sanghavi and Alexandros G. Dimakis},
  year={2024},
  eprint={2404.08634},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
## License
This model is licensed under Llama-3.
## Thank You!
I want to extend my heartfelt thanks to the community for their invaluable expertise and unwavering support.
Additionally, I would like to thank Viet from General Catalyst (GC) for providing me with the much-needed compute.
This is my most ambitious project yet, and it wouldn't have been possible without the incredible open-source ML community!
Developers, I am eager to see and hear about the innovative fine-tunes and applications you create.
Users, I am excited to learn about your experiences and use cases.
Thank you for your interest and support!