
Sheared-LLaMA-2.7B

Developed by princeton-nlp
Sheared-LLaMA-2.7B is a lightweight language model derived from Llama-2-7b through pruning and continued pretraining, consuming only a 50B token budget.
Downloads 1,131
Release Date: 10/10/2023

Model Overview

This model is compressed from Llama-2-7b using structured pruning techniques, retaining the core capabilities of the original model while excelling in multiple downstream tasks.

Model Features

Efficient Pruning
Uses only 0.4B tokens for pruning, significantly reducing model size
Efficient Training
Achieves excellent performance with only 50B tokens for continued pretraining
Superior Performance
Outperforms open-source models of comparable size at both the 1.3B and 2.7B scales

Model Capabilities

Text generation
Language understanding
Reasoning tasks
Reading comprehension
Knowledge-intensive tasks
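The capabilities above can be tried directly through the Hugging Face `transformers` library. The sketch below is a minimal example, assuming the model is published under the repository id `princeton-nlp/Sheared-LLaMA-2.7B` (inferred from the developer name above) and that `transformers` and a PyTorch backend are installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repository id, based on the developer name above.
MODEL_ID = "princeton-nlp/Sheared-LLaMA-2.7B"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load Sheared-LLaMA-2.7B and complete the given prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding by default; pass do_sample=True for sampled text.
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage (downloads the ~2.7B-parameter weights on first call):
# print(generate("Question: What is structured pruning?\nAnswer:"))
```

Because the model is a pruned-and-retrained LLaMA-2 variant, it uses the standard causal-LM interface, so it can be dropped into any pipeline that already works with Llama-2-7b.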

Use Cases

Natural Language Processing
Language Modeling
Generates coherent text and performs strongly on language-modeling benchmarks
Question Answering Systems
Suitable for building knowledge-intensive Q&A applications, with solid reading-comprehension performance