Sheared-LLaMA-1.3B
Sheared-LLaMA-1.3B is an efficient language model derived from LLaMA-2-7B through structured pruning followed by continual pre-training.
Downloads: 11.09k
Release Time: 10/10/2023
Model Overview
By dynamically loading data from the RedPajama dataset during both pruning and continual pre-training, this model outperforms comparably sized models under a total budget of only 50B tokens.
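A minimal sketch of the dynamic batch loading idea: domains whose current loss lags furthest behind a per-domain reference loss are upweighted for the next training steps. The domain names, reference losses, and learning rate below are illustrative placeholders, not values from the Sheared-LLaMA recipe.

```python
import numpy as np

# Illustrative RedPajama-style domains and per-domain target (reference) losses.
domains = ["cc", "c4", "github", "wiki", "book", "arxiv", "stackexchange"]
ref_loss = np.array([2.0, 2.1, 1.3, 1.9, 2.2, 1.8, 1.9])

def update_weights(weights: np.ndarray, cur_loss: np.ndarray, lr: float = 1.0) -> np.ndarray:
    """Exponentiated-gradient-style update: upweight domains with excess loss."""
    excess = np.maximum(cur_loss - ref_loss, 0.0)  # how far each domain lags its target
    new_w = weights * np.exp(lr * excess)          # boost lagging domains
    return new_w / new_w.sum()                     # renormalize to a sampling distribution

weights = np.full(len(domains), 1.0 / len(domains))        # start uniform
cur_loss = np.array([2.3, 2.2, 1.2, 2.0, 2.5, 1.7, 1.9])   # losses measured this step
weights = update_weights(weights, cur_loss)
print(dict(zip(domains, weights.round(3))))
```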
Model Features
Efficient pruning technique
Uses only 0.4B tokens for pruning, significantly reducing computational costs
Continual pre-training
Continues pre-training the pruned model on 50B tokens to recover performance
Compatibility
Shares the same vocabulary as LLaMA-1 and LLaMA-2, making migration from existing LLaMA pipelines straightforward (see the sketch below)
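Because the vocabulary is unchanged, an existing LLaMA tokenizer setup can usually be dropped in as-is. A minimal check, assuming the Hugging Face model id `princeton-nlp/Sheared-LLaMA-1.3B`:

```python
from transformers import AutoTokenizer

# Load the tokenizer shipped with the pruned model; it should expose the
# standard 32000-token LLaMA vocabulary.
tok = AutoTokenizer.from_pretrained("princeton-nlp/Sheared-LLaMA-1.3B")
ids = tok("Structured pruning keeps the vocabulary intact.")["input_ids"]
print(tok.vocab_size, ids[:8])
```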
Model Capabilities
Text generation (see the example after this list)
Language understanding
Reasoning tasks
Reading comprehension
Knowledge-intensive task processing
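A short generation example, again assuming the Hugging Face model id `princeton-nlp/Sheared-LLaMA-1.3B`. Note this is a base model, so it continues text rather than following chat-style instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Sheared-LLaMA-1.3B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Sample a continuation of a plain-text prompt.
inputs = tok("Structured pruning is a technique that", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9, temperature=0.7)
print(tok.decode(out[0], skip_special_tokens=True))
```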
Use Cases
Natural language processing
Language model benchmarking
Strong results on benchmarks such as ARC and HellaSwag (see the scoring sketch after this list)
Averages 51.0 across the downstream evaluation suite, surpassing other 1.3B-parameter models
Knowledge QA
Handles knowledge-intensive question answering tasks
Achieved 37.14 on TruthfulQA
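Benchmarks like ARC and HellaSwag are typically scored zero-shot by picking the answer choice to which the model assigns the highest log-likelihood. A minimal sketch of that scoring rule (model id assumed as above; tokenization at the context/choice boundary is handled only approximately here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Sheared-LLaMA-1.3B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

def choice_logprob(context: str, choice: str) -> float:
    """Sum of log-probabilities the model assigns to the choice tokens."""
    ctx_ids = tok(context, return_tensors="pt")["input_ids"]
    full_ids = tok(context + choice, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(full_ids).logits.log_softmax(-1)
    # Score only the positions belonging to the choice (teacher forcing):
    # the logits at position p-1 predict the token at position p.
    lp = 0.0
    for pos in range(ctx_ids.shape[1], full_ids.shape[1]):
        lp += logits[0, pos - 1, full_ids[0, pos]].item()
    return lp

question = "Which gas do plants absorb during photosynthesis? Answer:"
choices = [" carbon dioxide", " oxygen", " nitrogen"]
print(max(choices, key=lambda c: choice_logprob(question, c)))
```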