# 🚀 LLaMA Model Card
This is the LLaMA-7B model converted to work with the Hugging Face Transformers git head as of April 8, 2023. This version should resolve the EOS token issues. The model is under a special license; see the LICENSE file for details.
You should only use this repository if you've been granted access to the model by filling out this form but either lost your copy of the weights or had trouble converting them to the Transformers format.
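If you still need to convert the original weights yourself, note that Transformers ships a conversion script for this purpose, `convert_llama_weights_to_hf.py` (under `src/transformers/models/llama/`), which takes the directory of downloaded weights, a `--model_size` flag such as `7B`, and an output directory. The exact invocation may vary with your Transformers version, so check the script's `--help` output.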
## ✨ Features
- Model Details: LLaMA is an auto-regressive language model developed by the FAIR team of Meta AI. It was trained between December 2022 and February 2023. The model comes in different sizes: 7B, 13B, 33B and 65B parameters.
- Intended Use: Primarily for research on large language models, including exploring applications, understanding capabilities and limitations, and evaluating and mitigating biases.
- Evaluation: Evaluated on multiple benchmarks and datasets to measure performance, biases, and toxicity.
## 📦 Installation
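The original card does not specify installation steps. As a minimal sketch (an assumption, not an official recipe), a working environment at the time of this conversion could be set up with `pip install git+https://github.com/huggingface/transformers.git sentencepiece torch accelerate`, since LLaMA support landed on the Transformers main branch shortly before the 4.28 release and the LLaMA tokenizer requires `sentencepiece`.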
## 💻 Usage Examples
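The original card does not include code examples. The sketch below shows one plausible way to load and run the converted checkpoint with Transformers; the local path `./llama-7b-hf` and the generation settings are illustrative assumptions, not part of the original card.

```python
# Minimal text-generation sketch (assumes transformers >= 4.28 with
# LLaMA support, plus torch, sentencepiece and accelerate installed).
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "./llama-7b-hf"  # hypothetical path to the converted weights

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # halves memory; use the default float32 on CPU
    device_map="auto",          # spreads layers across available devices
)

prompt = "The theory of relativity states that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Keep in mind that LLaMA is a base model: it continues text rather than following instructions, so prompts work best when phrased as completions.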
## 📚 Documentation
### Model details
| Property | Details |
|---|---|
| Organization developing the model | The FAIR team of Meta AI |
| Model date | Trained between December 2022 and February 2023 |
| Model version | Version 1 |
| Model type | Auto-regressive language model based on the transformer architecture, available in sizes of 7B, 13B, 33B and 65B parameters |
| Paper or resources for more information | "LLaMA: Open and Efficient Foundation Language Models", https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/ |
| Citation details | https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/ |
| License | Non-commercial bespoke license |
| Where to send questions or comments about the model | Via the GitHub repository of the project, by opening an issue |
### Intended use
#### Primary intended uses
The primary use of LLaMA is research on large language models, including:
- Exploring potential applications such as question answering, natural language understanding or reading comprehension.
- Understanding capabilities and limitations of current language models, and developing techniques to improve those.
- Evaluating and mitigating biases, risks, the generation of toxic and harmful content, and hallucinations.
#### Primary intended users
The primary intended users of the model are researchers in natural language processing, machine learning and artificial intelligence.
#### Out-of-scope use cases
LLaMA is a base, or foundational, model. It should not be used on downstream applications without further risk evaluation and mitigation. In particular, the model has not been trained with human feedback and can generate toxic or offensive content, incorrect information or generally unhelpful answers.
### Factors
#### Relevant factors
One of the most relevant factors for which model performance may vary is the language used. Although 20 languages were included in the training data, most of the dataset is English text, so the model is expected to perform better for English than other languages. Performance might also vary for different dialects.
#### Evaluation factors
As the model is trained on web data, it is expected to reflect biases from this source. It was evaluated on RAI datasets to measure biases for gender, religion, race, sexual orientation, age, nationality, disability, physical appearance and socio-economic status. The toxicity of model generations was also measured depending on the toxicity of the context used to prompt the model.
### Metrics
#### Model performance measures
The following measures are used to evaluate the model:
- Accuracy for common sense reasoning, reading comprehension, natural language understanding (MMLU), BIG-bench hard, WinoGender and CrowS-Pairs.
- Exact match for question answering (a minimal scoring sketch follows this list).
- The toxicity score from Perspective API on RealToxicityPrompts.
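Exact match is the standard strict-comparison metric for question answering; the sketch below uses a simple normalization, which is an illustrative choice since the original card does not specify one.

```python
# Minimal exact-match sketch for question answering (standard metric;
# the normalization here is illustrative, not from the original card).
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().split())

def exact_match(prediction: str, reference: str) -> bool:
    return normalize(prediction) == normalize(reference)

predictions = ["Paris", "The Moon"]
references = ["paris", "Mars"]
score = sum(exact_match(p, r) for p, r in zip(predictions, references)) / len(references)
print(f"Exact match: {score:.2f}")  # 0.50
```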
#### Decision thresholds
Not applicable.
#### Approaches to uncertainty and variability
Due to the high computational requirements of training LLMs, only one model of each size was trained, so the variability of pre-training could not be evaluated.
### Evaluation datasets
The model was evaluated on the following benchmarks: BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC, OpenBookQA, NaturalQuestions, TriviaQA, RACE, MMLU, BIG-bench hard, GSM8k, RealToxicityPrompts, WinoGender, CrowS-Pairs.
### Training dataset
The model was trained using the following data sources: CCNet [67%], C4 [15%], GitHub [4.5%], Wikipedia [4.5%], Books [4.5%], ArXiv [2.5%], Stack Exchange [2%]. The Wikipedia and Books domains include data in the following languages: bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk.
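As a quick sanity check (not part of the original card), the reported sampling proportions sum to exactly 100%:

```python
# Reported training-mixture proportions from the paragraph above.
mixture = {
    "CCNet": 67.0, "C4": 15.0, "GitHub": 4.5, "Wikipedia": 4.5,
    "Books": 4.5, "ArXiv": 2.5, "Stack Exchange": 2.0,
}
print(sum(mixture.values()))  # 100.0
```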
### Quantitative analysis
#### Hyperparameters for the model architecture
| LLaMA | Dimension | n heads | n layers | Learning rate | Batch size | n tokens |
|---|---|---|---|---|---|---|
| 7B | 4096 | 32 | 32 | 3.0E-04 | 4M | 1T |
| 13B | 5120 | 40 | 40 | 3.0E-04 | 4M | 1T |
| 33B | 6656 | 52 | 60 | 1.5E-04 | 4M | 1.4T |
| 65B | 8192 | 64 | 80 | 1.5E-04 | 4M | 1.4T |

Table 1 - Summary of LLaMA model hyperparameters
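The sizes in Table 1 can be sanity-checked against a standard back-of-the-envelope formula (an approximation added here, not from the original card): a decoder-only transformer has roughly 12 × n_layers × dimension² parameters in its attention and feed-forward blocks, ignoring embeddings and LLaMA-specific details such as SwiGLU sizing.

```python
# Rough parameter-count check for Table 1 (approximation only).
configs = {
    "7B": (4096, 32),   # (dimension, n_layers)
    "13B": (5120, 40),
    "33B": (6656, 60),
    "65B": (8192, 80),
}
for name, (d_model, n_layers) in configs.items():
    approx = 12 * n_layers * d_model**2  # attention + MLP weights only
    print(f"{name}: ~{approx / 1e9:.1f}B parameters")
# 7B: ~6.4B, 13B: ~12.6B, 33B: ~31.9B, 65B: ~64.4B
```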
#### Results on common sense reasoning benchmarks
| LLaMA | BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | COPA |
|---|---|---|---|---|---|---|---|---|---|
| 7B | 76.5 | 79.8 | 48.9 | 76.1 | 70.1 | 76.7 | 47.6 | 57.2 | 93 |
| 13B | 78.1 | 80.1 | 50.4 | 79.2 | 73 | 78.1 | 52.7 | 56.4 | 94 |
| 33B | 83.1 | 82.3 | 50.4 | 82.8 | 76 | 81.4 | 57.8 | 58.6 | 92 |
| 65B | 85.3 | 82.8 | 52.3 | 84.2 | 77 | 81.5 | 56 | 60.2 | 94 |

Table 2 - Summary of LLaMA model performance on reasoning tasks
#### Results on bias
| No | Category | FAIR LLM |
|---|---|---|
| 1 | Gender | 70.6 |
| 2 | Religion | 79 |
| 3 | Race/Color | 57 |
| 4 | Sexual orientation | 81 |
| 5 | Age | 70.1 |
| 6 | Nationality | 64.2 |
| 7 | Disability | 66.7 |
| 8 | Physical appearance | 77.8 |
| 9 | Socioeconomic status | 71.5 |
| | LLaMA Average | 66.6 |

Table 3 - Summary of bias in the model's output
### Ethical considerations
#### Data
The training data is collected from various sources, mostly from the Web, so it contains offensive, harmful and biased content. The model is expected to exhibit such biases from the training data.
#### Human life
The model is not intended to inform decisions about matters central to human life and should not be used in such a way.
#### Mitigations
The web data was filtered based on its proximity to Wikipedia text and references, using a Kneser-Ney language model and a fastText linear classifier.
#### Risks and harms
Risks and harms of large language models include the generation of harmful, offensive or biased content and incorrect information (hallucinations). The model is not expected to be an exception.
#### Use cases
LLaMA is a foundational model and should not be used for downstream applications without further investigation and mitigation of risks. These risks and potential fraught use cases include, but are not limited to, generation of misinformation and generation of harmful, biased or offensive content.
## 🔧 Technical Details
The training data comes from various web-based sources and may therefore introduce biases. To mitigate this, the data was filtered using a Kneser-Ney language model and a fastText linear classifier, based on proximity to Wikipedia text and references. The model itself is an auto-regressive language model based on the transformer architecture, trained between December 2022 and February 2023.
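The filtering step described above can be pictured with a short sketch. The code below is a generic illustration of classifier-based quality filtering with fastText, not Meta's actual pipeline; the model file name, labels and threshold are all hypothetical.

```python
# Illustrative quality filtering with a fastText classifier
# (generic sketch; NOT Meta's actual data pipeline).
import fasttext

# Assumes a binary classifier trained offline, e.g. with labels
# __label__wiki (Wikipedia-like/reference text) vs __label__other.
clf = fasttext.load_model("quality_classifier.bin")  # hypothetical file

def keep_page(text: str, threshold: float = 0.5) -> bool:
    """Keep a page only if it is classified as reference-like text."""
    labels, probs = clf.predict(text.replace("\n", " "))
    return labels[0] == "__label__wiki" and probs[0] >= threshold

corpus = ["A well-sourced encyclopedic paragraph ...", "BUY NOW!!! free $$$"]
filtered = [page for page in corpus if keep_page(page)]
```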
## 📄 License
The model is under a non-commercial bespoke license. Please see the LICENSE file for details.

