This repository provides quantized GGUF versions of the meta-llama/Meta-Llama-3-8B-Instruct model for efficient text generation.
🚀 Quick Start
How to download
You can download only the quants you need instead of cloning the entire repository as follows:
```sh
huggingface-cli download MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF --local-dir . --include '*Q2_K*gguf'
```
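If you prefer to stay in Python, the same selective download can be done with the huggingface_hub library; a minimal sketch, using the same repo id and quant pattern as the CLI command above:

```python
from huggingface_hub import snapshot_download

# Download only the files matching the Q2_K quant pattern,
# instead of cloning the entire repository.
snapshot_download(
    repo_id="MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF",
    local_dir=".",
    allow_patterns=["*Q2_K*gguf"],
)
```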
Load GGUF models
You MUST follow the prompt template provided by Llama-3:

```sh
./llama.cpp/main -m Meta-Llama-3-8B-Instruct.Q2_K.gguf -r '<|eot_id|>' --in-prefix "\n<|start_header_id|>user<|end_header_id|>\n\n" --in-suffix "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi! How are you?<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n" -n 1024
```
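If you would rather not assemble the template by hand, recent versions of the llama-cpp-python bindings can apply the chat template stored in the GGUF metadata for you. A minimal sketch, assuming the quant downloaded above sits in the working directory (the n_ctx value is an illustrative choice, not a requirement):

```python
from llama_cpp import Llama

# Load the quantized model; the Llama-3 chat template is read
# from the GGUF metadata and applied automatically.
llm = Llama(model_path="Meta-Llama-3-8B-Instruct.Q2_K.gguf", n_ctx=8192)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Hi! How are you?"},
    ],
    max_tokens=1024,
)
print(response["choices"][0]["message"]["content"])
```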
✨ Features
- Model Variations: Available in 8B and 70B parameter sizes, with pre-trained and instruction tuned variants.
- Optimized for Dialogue: Instruction tuned models are optimized for dialogue use cases, outperforming many open source chat models on common benchmarks.
- Safety and Helpfulness: Developed with a focus on optimizing helpfulness and safety.
📦 Installation
The installation process mainly involves downloading the required quantized models as described in the "How to download" section.
💻 Usage Examples
Use with transformers
```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Llama 3 ends each turn with <|eot_id|> in addition to the standard
# EOS token, so both token IDs are passed as stop criteria.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```
Use with llama3
Please follow the instructions in the llama3 GitHub repository (https://github.com/meta-llama/llama3).
To download the original checkpoints, see the example command below leveraging huggingface-cli:

```sh
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*" --local-dir Meta-Llama-3-8B-Instruct
```
📚 Documentation
Model Details
| Property | Details |
|----------|---------|
| Model Type | Meta Llama 3, an auto-regressive language model using an optimized transformer architecture. |
| Training Data | Pretrained on over 15 trillion tokens from publicly available sources. Fine-tuning data includes public instruction datasets and over 10M human-annotated examples. |
| Input | Text only. |
| Output | Text and code only. |
| Model Architecture | Optimized transformer architecture. Tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). |
| Model Release Date | April 18, 2024. |
| Status | Static model trained on an offline dataset. Future tuned versions will be released with community feedback. |
| License | A custom commercial license is available at https://llama.meta.com/llama3/license. |
Intended Use
- Intended Use Cases: Commercial and research use in English. Instruction tuned models for assistant-like chat, pretrained models for various natural language generation tasks.
- Out-of-scope: Any use that violates applicable laws or regulations, the Acceptable Use Policy, or the Llama 3 Community License; use in languages other than English, except by developers who fine-tune for other languages in compliance with the license and policy.
How to use
The upstream Meta-Llama-3-8B-Instruct model is available in two versions: one for use with transformers and one for use with the original llama3 codebase.
Benchmarks
In this section, we report the results for Llama 3 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library; for details, see Meta's published evaluation methodology.
Base pretrained models
| Category | Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|----------|-----------|------------|------------|-------------|-------------|-------------|
| General | MMLU (5-shot) | 66.6 | 45.7 | 53.8 | 79.5 | 69.7 |
| General | AGIEval English (3-5 shot) | 45.9 | 28.8 | 38.7 | 63.0 | 54.8 |
| General | CommonSenseQA (7-shot) | 72.6 | 57.6 | 67.6 | 83.8 | 78.7 |
| General | Winogrande (5-shot) | 76.1 | 73.3 | 75.4 | 83.1 | 81.8 |
| General | BIG-Bench Hard (3-shot, CoT) | 61.1 | 38.1 | 47.0 | 81.3 | 65.7 |
| General | ARC-Challenge (25-shot) | 78.6 | 53.7 | 67.6 | 93.0 | 85.3 |
| Knowledge reasoning | TriviaQA-Wiki (5-shot) | 78.5 | 72.1 | 79.6 | 89.7 | 87.5 |
| Reading comprehension | SQuAD (1-shot) | 76.4 | 72.2 | 72.1 | 85.6 | 82.6 |
| Reading comprehension | QuAC (1-shot, F1) | 44.4 | 39.6 | 44.9 | 51.1 | 49.4 |
| Reading comprehension | BoolQ (0-shot) | 75.7 | 65.5 | 66.9 | 79.0 | 73.1 |
| Reading comprehension | DROP (3-shot, F1) | 58.4 | 37.9 | 49.8 | 79.7 | 70.2 |
Instruction tuned models
| Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|-----------|------------|------------|-------------|-------------|-------------|
| MMLU (5-shot) | 68.4 | 34.1 | 47.8 | 82.0 | 52.9 |
| GPQA (0-shot) | 34.2 | 21.7 | 22.3 | 39.5 | 21.0 |
| HumanEval (0-shot) | 62.2 | 7.9 | 14.0 | 81.7 | 25.6 |
| GSM-8K (8-shot, CoT) | 79.6 | 25.7 | 77.4 | 93.0 | 57.5 |
| MATH (4-shot, CoT) | 30.0 | 3.8 | 6.7 | 50.4 | 11.6 |
Responsibility & Safety
We believe that an open approach to AI leads to better, safer products, faster innovation, and a bigger overall market. We are committed to Responsible AI development and took a series of steps to limit misuse and harm and support the open source community.
As part of the Llama 3 release, we updated our Responsible Use Guide to outline the steps and best practices for developers to implement model- and system-level safety in their applications.
🔧 Technical Details
Hardware and Software
- Training Factors: Custom training libraries, Meta's Research SuperCluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation on third-party cloud compute.
- Carbon Footprint: Pretraining utilized 7.7M GPU hours of computation on H100-80GB GPUs (TDP of 700W). Estimated total emissions were 2,290 tCO2eq, 100% of which were offset by Meta's sustainability program (see the sanity check below).
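As a sanity check, the reported figures imply roughly 5.4 GWh of GPU energy and a grid intensity of about 0.42 kgCO2eq/kWh. A minimal sketch of the arithmetic, assuming only the quoted 7.7M GPU hours, 700W TDP, and 2,290 tCO2eq, and ignoring non-GPU overhead:

```python
# Back-of-the-envelope check of the reported pretraining footprint.
# Assumes GPUs run at full 700W TDP; real draw and datacenter overhead vary.
gpu_hours = 7.7e6    # reported H100-80GB GPU hours
tdp_kw = 0.7         # 700W TDP per GPU, in kW
emissions_t = 2290   # reported total tCO2eq

energy_kwh = gpu_hours * tdp_kw               # ~5.39e6 kWh (~5.4 GWh)
intensity = emissions_t * 1000 / energy_kwh   # ~0.42 kgCO2eq per kWh

print(f"Energy: {energy_kwh / 1e6:.2f} GWh")
print(f"Implied carbon intensity: {intensity:.2f} kgCO2eq/kWh")
```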
Training Data
- Overview: Pretrained on over 15 trillion tokens from public sources. Fine-tuning data includes public instruction datasets and over 10M human-annotated examples. No Meta user data.
- Data Freshness: Pretraining data cutoff of March 2023 for 8B and December 2023 for 70B models.
📄 License
A custom commercial license is available at https://llama.meta.com/llama3/license.