This repository provides quantized GGUF versions of the meta-llama/Meta-Llama-3-8B-Instruct model for efficient text generation.
🚀 Quick Start
How to download
You can download only the quants you need instead of cloning the entire repository as follows:
```sh
huggingface-cli download MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF --local-dir . --include '*Q2_K*gguf'
```
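If you prefer to stay in Python, the same selective download can be done with the huggingface_hub library; a minimal sketch, using the same repo id and quant pattern as the CLI command above:

```python
from huggingface_hub import snapshot_download

# Download only the files matching the Q2_K quant pattern,
# instead of cloning the entire repository.
snapshot_download(
    repo_id="MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF",
    local_dir=".",
    allow_patterns=["*Q2_K*gguf"],
)
```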
Load GGUF models
You MUST follow the prompt template provided by Llama-3:

```sh
./llama.cpp/main -m Meta-Llama-3-8B-Instruct.Q2_K.gguf -r '<|eot_id|>' --in-prefix "\n<|start_header_id|>user<|end_header_id|>\n\n" --in-suffix "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi! How are you?<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n" -n 1024
```
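If you would rather not assemble the template by hand, recent versions of the llama-cpp-python bindings can apply the chat template stored in the GGUF metadata for you. A minimal sketch, assuming the quant downloaded above sits in the working directory (the n_ctx value is an illustrative choice, not a requirement):

```python
from llama_cpp import Llama

# Load the quantized model; the Llama-3 chat template is read
# from the GGUF metadata and applied automatically.
llm = Llama(model_path="Meta-Llama-3-8B-Instruct.Q2_K.gguf", n_ctx=8192)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Hi! How are you?"},
    ],
    max_tokens=1024,
)
print(response["choices"][0]["message"]["content"])
```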
✨ Features
- Model Variations: Available in 8B and 70B parameter sizes, with pre-trained and instruction tuned variants.
- Optimized for Dialogue: Instruction tuned models are optimized for dialogue use cases, outperforming many open source chat models on common benchmarks.
- Safety and Helpfulness: Developed with a focus on optimizing helpfulness and safety.
📦 Installation
The installation process mainly involves downloading the required quantized models as described in the "How to download" section.
💻 Usage Examples
Use with transformers
```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Llama 3 ends each turn with <|eot_id|> in addition to the standard
# EOS token, so both token IDs are passed as stop criteria.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```
Use with llama3
Please follow the instructions in the llama3 GitHub repository (https://github.com/meta-llama/llama3).
To download the original checkpoints, see the example command below leveraging huggingface-cli:

```sh
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*" --local-dir Meta-Llama-3-8B-Instruct
```
📚 Documentation
Model Details
| Property | Details |
|----------|---------|
| Model Type | Meta Llama 3, an auto-regressive language model using an optimized transformer architecture. |
| Training Data | Pretrained on over 15 trillion tokens from publicly available sources. Fine-tuning data includes public instruction datasets and over 10M human-annotated examples. |
| Input | Text only. |
| Output | Text and code only. |
| Model Architecture | Optimized transformer architecture. Tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). |
| Model Release Date | April 18, 2024. |
| Status | Static model trained on an offline dataset. Future tuned versions will be released with community feedback. |
| License | A custom commercial license is available at https://llama.meta.com/llama3/license. |
Intended Use
- Intended Use Cases: Commercial and research use in English. Instruction tuned models for assistant-like chat, pretrained models for various natural language generation tasks.
- Out-of-scope: Any use that violates applicable laws or regulations, the Acceptable Use Policy, or the Llama 3 Community License; use in languages other than English, except by developers who fine-tune for other languages in compliance with the license and policy.
How to use
The upstream Meta-Llama-3-8B-Instruct model is available in two versions: one for use with transformers and one for use with the original llama3 codebase.
Benchmarks
In this section, we report the results for Llama 3 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library; for details, see Meta's published evaluation methodology.
Base pretrained models
| Category | Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|----------|-----------|------------|------------|-------------|-------------|-------------|
| General | MMLU (5-shot) | 66.6 | 45.7 | 53.8 | 79.5 | 69.7 |
| General | AGIEval English (3-5 shot) | 45.9 | 28.8 | 38.7 | 63.0 | 54.8 |
| General | CommonSenseQA (7-shot) | 72.6 | 57.6 | 67.6 | 83.8 | 78.7 |
| General | Winogrande (5-shot) | 76.1 | 73.3 | 75.4 | 83.1 | 81.8 |
| General | BIG-Bench Hard (3-shot, CoT) | 61.1 | 38.1 | 47.0 | 81.3 | 65.7 |
| General | ARC-Challenge (25-shot) | 78.6 | 53.7 | 67.6 | 93.0 | 85.3 |
| Knowledge reasoning | TriviaQA-Wiki (5-shot) | 78.5 | 72.1 | 79.6 | 89.7 | 87.5 |
| Reading comprehension | SQuAD (1-shot) | 76.4 | 72.2 | 72.1 | 85.6 | 82.6 |
| Reading comprehension | QuAC (1-shot, F1) | 44.4 | 39.6 | 44.9 | 51.1 | 49.4 |
| Reading comprehension | BoolQ (0-shot) | 75.7 | 65.5 | 66.9 | 79.0 | 73.1 |
| Reading comprehension | DROP (3-shot, F1) | 58.4 | 37.9 | 49.8 | 79.7 | 70.2 |
Instruction tuned models
| Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|-----------|------------|------------|-------------|-------------|-------------|
| MMLU (5-shot) | 68.4 | 34.1 | 47.8 | 82.0 | 52.9 |
| GPQA (0-shot) | 34.2 | 21.7 | 22.3 | 39.5 | 21.0 |
| HumanEval (0-shot) | 62.2 | 7.9 | 14.0 | 81.7 | 25.6 |
| GSM-8K (8-shot, CoT) | 79.6 | 25.7 | 77.4 | 93.0 | 57.5 |
| MATH (4-shot, CoT) | 30.0 | 3.8 | 6.7 | 50.4 | 11.6 |
Responsibility & Safety
We believe that an open approach to AI leads to better, safer products, faster innovation, and a bigger overall market. We are committed to Responsible AI development and took a series of steps to limit misuse and harm and support the open source community.
As part of the Llama 3 release, we updated our Responsible Use Guide to outline the steps and best practices for developers to implement model- and system-level safety in their applications.
🔧 Technical Details
Hardware and Software
- Training Factors: Custom training libraries, Meta's Research SuperCluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation on third-party cloud compute.
- Carbon Footprint: Pretraining utilized 7.7M GPU hours of computation on H100-80GB GPUs (TDP of 700W). Estimated total emissions were 2,290 tCO2eq, 100% of which were offset by Meta's sustainability program (see the sanity check below).
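As a sanity check, the reported figures imply roughly 5.4 GWh of GPU energy and a grid intensity of about 0.42 kgCO2eq/kWh. A minimal sketch of the arithmetic, assuming only the quoted 7.7M GPU hours, 700W TDP, and 2,290 tCO2eq, and ignoring non-GPU overhead:

```python
# Back-of-the-envelope check of the reported pretraining footprint.
# Assumes GPUs run at full 700W TDP; real draw and datacenter overhead vary.
gpu_hours = 7.7e6    # reported H100-80GB GPU hours
tdp_kw = 0.7         # 700W TDP per GPU, in kW
emissions_t = 2290   # reported total tCO2eq

energy_kwh = gpu_hours * tdp_kw               # ~5.39e6 kWh (~5.4 GWh)
intensity = emissions_t * 1000 / energy_kwh   # ~0.42 kgCO2eq per kWh

print(f"Energy: {energy_kwh / 1e6:.2f} GWh")
print(f"Implied carbon intensity: {intensity:.2f} kgCO2eq/kWh")
```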
Training Data
- Overview: Pretrained on over 15 trillion tokens from public sources. Fine-tuning data includes public instruction datasets and over 10M human-annotated examples. No Meta user data.
- Data Freshness: Pretraining data cutoff of March 2023 for 8B and December 2023 for 70B models.
📄 License
A custom commercial license is available at https://llama.meta.com/llama3/license.