🚀 Run Unsloth Llama 3.1 GGUF!
This README provides information about the Unsloth Llama 3.1 GGUF model, including its features, usage instructions, and technical details.
✨ Features
- Multilingual Support: The Meta Llama 3.1 collection supports multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Optimized Performance: The Llama 3.1 instruction-tuned, text-only models are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
- Tool Use Support: Llama 3.1 supports multiple tool-use formats, allowing for more advanced interactions.
📦 Installation
No specific installation steps are provided in the original document.
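As a reasonable starting point (an assumption, not taken from the original card), the Python examples below rely on the standard Hugging Face stack, which can be installed with pip:

```bash
# Assumed prerequisites for the examples below (not from the original card).
# accelerate is needed for device_map="auto".
pip install transformers torch accelerate
```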
💻 Usage Examples
Basic Usage
```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# With chat-format input, generated_text is the full message list;
# the last entry is the assistant's reply.
print(outputs[0]["generated_text"][-1])
```
Tool Use Example
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
    Returns:
        The current temperature at the specified location, as a float.
    """
    return 22.0  # A real implementation would call a weather API here.

messages = [
    {"role": "system", "content": "You are a bot that responds to weather queries."},
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"},
]

# Passing the function in `tools` renders its signature and docstring into the prompt.
inputs = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True)
```
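The snippet above only builds the prompt. As a hedged sketch of the rest of the standard transformers tool-calling flow (the tool-call and result messages below are illustrative, not from the original card), the model's tool call and the tool's output are appended back into the conversation before generating the final answer:

```python
# Illustrative continuation: after the model emits a tool call, append it and the
# tool's result, then re-apply the chat template so the model can answer in prose.
messages.append({
    "role": "assistant",
    "tool_calls": [{
        "type": "function",
        "function": {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}},
    }],
})
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})

inputs = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True)
```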
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | Meta Llama 3.1, an auto-regressive language model using an optimized transformer architecture |
| Training Data | A new mix of publicly available online data |
| Model Sizes | 8B, 70B, 405B |
| Input Modalities | Multilingual text |
| Output Modalities | Multilingual text and code |
| Context Length | 128k |
| GQA | Yes |
| Token Count | 15T+ |
| Knowledge Cutoff | December 2023 |
| Model Developer | Meta |
| Model Release Date | July 23, 2024 |
| Status | Static model trained on an offline dataset |
| License | Llama 3.1 Community License, available at https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE |
Intended Use
- Intended Use Cases: Commercial and research use in multiple languages. Instruction-tuned, text-only models are for assistant-like chat, and pretrained models can be adapted for various natural language generation tasks. The model collection also supports leveraging outputs for synthetic data generation and distillation.
- Out-of-scope: Use that violates applicable laws or regulations, or is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond the 8 supported languages without the compliance measures described in the Important Note below.
🔧 Technical Details
- Training Factors: Custom training libraries, Meta's custom-built GPU cluster, and production infrastructure were used for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure.
- Training Computation: A cumulative 39.3M GPU hours of computation on H100-80GB (700W TDP) hardware.
- Training Greenhouse Gas Emissions: Estimated total location-based greenhouse gas emissions were 11,390 tons CO2eq, with 0 tons CO2eq market-based emissions.
📄 License
The model is released under the Llama 3.1 Community License, available at https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE.
Additional Information
- Blog: Read our blog about Llama 3.1 fine-tuning support: unsloth.ai/blog/llama4
- Notebooks: View the rest of our fine-tuning notebooks in our docs here.
- Export: Export your fine-tuned model to GGUF, Ollama, llama.cpp, vLLM, or HF formats; see the sketch below.
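As a hedged illustration of the GGUF export path, Unsloth exposes a `save_pretrained_gguf` helper; the output directory and quantization method below are assumptions for the example, not values from the original card:

```python
# Sketch: exporting an Unsloth fine-tuned model to GGUF.
# `model` and `tokenizer` come from a prior Unsloth fine-tuning run;
# "q4_k_m" is one common llama.cpp quantization method (assumed here).
model.save_pretrained_gguf("gguf_output", tokenizer, quantization_method="q4_k_m")
```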
⚠️ Important Note
Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages, provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy; in such cases, they are responsible for ensuring that any use of Llama 3.1 in additional languages is done in a safe and responsible manner.
💡 Usage Tip
You can find detailed recipes on how to use the model locally, with `torch.compile()`, with assisted generation, quantized, and more at huggingface-llama-recipes.
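For instance, here is a minimal sketch of one such recipe, loading the model 4-bit quantized with bitsandbytes (the configuration values are assumptions for illustration, not taken from the original card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# 4-bit quantization keeps the 8B model within a single consumer GPU's memory.
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```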