🚀 Run Unsloth Llama 3.1 GGUF!
This README provides information about the Unsloth Llama 3.1 GGUF model, including its features, usage instructions, and technical details.
✨ Features
- Multilingual Support: The Meta Llama 3.1 collection supports multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Optimized Performance: The Llama 3.1 instruction-tuned, text-only models are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
- Tool Use Support: Llama 3.1 supports multiple tool-use formats, allowing for more advanced interactions.
📦 Installation
No specific installation steps are provided in the original document.
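As a reasonable starting point (an assumption, not taken from the original card), the Python examples below rely on the standard Hugging Face stack, which can be installed with pip:

```bash
# Assumed prerequisites for the examples below (not from the original card).
# accelerate is needed for device_map="auto".
pip install transformers torch accelerate
```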
💻 Usage Examples
Basic Usage
```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# With chat-format input, generated_text is the full message list;
# the last entry is the assistant's reply.
print(outputs[0]["generated_text"][-1])
```
Tool Use Example
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
    Returns:
        The current temperature at the specified location, as a float.
    """
    return 22.0  # A real implementation would call a weather API here.

messages = [
    {"role": "system", "content": "You are a bot that responds to weather queries."},
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"},
]

# Passing the function in `tools` renders its signature and docstring into the prompt.
inputs = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True)
```
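The snippet above only builds the prompt. As a hedged sketch of the rest of the standard transformers tool-calling flow (the tool-call and result messages below are illustrative, not from the original card), the model's tool call and the tool's output are appended back into the conversation before generating the final answer:

```python
# Illustrative continuation: after the model emits a tool call, append it and the
# tool's result, then re-apply the chat template so the model can answer in prose.
messages.append({
    "role": "assistant",
    "tool_calls": [{
        "type": "function",
        "function": {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}},
    }],
})
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})

inputs = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True)
```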
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | Meta Llama 3.1, an auto-regressive language model using an optimized transformer architecture |
| Training Data | A new mix of publicly available online data |
| Model Sizes | 8B, 70B, 405B |
| Input Modalities | Multilingual text |
| Output Modalities | Multilingual text and code |
| Context Length | 128k |
| GQA | Yes |
| Token Count | 15T+ |
| Knowledge Cutoff | December 2023 |
| Model Developer | Meta |
| Model Release Date | July 23, 2024 |
| Status | Static model trained on an offline dataset |
| License | Llama 3.1 Community License, available at https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE |
Intended Use
- Intended Use Cases: Commercial and research use in multiple languages. Instruction-tuned, text-only models are for assistant-like chat, and pretrained models can be adapted for various natural language generation tasks. The model collection also supports leveraging outputs for synthetic data generation and distillation.
- Out-of-scope: Use that violates applicable laws or regulations, or is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond the 8 supported languages without the compliance measures described in the Important Note below.
🔧 Technical Details
- Training Factors: Custom training libraries, Meta's custom-built GPU cluster, and production infrastructure were used for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure.
- Training Computation: A cumulative 39.3M GPU hours of computation on H100-80GB (700W TDP) hardware.
- Training Greenhouse Gas Emissions: Estimated total location-based greenhouse gas emissions were 11,390 tons CO2eq, with 0 tons CO2eq market-based emissions.
📄 License
The model is released under the Llama 3.1 Community License, available at https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE.
Additional Information
- Blog: Read our blog about Llama 3.1 fine-tuning support: unsloth.ai/blog/llama4
- Notebooks: View the rest of our fine-tuning notebooks in our docs here.
- Export: Export your fine-tuned model to GGUF, Ollama, llama.cpp, vLLM, or HF formats; see the sketch below.
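As a hedged illustration of the GGUF export path, Unsloth exposes a `save_pretrained_gguf` helper; the output directory and quantization method below are assumptions for the example, not values from the original card:

```python
# Sketch: exporting an Unsloth fine-tuned model to GGUF.
# `model` and `tokenizer` come from a prior Unsloth fine-tuning run;
# "q4_k_m" is one common llama.cpp quantization method (assumed here).
model.save_pretrained_gguf("gguf_output", tokenizer, quantization_method="q4_k_m")
```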
⚠️ Important Note
Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages, provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy; in such cases, they are responsible for ensuring that any use of Llama 3.1 in additional languages is done in a safe and responsible manner.
💡 Usage Tip
You can find detailed recipes on how to use the model locally, with `torch.compile()`, with assisted generation, quantized, and more at huggingface-llama-recipes.
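For instance, here is a minimal sketch of one such recipe, loading the model 4-bit quantized with bitsandbytes (the configuration values are assumptions for illustration, not taken from the original card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# 4-bit quantization keeps the 8B model within a single consumer GPU's memory.
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```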