Llama-3.1-Storm-8B-GGUF
This is the GGUF-quantized version of Llama-3.1-Storm-8B, packaged for use with llama.cpp and compatible runtimes for efficient local text generation.

Authors: Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh, Akshita Sukhlecha
Hugging Face Announcement Blog: https://huggingface.co/blog/akjindal53244/llama31-storm8b
Ollama: `ollama run ajindal/llama3.1-storm:8b`
🚀 Quick Start
This section provides a brief overview of the Llama-3.1-Storm-8B-GGUF model and its capabilities.

We introduce the Llama-3.1-Storm-8B model, which significantly outperforms Meta AI's Llama-3.1-8B-Instruct and Hermes-3-Llama-3.1-8B across various benchmarks. Our approach involves three key steps:
- Self-Curation: We selected approximately 1 million high-quality examples from a pool of ~2.8 million open-source examples using two self-curation methods. Our criteria focused on educational value and difficulty level, and we used the same 8B SLM for annotation rather than a larger model.
- Targeted fine-tuning: We performed Spectrum-based targeted fine-tuning on the Llama-3.1-8B-Instruct model. Spectrum accelerates training by selectively training the layer modules with the highest signal-to-noise ratio (SNR) and freezing the rest; in our work, 50% of the layers are frozen (see the sketch after this list).
- Model Merging: We merged our fine-tuned model with the Llama-Spark model using the SLERP method. This produces a blended model whose characteristics are smoothly interpolated from both parents, so the resulting model captures the essence of both (a SLERP sketch also follows below).
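For intuition on the Spectrum step, here is a minimal sketch of SNR-based layer freezing. Note that Spectrum's actual SNR estimate is derived from random matrix theory; the mean/std ratio and the `freeze_low_snr_modules` helper below are simplified stand-ins, not the method's real implementation:

```python
import torch

def freeze_low_snr_modules(model: torch.nn.Module, train_ratio: float = 0.5) -> None:
    """Keep only the top `train_ratio` fraction of weight matrices (by SNR) trainable."""
    snr = {}
    for name, param in model.named_parameters():
        if param.dim() == 2:  # score only 2-D weight matrices
            # Crude SNR proxy: mean absolute value over standard deviation
            snr[name] = (param.abs().mean() / (param.std() + 1e-8)).item()
    ranked = sorted(snr, key=snr.get, reverse=True)  # highest SNR first
    trainable = set(ranked[: int(len(ranked) * train_ratio)])
    for name, param in model.named_parameters():
        if param.dim() == 2:
            param.requires_grad = name in trainable  # freeze the low-SNR modules
```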
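Likewise, a minimal sketch of SLERP applied to a single pair of flattened weight tensors (merging tools apply this per tensor across both parent checkpoints; `slerp` here is an illustrative helper, not the tooling we used):

```python
import numpy as np

def slerp(w_a: np.ndarray, w_b: np.ndarray, t: float = 0.5, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight tensors."""
    a = w_a / (np.linalg.norm(w_a) + eps)  # unit-norm copies, used only for the angle
    b = w_b / (np.linalg.norm(w_b) + eps)
    theta = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))  # angle between the parents
    if theta < eps:  # nearly parallel weights: plain linear interpolation is stable
        return (1.0 - t) * w_a + t * w_b
    return (np.sin((1.0 - t) * theta) * w_a + np.sin(t * theta) * w_b) / np.sin(theta)
```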
✨ Features
Introducing Llama-3.1-Storm-8B
Llama-3.1-Storm-8B builds on the foundation of Llama-3.1-8B-Instruct, aiming to enhance conversational and function calling capabilities within the 8B parameter model class.
As shown in the left subplot of the figure, Llama-3.1-Storm-8B improves on Meta-Llama-3.1-8B-Instruct across multiple benchmark categories: instruction following (IFEval), knowledge-driven QA (GPQA, MMLU-Pro), reasoning (ARC-C, MuSR, BBH), reduced hallucinations (TruthfulQA), and function calling (BFCL). These gains are particularly valuable for AI developers and enthusiasts with limited computational resources.
We also benchmarked our model against the recently published Hermes-3-Llama-3.1-8B. As shown in the right subplot, Llama-3.1-Storm-8B outperforms Hermes-3-Llama-3.1-8B in 7 out of 9 benchmarks, with Hermes-3-Llama-3.1-8B surpassing Llama-3.1-Storm-8B on the MuSR benchmark and both models showing comparable performance on the BBH benchmark.
Llama-3.1-Storm-8B Model Strengths
Llama-3.1-Storm-8B is a powerful generalist model suitable for a wide range of applications. We encourage the AI community to explore Llama-3.1-Storm-8B and discover its potential in various projects.
| Model Strength | Relevant Benchmarks |
|---|---|
| Improved Instruction Following | IFEval Strict (+3.93%) |
| Enhanced Knowledge-Driven Question Answering | GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%) |
| Better Reasoning | ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%) |
| Superior Agentic Capabilities | BFCL Overall Acc (+7.92%), BFCL AST Summary (+12.32%) |
| Reduced Hallucinations | TruthfulQA (+9%) |
Note: All improvements are absolute gains over Meta-Llama-3.1-8B-Instruct.
Llama-3.1-Storm-8B Models
- BF16: Llama-3.1-Storm-8B
- FP8: Llama-3.1-Storm-8B-FP8-Dynamic
- GGUF: Llama-3.1-Storm-8B-GGUF
- Ollama: `ollama run ajindal/llama3.1-storm:8b`
📦 Installation
```bash
pip install llama-cpp-python huggingface-hub
```

(huggingface-hub is needed for the `hf_hub_download` call in the usage example below.)
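To offload layers to a GPU (`n_gpu_layers` in the example below), llama-cpp-python must be built with GPU support. Per its installation docs this is controlled through `CMAKE_ARGS`; the exact flag can vary by version, so treat the line below as a sketch:

```bash
# Rebuild with CUDA support so n_gpu_layers > 0 actually offloads work to the GPU
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```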
💻 Usage Examples
Basic Usage
```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized weights from the Hugging Face Hub
model_name = "akjindal53244/Llama-3.1-Storm-8B-GGUF"
model_file = "Llama-3.1-Storm-8B.Q8_0.gguf"
model_path = hf_hub_download(model_name, filename=model_file)

# Load the model
llm = Llama(
    model_path=model_path,
    n_ctx=16000,    # context window size, in tokens
    n_threads=32,   # CPU threads; tune to your machine
    n_gpu_layers=0  # 0 = CPU-only inference; raise to offload layers to a GPU
)

generation_kwargs = {
    "max_tokens": 200,
    "stop": ["<|eot_id|>"],  # Llama 3.1 end-of-turn token
    "echo": False,           # do not repeat the prompt in the output
    "top_k": 1               # greedy decoding
}

prompt = "What is 2+2?"
res = llm(prompt, **generation_kwargs)
print(res["choices"][0]["text"])
```
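For instruction-style prompting, llama-cpp-python also provides a chat-completion API that applies the chat template embedded in the GGUF file; a minimal sketch reusing the `llm` object from above:

```python
# Chat-style generation: the library formats messages with the model's chat template
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"},
    ],
    max_tokens=200,
)
print(response["choices"][0]["message"]["content"])
```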
Advanced Usage - Function Calling Example with Ollama
```python
import ollama

# Tool (function) schemas the model may choose to call
tools = [
    {
        'type': 'function',
        'function': {
            'name': 'get_current_weather',
            'description': 'Get the current weather for a city',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {
                        'type': 'string',
                        'description': 'The name of the city',
                    },
                },
                'required': ['city'],
            },
        },
    },
    {
        'type': 'function',
        'function': {
            'name': 'get_places_to_visit',
            'description': 'Get places to visit in a city',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {
                        'type': 'string',
                        'description': 'The name of the city',
                    },
                },
                'required': ['city'],
            },
        },
    },
]

response = ollama.chat(
    model='ajindal/llama3.1-storm:8b',
    messages=[
        {'role': 'system', 'content': 'Do not answer any vulgar questions.'},
        {'role': 'user', 'content': 'What is the weather in Toronto and San Francisco?'}
    ],
    tools=tools,
)
print(response['message'])
```
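If the model decides to call a tool, the returned message carries `tool_calls` with parsed arguments rather than plain text. A minimal dispatch sketch, assuming the dict-style response used above; `get_current_weather` here is a hypothetical stub:

```python
# Hypothetical stub standing in for a real weather lookup
def get_current_weather(city: str) -> str:
    return f"(stub) Weather in {city}: sunny, 22 °C"

available_functions = {'get_current_weather': get_current_weather}

# Run each tool call the model requested; arguments arrive as a parsed dict
for call in response['message'].get('tool_calls') or []:
    fn = available_functions.get(call['function']['name'])
    if fn is not None:
        print(fn(**call['function']['arguments']))
```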
📚 Documentation
Alignment Note
While Llama-3.1-Storm-8B did not undergo an explicit model alignment process, it may still retain some alignment properties inherited from the Meta-Llama-3.1-8B-Instruct model.
Cite Our Work
```bibtex
@misc{ashvini_kumar_jindal_2024,
  author    = {Ashvini Kumar Jindal and Pawan Kumar Rajpoot and Ankur Parikh and Akshita Sukhlecha},
  title     = {Llama-3.1-Storm-8B},
  year      = {2024},
  url       = {https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B},
  doi       = {10.57967/hf/2902},
  publisher = {Hugging Face}
}
```
Support Our Work
With a 3-member team spanning 3 different time zones, we have won the NeurIPS LLM Efficiency Challenge 2023 and 4 other competitions in the finance and Arabic LLM space. We have also published a SOTA mathematical reasoning model.
Llama-3.1-Storm-8B is our most valuable contribution to the open-source community so far. We are committed to developing efficient generalist LLMs. We're seeking both computational resources and innovative collaborators to drive this initiative forward.
📄 License
The model is released under the Llama 3.1 Community License (the `llama3.1` license).