Llama-3.1-Storm-8B-GGUF
This is the GGUF-quantized version of Llama-3.1-Storm-8B, packaged for use with llama.cpp and compatible runtimes for efficient local text generation.

Authors: Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh, Akshita Sukhlecha
Hugging Face Announcement Blog: https://huggingface.co/blog/akjindal53244/llama31-storm8b
Ollama: `ollama run ajindal/llama3.1-storm:8b`
🚀 Quick Start
This section provides a brief overview of the Llama-3.1-Storm-8B-GGUF model and its capabilities.

We introduce the Llama-3.1-Storm-8B model, which significantly outperforms Meta AI's Llama-3.1-8B-Instruct and Hermes-3-Llama-3.1-8B across various benchmarks. Our approach involves three key steps:
- Self-Curation: We selected approximately 1 million high-quality examples from a pool of ~2.8 million open-source examples using two self-curation methods. Our criteria focused on educational value and difficulty level, and we used the same 8B SLM for annotation rather than a larger model.
- Targeted fine-tuning: We performed Spectrum-based targeted fine-tuning on the Llama-3.1-8B-Instruct model. Spectrum accelerates training by selectively training the layer modules with the highest signal-to-noise ratio (SNR) and freezing the rest; in our work, 50% of the layers are frozen (see the sketch after this list).
- Model Merging: We merged our fine-tuned model with the Llama-Spark model using the SLERP method. This produces a blended model whose characteristics are smoothly interpolated from both parents, so the resulting model captures the essence of both (a SLERP sketch also follows below).
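For intuition on the Spectrum step, here is a minimal sketch of SNR-based layer freezing. Note that Spectrum's actual SNR estimate is derived from random matrix theory; the mean/std ratio and the `freeze_low_snr_modules` helper below are simplified stand-ins, not the method's real implementation:

```python
import torch

def freeze_low_snr_modules(model: torch.nn.Module, train_ratio: float = 0.5) -> None:
    """Keep only the top `train_ratio` fraction of weight matrices (by SNR) trainable."""
    snr = {}
    for name, param in model.named_parameters():
        if param.dim() == 2:  # score only 2-D weight matrices
            # Crude SNR proxy: mean absolute value over standard deviation
            snr[name] = (param.abs().mean() / (param.std() + 1e-8)).item()
    ranked = sorted(snr, key=snr.get, reverse=True)  # highest SNR first
    trainable = set(ranked[: int(len(ranked) * train_ratio)])
    for name, param in model.named_parameters():
        if param.dim() == 2:
            param.requires_grad = name in trainable  # freeze the low-SNR modules
```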
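Likewise, a minimal sketch of SLERP applied to a single pair of flattened weight tensors (merging tools apply this per tensor across both parent checkpoints; `slerp` here is an illustrative helper, not the tooling we used):

```python
import numpy as np

def slerp(w_a: np.ndarray, w_b: np.ndarray, t: float = 0.5, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight tensors."""
    a = w_a / (np.linalg.norm(w_a) + eps)  # unit-norm copies, used only for the angle
    b = w_b / (np.linalg.norm(w_b) + eps)
    theta = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))  # angle between the parents
    if theta < eps:  # nearly parallel weights: plain linear interpolation is stable
        return (1.0 - t) * w_a + t * w_b
    return (np.sin((1.0 - t) * theta) * w_a + np.sin(t * theta) * w_b) / np.sin(theta)
```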
✨ Features
Introducing Llama-3.1-Storm-8B
Llama-3.1-Storm-8B builds on the foundation of Llama-3.1-8B-Instruct, aiming to enhance conversational and function calling capabilities within the 8B parameter model class.
As shown in the left subplot of the figure, Llama-3.1-Storm-8B improves on Meta-Llama-3.1-8B-Instruct across multiple benchmark categories: instruction following (IFEval), knowledge-driven QA (GPQA, MMLU-Pro), reasoning (ARC-C, MuSR, BBH), reduced hallucinations (TruthfulQA), and function calling (BFCL). These gains are particularly valuable for AI developers and enthusiasts with limited computational resources.
We also benchmarked our model against the recently published Hermes-3-Llama-3.1-8B. As shown in the right subplot, Llama-3.1-Storm-8B outperforms Hermes-3-Llama-3.1-8B in 7 out of 9 benchmarks, with Hermes-3-Llama-3.1-8B surpassing Llama-3.1-Storm-8B on the MuSR benchmark and both models showing comparable performance on the BBH benchmark.
Llama-3.1-Storm-8B Model Strengths
Llama-3.1-Storm-8B is a powerful generalist model suitable for a wide range of applications. We encourage the AI community to explore Llama-3.1-Storm-8B and discover its potential in various projects.
| Model Strength | Relevant Benchmarks |
|---|---|
| Improved Instruction Following | IFEval Strict (+3.93%) |
| Enhanced Knowledge-Driven Question Answering | GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%) |
| Better Reasoning | ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%) |
| Superior Agentic Capabilities | BFCL Overall Acc (+7.92%), BFCL AST Summary (+12.32%) |
| Reduced Hallucinations | TruthfulQA (+9%) |
Note: All improvements are absolute gains over Meta-Llama-3.1-8B-Instruct.
Llama-3.1-Storm-8B Models
- BF16: Llama-3.1-Storm-8B
- FP8: Llama-3.1-Storm-8B-FP8-Dynamic
- GGUF: Llama-3.1-Storm-8B-GGUF
- Ollama: `ollama run ajindal/llama3.1-storm:8b`
📦 Installation
```bash
pip install llama-cpp-python huggingface-hub
```

(huggingface-hub is needed for the `hf_hub_download` call in the usage example below.)
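To offload layers to a GPU (`n_gpu_layers` in the example below), llama-cpp-python must be built with GPU support. Per its installation docs this is controlled through `CMAKE_ARGS`; the exact flag can vary by version, so treat the line below as a sketch:

```bash
# Rebuild with CUDA support so n_gpu_layers > 0 actually offloads work to the GPU
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```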
💻 Usage Examples
Basic Usage
```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized weights from the Hugging Face Hub
model_name = "akjindal53244/Llama-3.1-Storm-8B-GGUF"
model_file = "Llama-3.1-Storm-8B.Q8_0.gguf"
model_path = hf_hub_download(model_name, filename=model_file)

# Load the model
llm = Llama(
    model_path=model_path,
    n_ctx=16000,    # context window size, in tokens
    n_threads=32,   # CPU threads; tune to your machine
    n_gpu_layers=0  # 0 = CPU-only inference; raise to offload layers to a GPU
)

generation_kwargs = {
    "max_tokens": 200,
    "stop": ["<|eot_id|>"],  # Llama 3.1 end-of-turn token
    "echo": False,           # do not repeat the prompt in the output
    "top_k": 1               # greedy decoding
}

prompt = "What is 2+2?"
res = llm(prompt, **generation_kwargs)
print(res["choices"][0]["text"])
```
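For instruction-style prompting, llama-cpp-python also provides a chat-completion API that applies the chat template embedded in the GGUF file; a minimal sketch reusing the `llm` object from above:

```python
# Chat-style generation: the library formats messages with the model's chat template
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"},
    ],
    max_tokens=200,
)
print(response["choices"][0]["message"]["content"])
```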
Advanced Usage - Function Calling Example with Ollama
```python
import ollama

# Tool (function) schemas the model may choose to call
tools = [
    {
        'type': 'function',
        'function': {
            'name': 'get_current_weather',
            'description': 'Get the current weather for a city',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {
                        'type': 'string',
                        'description': 'The name of the city',
                    },
                },
                'required': ['city'],
            },
        },
    },
    {
        'type': 'function',
        'function': {
            'name': 'get_places_to_visit',
            'description': 'Get places to visit in a city',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {
                        'type': 'string',
                        'description': 'The name of the city',
                    },
                },
                'required': ['city'],
            },
        },
    },
]

response = ollama.chat(
    model='ajindal/llama3.1-storm:8b',
    messages=[
        {'role': 'system', 'content': 'Do not answer any vulgar questions.'},
        {'role': 'user', 'content': 'What is the weather in Toronto and San Francisco?'}
    ],
    tools=tools,
)
print(response['message'])
```
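If the model decides to call a tool, the returned message carries `tool_calls` with parsed arguments rather than plain text. A minimal dispatch sketch, assuming the dict-style response used above; `get_current_weather` here is a hypothetical stub:

```python
# Hypothetical stub standing in for a real weather lookup
def get_current_weather(city: str) -> str:
    return f"(stub) Weather in {city}: sunny, 22 °C"

available_functions = {'get_current_weather': get_current_weather}

# Run each tool call the model requested; arguments arrive as a parsed dict
for call in response['message'].get('tool_calls') or []:
    fn = available_functions.get(call['function']['name'])
    if fn is not None:
        print(fn(**call['function']['arguments']))
```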
📚 Documentation
Alignment Note
While Llama-3.1-Storm-8B did not undergo an explicit model alignment process, it may still retain some alignment properties inherited from the Meta-Llama-3.1-8B-Instruct model.
Cite Our Work
```bibtex
@misc{ashvini_kumar_jindal_2024,
  author    = {Ashvini Kumar Jindal and Pawan Kumar Rajpoot and Ankur Parikh and Akshita Sukhlecha},
  title     = {Llama-3.1-Storm-8B},
  year      = {2024},
  url       = {https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B},
  doi       = {10.57967/hf/2902},
  publisher = {Hugging Face}
}
```
Support Our Work
With a 3-member team spanning 3 different time zones, we have won the NeurIPS LLM Efficiency Challenge 2023 and 4 other competitions in the finance and Arabic LLM space. We have also published a SOTA mathematical reasoning model.
Llama-3.1-Storm-8B is our most valuable contribution to the open-source community so far. We are committed to developing efficient generalist LLMs. We're seeking both computational resources and innovative collaborators to drive this initiative forward.
📄 License
The model is released under the Llama 3.1 Community License (the `llama3.1` license).