Llama-3.1-Storm-8B open-source chat model - free improvements in conversation and function calling

Llama 3.1 Storm 8B

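Developed by akjindal53244

Llama-3.1-Storm-8B is built on Llama-3.1-8B-Instruct and aims to improve conversational and function-calling capabilities in the 8B-parameter model class.

Large language model

Transformers

Supports multiple languages #Efficient function calling #Instruction optimization #Multi-domain question answering

Downloads: 22.93k

Release date: 8/12/2024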

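Model overview

Llama-3.1-Storm-8B is a strong general-purpose model suited to a range of applications, including instruction following, knowledge-driven question answering, reasoning, truthful answer generation, and function calling.

Model features

Improved instruction following

+3.93% on the IFEval Strict benchmark.

Enhanced knowledge-driven question answering

+7.21%, +0.55%, and +3.77% on the GPQA, MMLU-Pro, and AGIEval benchmarks, respectively.

Better reasoning

+3.92%, +2.77%, +1.67%, and +3.77% on the ARC-C, MuSR, BBH, and AGIEval benchmarks, respectively.

Superior agentic capabilities

On the BFCL benchmark, Overall Acc improves by 7.92% and AST Summary by 12.32%.

Reduced hallucinations

+9% on the TruthfulQA benchmark.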

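Model capabilities

Text generation

Function calling

Instruction following

Knowledge-driven question answering

Reasoning

Use cases

Dialogue systems

Conversational assistant

Build intelligent conversational assistants that answer user questions.

Generates accurate, context-appropriate answers.

Function calling

Blockchain information lookup

Call a function to retrieve details about a blockchain.

Generates correct function-call requests.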

🚀 Llama-3.1-Storm-8B

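Llama-3.1-Storm-8B is built on Llama-3.1-8B-Instruct and aims to improve conversational and function-calling capabilities in the 8B-parameter model class. It significantly outperforms Meta AI's Llama-3.1-8B-Instruct and Hermes-3-Llama-3.1-8B on multiple benchmarks and suits a range of applications.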

Authors: Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh, Akshita Sukhlecha

🤖 Hugging Face announcement blog: https://huggingface.co/blog/akjindal53244/llama31-storm8b

🐏 Ollama: ollama run ajindal/llama3.1-storm:8b

🚀 Quick start

Installation

pip install --upgrade "transformers>=4.43.2" torch==2.3.1 accelerate vllm==0.5.3.post1

Conversational use case

Using 🤗 Transformers

Using the `transformers.pipeline()` API

import transformers
import torch

model_id = "akjindal53244/Llama-3.1-Storm-8B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
]

outputs = pipeline(messages, max_new_tokens=128, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
print(outputs[0]["generated_text"][-1])  # Expected Output: {'role': 'assistant', 'content': '2 + 2 = 4'}

Using the `model.generate()` API

pip install flash_attn==2.6.3

import torch
from transformers import AutoTokenizer, LlamaForCausalLM

# Apply Llama3.1 chat-template
def format_prompt(user_query):
    template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"""
    return template.format(user_query)


model_id = 'akjindal53244/Llama-3.1-Storm-8B'
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=False,
    attn_implementation="flash_attention_2"  # requires flash_attn; replaces the deprecated use_flash_attention_2=True
)

# Build final input prompt after applying chat-template
prompt = format_prompt("What is 2+2?")

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=128, temperature=0.01, do_sample=True, eos_token_id=tokenizer.eos_token_id)
response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)  # Expected Output: '2 + 2 = 4'
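
The hand-written template above covers a single user turn with a fixed system message. As a minimal sketch, the same layout can be generalized to a list of messages. The helper name `format_messages` and the constants are illustrative, and the special tokens are copied from the template above. The output is not guaranteed to be byte-identical to what `tokenizer.apply_chat_template` produces, so prefer the tokenizer's own template when it is available.

```python
# Special tokens copied from the hand-written template above.
BOS = "<|begin_of_text|>"
HEADER = "<|start_header_id|>{role}<|end_header_id|>\n\n"
EOT = "<|eot_id|>"


def format_messages(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into a Llama 3.1 style prompt.

    Illustrative helper (not part of the model card).
    """
    parts = [BOS]
    for message in messages:
        parts.append(HEADER.format(role=message["role"]))
        # Concatenate instead of str.format so braces in the content are safe.
        parts.append(message["content"] + EOT)
    if add_generation_prompt:
        # Open an assistant header so the model continues with its reply.
        parts.append(HEADER.format(role="assistant"))
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
prompt = format_messages(messages)
print(prompt)
```

For the two messages shown, the result matches the string produced by `format_prompt("What is 2+2?")` above.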

Using vLLM

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "akjindal53244/Llama-3.1-Storm-8B"  # FP8 model: "akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic"
num_gpus = 1

tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, tensor_parallel_size=num_gpus)
sampling_params = SamplingParams(max_tokens=128, temperature=0.01, top_k=100, top_p=0.95)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize = False)
print(llm.generate([prompt], sampling_params)[0].outputs[0].text.strip())  # Expected Output: 2 + 2 = 4

Using LitGPT

pip install 'litgpt[all]'
litgpt download akjindal53244/Llama-3.1-Storm-8B --model_name meta-llama/Meta-Llama-3.1-8B

from litgpt import LLM

llm = LLM.load(model="akjindal53244/Llama-3.1-Storm-8B")
llm.generate("What do Llamas eat?")

Function-calling use case

Prompt format for function calling

Llama-3.1-Storm-8B was trained with a specific system prompt for function calling:

You are a function calling AI model. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into function. The user may use the terms function calling or tool use interchangeably.

Here are the available functions:
<tools>LIST_OF_TOOLS</tools>

For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags in the format:
<tool_call>{"tool_name": <function-name>, "tool_arguments": <args-dict>}</tool_call>

Use the system prompt above with LIST_OF_TOOLS replaced by the list of tools you pass in.

Using vLLM

import json
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "akjindal53244/Llama-3.1-Storm-8B"  # FP8 model: "akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic"
num_gpus = 1

tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, tensor_parallel_size=num_gpus)
sampling_params = SamplingParams(max_tokens=128, temperature=0.01, top_k=100, top_p=0.95)


def create_system_prompt(tools_list):
    system_prompt_format = """You are a function calling AI model. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into function. The user may use the terms function calling or tool use interchangeably.

Here are the available functions:
<tools>{}</tools>

For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags in the format:
<tool_call>{{"tool_name": <function-name>, "tool_arguments": <args-dict>}}</tool_call>"""  # braces doubled so str.format() treats them as literals
    
    # Convert the tools list to a string representation
    tools_str = json.dumps(tools_list, ensure_ascii=False)
    # Format the system prompt with the tools list
    system_prompt = system_prompt_format.format(tools_str)
    return system_prompt


# Example tools list
tools_list = [
    {
        "name": "peers",
        "description": "Retrieves a list of company peers given a stock symbol.",
        "parameters": {
            "symbol": {
                "description": "The stock symbol for the company.",
                "type": "str",
                "default": ""
            }
        }
    },
    {
        "name": "web_chain_details",
        "description": "python",
        "parameters": {
            "chain_slug": {
                "description": "The slug identifier for the blockchain (e.g., 'ethereum' for Ethereum mainnet).",
                "type": "str",
                "default": "ethereum"
            }
        }
    }
]

# Create the system prompt with the tools list
system_prompt = create_system_prompt(tools_list)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "I need to understand the details of the Ethereum blockchain for my cryptocurrency project. Can you fetch the details for 'ethereum'?"}
]

prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize = False)
print(llm.generate([prompt], sampling_params)[0].outputs[0].text.strip())  # Expected Output: <tool_call>{'tool_name': 'web_chain_details', 'tool_arguments': {'chain_slug': 'ethereum'}}</tool_call>
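
The model returns its function calls as text wrapped in `<tool_call>` tags, so the caller still has to extract them. The following is a minimal sketch of one way to do that. The helper name `parse_tool_calls` is illustrative, and the fallback to `ast.literal_eval` is an assumption based on the single-quoted, Python-style dict shown in the expected output above. Strict JSON is tried first.

```python
import ast
import json
import re

# Non-greedy match so several <tool_call> blocks in one reply are split correctly.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return a list of {"tool_name", "tool_arguments"} dicts found in text."""
    calls = []
    for payload in TOOL_CALL_PATTERN.findall(text):
        payload = payload.strip()
        try:
            call = json.loads(payload)  # strict JSON with double quotes
        except json.JSONDecodeError:
            try:
                call = ast.literal_eval(payload)  # Python-style single quotes
            except (ValueError, SyntaxError):
                continue  # skip payloads that are neither
        if isinstance(call, dict) and "tool_name" in call:
            calls.append(call)
    return calls


sample = "<tool_call>{'tool_name': 'web_chain_details', 'tool_arguments': {'chain_slug': 'ethereum'}}</tool_call>"
calls = parse_tool_calls(sample)
print(calls)
```

Malformed payloads are skipped rather than raised, which keeps a serving loop alive. A stricter caller may prefer to surface the error instead.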

Using Ollama

import ollama

tools = [{
      'type': 'function',
      'function': {
        'name': 'get_current_weather',
        'description': 'Get the current weather for a city',
        'parameters': {
          'type': 'object',
          'properties': {
            'city': {
              'type': 'string',
              'description': 'The name of the city',
            },
          },
          'required': ['city'],
        },
      },
    },
    {
      'type': 'function',
      'function': {
        'name': 'get_places_to_vist',
        'description': 'Get places to visit in a city',
        'parameters': {
          'type': 'object',
          'properties': {
            'city': {
              'type': 'string',
              'description': 'The name of the city',
            },
          },
          'required': ['city'],
        },
      },
    },
  ]

response = ollama.chat(
    model='ajindal/llama3.1-storm:8b',
    messages=[
        {'role': 'system', 'content': 'Do not answer any vulgar questions.'},
        {'role': 'user', 'content': 'What is the weather in Toronto and San Francisco?'}
        ],
    tools=tools
)

print(response['message'])  # Expected Response: {'role': 'assistant', 'content': "<tool_call>{'tool_name': 'get_current_weather', 'tool_arguments': {'city': 'Toronto'}}</tool_call>"}
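
Once a reply like the one above comes back, the application still has to run the requested function. Below is a hedged sketch of a dispatch step. The `get_current_weather` stub, the `TOOL_REGISTRY` mapping, and the `dispatch` helper are all hypothetical names introduced for illustration, and the parsing assumes the Python-literal style payload shown in the expected response.

```python
import ast
import re


def get_current_weather(city):
    # Hypothetical stub: a real implementation would query a weather service.
    return f"Weather lookup requested for {city}"


# Maps tool names the model may emit to local Python callables (illustrative).
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def dispatch(message_content):
    """Run every tool call found in the assistant message and collect results."""
    results = []
    for payload in re.findall(r"<tool_call>(.*?)</tool_call>", message_content, re.DOTALL):
        call = ast.literal_eval(payload.strip())
        name = call["tool_name"]
        func = TOOL_REGISTRY.get(name)
        if func is None:
            # Never execute a name the application did not register.
            results.append({"tool_name": name, "error": "unknown tool"})
            continue
        results.append({"tool_name": name, "result": func(**call["tool_arguments"])})
    return results


content = "<tool_call>{'tool_name': 'get_current_weather', 'tool_arguments': {'city': 'Toronto'}}</tool_call>"
print(dispatch(content))
```

Looking names up in an explicit registry, rather than evaluating whatever the model emits, keeps the set of callable functions under the application's control.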

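✨ Key features

Llama-3.1-Storm-8B model strengths

Llama-3.1-Storm-8B is a strong general-purpose model suited to a range of applications. We invite the AI community to explore Llama-3.1-Storm-8B and look forward to seeing how it is used across projects and applications.

Model strength	Relevant benchmarks
Improved instruction following	IFEval Strict (+3.93%)
Enhanced knowledge-driven question answering	GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%)
Better reasoning	ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%)
Superior agentic capabilities	BFCL: Overall Acc (+7.92%), BFCL: AST Summary (+12.32%)
Reduced hallucinations	TruthfulQA (+9%)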

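Model introduction

Llama-3.1-Storm-8B is built on Llama-3.1-8B-Instruct and aims to improve conversational and function-calling capabilities in the 8B-parameter model class.

As shown in the figure, Llama-3.1-Storm-8B outperforms Meta-Llama-3.1-8B-Instruct on multiple benchmarks, covering instruction following (IFEval), knowledge-driven question answering (GPQA, MMLU-Pro), reasoning (ARC-C, MuSR, BBH), truthful answer generation (TruthfulQA), and function calling (BFCL). This improvement is especially relevant for AI developers and enthusiasts working with limited compute.

We also benchmarked our model against the recently released Hermes-3-Llama-3.1-8B, which is likewise built on Llama-3.1-8B-Instruct. As shown in the figure, Llama-3.1-Storm-8B outperforms Hermes-3-Llama-3.1-8B on 7 of 9 benchmarks. Hermes-3-Llama-3.1-8B is ahead on MuSR, and the two models perform comparably on BBH.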

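Model building steps

Our approach consists of three key steps:

Self-curation: We applied two self-curation methods to select about 1 million high-quality examples from roughly 2.8 million open-source examples. Our curation criteria focus on educational value and difficulty level, and we used the same SLM for annotation rather than a larger model (e.g., 70B, 405B).
Targeted fine-tuning: We performed Spectrum-based targeted fine-tuning on the Llama-3.1-8B-Instruct model. Spectrum accelerates training by selectively targeting layer modules based on their signal-to-noise ratio (SNR) and freezing the remaining modules. In our work, 50% of the layers are frozen.
Model merging: We merged our fine-tuned model with the Llama-Spark model using the SLERP method. Merging produces a blended model whose characteristics are smoothly interpolated from both parents, so the result captures the essence of both.

Llama-3.1-Storm-8B improves on Llama-3.1-8B-Instruct across 10 diverse benchmarks, covering instruction following, knowledge-driven question answering, reasoning, truthful answer generation, and function calling.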
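
To make the merging step more concrete, here is a minimal sketch of spherical linear interpolation (SLERP) applied to two flattened weight vectors. It illustrates the general technique only and is not the authors' merge configuration. The function name `slerp`, the interpolation weight `t`, and the plain-list representation of weights are assumptions for illustration.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length weight vectors.

    t=0 returns v0, t=1 returns v1, values in between follow the arc
    between the two directions instead of the straight line.
    """
    dot = sum(x * y for x, y in zip(v0, v1))
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(y * y for y in v1))
    # Cosine of the angle between the vectors, clamped for numerical safety.
    cos_omega = max(-1.0, min(1.0, dot / (norm0 * norm1 + eps)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1.0 - t) * x + t * y for x, y in zip(v0, v1)]
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * x + w1 * y for x, y in zip(v0, v1)]


# Toy example: halfway between two orthogonal unit vectors.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print(midpoint)
```

In the toy example both components come out close to 0.7071, so the midpoint keeps unit length, whereas a plain average would shrink it to about 0.7071 in norm. In a real merge the same formula would be applied per tensor across the two parent checkpoints.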

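🔧 Technical details

Model type

The model is an improvement on Llama-3.1-8B-Instruct, obtained through self-curation, targeted fine-tuning, and model merging.

Training data

About 1 million high-quality examples, selected from roughly 2.8 million open-source examples, were used for training.

Evaluation results

Metric	Value
Average	29.84
IFEval (0-Shot)	80.51
BBH (3-Shot)	31.49
MATH Lvl 5 (4-Shot)	16.62
GPQA (0-shot)	10.18
MuSR (0-shot)	9.12
MMLU-PRO (5-shot)	31.15

Detailed results can be found here.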
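
As a quick sanity check on the table above, the reported average can be recomputed from the six per-benchmark scores. This sketch assumes the average is an unweighted mean, which the table does not state explicitly.

```python
# Scores copied from the evaluation table above.
scores = {
    "IFEval (0-Shot)": 80.51,
    "BBH (3-Shot)": 31.49,
    "MATH Lvl 5 (4-Shot)": 16.62,
    "GPQA (0-shot)": 10.18,
    "MuSR (0-shot)": 9.12,
    "MMLU-PRO (5-shot)": 31.15,
}

# Unweighted mean across the six benchmarks (assumption).
average = sum(scores.values()) / len(scores)
print(f"{average:.3f}")
```

The mean comes out at about 29.845, which is consistent with the reported 29.84.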

📄 License

This model is released under the llama3.1 license.

Cite our work

@misc {ashvini_kumar_jindal_2024,
    author       = { {Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh, Akshita Sukhlecha} },
    title        = { Llama-3.1-Storm-8B },
    year         = 2024,
    url          = { https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B },
    doi          = { 10.57967/hf/2902 },
    publisher    = { Hugging Face }
}