🚀 Llama-3.1-Storm-8B
Llama-3.1-Storm-8B is built on top of Llama-3.1-8B-Instruct with the goal of improving the conversational and function-calling capabilities of the 8B-parameter model class. It significantly outperforms Meta AI's Llama-3.1-8B-Instruct and Hermes-3-Llama-3.1-8B on several benchmarks and is suitable for a wide range of applications.
Authors: Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh, Akshita Sukhlecha
🤗 Hugging Face announcement blog post: https://huggingface.co/blog/akjindal53244/llama31-storm8b
🐏 Ollama: ollama run ajindal/llama3.1-storm:8b
🚀 Getting Started
Installation
pip install --upgrade "transformers>=4.43.2" torch==2.3.1 accelerate vllm==0.5.3.post1
Conversational Use Case
Using 🤗 Transformers
Using the transformers.pipeline() API
import transformers
import torch
model_id = "akjindal53244/Llama-3.1-Storm-8B"
pipeline = transformers.pipeline(
"text-generation",
model=model_id,
model_kwargs={"torch_dtype": torch.bfloat16},
device_map="auto",
)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is 2+2?"}
]
outputs = pipeline(messages, max_new_tokens=128, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
print(outputs[0]["generated_text"][-1]) # Expected Output: {'role': 'assistant', 'content': '2 + 2 = 4'}
Using the model.generate() API
pip install flash_attn==2.6.3
import torch
from transformers import AutoTokenizer, LlamaForCausalLM
# Apply Llama3.1 chat-template
def format_prompt(user_query):
    template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"""
    return template.format(user_query)
model_id = 'akjindal53244/Llama-3.1-Storm-8B'
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
load_in_8bit=False,
load_in_4bit=False,
use_flash_attention_2=True
)
# Build final input prompt after applying chat-template
prompt = format_prompt("What is 2+2?")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=128, temperature=0.01, do_sample=True, eos_token_id=tokenizer.eos_token_id)
response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response) # Expected Output: '2 + 2 = 4'
Using vLLM
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
model_id = "akjindal53244/Llama-3.1-Storm-8B" # FP8 model: "akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic"
num_gpus = 1
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, tensor_parallel_size=num_gpus)
sampling_params = SamplingParams(max_tokens=128, temperature=0.01, top_k=100, top_p=0.95)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is 2+2?"}
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize = False)
print(llm.generate([prompt], sampling_params)[0].outputs[0].text.strip()) # Expected Output: 2 + 2 = 4
Using LitGPT
pip install 'litgpt[all]'
litgpt download akjindal53244/Llama-3.1-Storm-8B --model_name meta-llama/Meta-Llama-3.1-8B
from litgpt import LLM
llm = LLM.load(model="akjindal53244/Llama-3.1-Storm-8B")
llm.generate("What do Llamas eat?")
Function Calling Use Case
Prompt Format for Function Calling
Llama-3.1-Storm-8B was trained with the following specific system prompt for function calling:
You are a function calling AI model. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into function. The user may use the terms function calling or tool use interchangeably.
Here are the available functions:
<tools>LIST_OF_TOOLS</tools>
For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags in the format:
<tool_call>{"tool_name": <function-name>, "tool_arguments": <args-dict>}</tool_call>
The system prompt above should be used with the available tool definitions substituted in for LIST_OF_TOOLS.
Using vLLM
import json
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
model_id = "akjindal53244/Llama-3.1-Storm-8B" # FP8 model: "akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic"
num_gpus = 1
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, tensor_parallel_size=num_gpus)
sampling_params = SamplingParams(max_tokens=128, temperature=0.01, top_k=100, top_p=0.95)
def create_system_prompt(tools_list):
    system_prompt_format = """You are a function calling AI model. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into function. The user may use the terms function calling or tool use interchangeably.
Here are the available functions:
<tools>{}</tools>
For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags in the format:
<tool_call>{{"tool_name": <function-name>, "tool_arguments": <args-dict>}}</tool_call>"""
    # Convert the tools list to a JSON string representation
    tools_str = json.dumps(tools_list, ensure_ascii=False)
    # Format the system prompt with the tools list. The literal braces in the
    # example tool_call above are doubled so str.format() leaves them intact.
    system_prompt = system_prompt_format.format(tools_str)
    return system_prompt
# Example tools list
tools_list = [
{
"name": "peers",
"description": "Retrieves a list of company peers given a stock symbol.",
"parameters": {
"symbol": {
"description": "The stock symbol for the company.",
"type": "str",
"default": ""
}
}
},
{
"name": "web_chain_details",
"description": "python",
"parameters": {
"chain_slug": {
"description": "The slug identifier for the blockchain (e.g., 'ethereum' for Ethereum mainnet).",
"type": "str",
"default": "ethereum"
}
}
}
]
# Create the system prompt with the tools list
system_prompt = create_system_prompt(tools_list)
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": "I need to understand the details of the Ethereum blockchain for my cryptocurrency project. Can you fetch the details for 'ethereum'?"}
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize = False)
print(llm.generate([prompt], sampling_params)[0].outputs[0].text.strip()) # Expected Output: <tool_call>{'tool_name': 'web_chain_details', 'tool_arguments': {'chain_slug': 'ethereum'}}</tool_call>
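The model emits the call as text wrapped in <tool_call> tags, typically as a Python-style dict with single quotes (as in the expected output above). Below is a minimal, hypothetical helper for turning that text back into a structured call; the name parse_tool_call and the ast.literal_eval fallback are illustrative assumptions, not part of the model's official API:
import ast
import json
import re

def parse_tool_call(generated_text):
    """Extract the first <tool_call>...</tool_call> block and parse it into a dict."""
    match = re.search(r"<tool_call>(.*?)</tool_call>", generated_text, re.DOTALL)
    if match is None:
        return None  # the model answered in plain text instead of calling a tool
    payload = match.group(1).strip()
    try:
        return json.loads(payload)        # handles double-quoted JSON
    except json.JSONDecodeError:
        return ast.literal_eval(payload)  # handles single-quoted, Python-style dicts

# Example with the expected output shown above
call = parse_tool_call("<tool_call>{'tool_name': 'web_chain_details', 'tool_arguments': {'chain_slug': 'ethereum'}}</tool_call>")
print(call["tool_name"], call["tool_arguments"])  # web_chain_details {'chain_slug': 'ethereum'}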
Using Ollama
import ollama
tools = [{
'type': 'function',
'function': {
'name': 'get_current_weather',
'description': 'Get the current weather for a city',
'parameters': {
'type': 'object',
'properties': {
'city': {
'type': 'string',
'description': 'The name of the city',
},
},
'required': ['city'],
},
},
},
{
'type': 'function',
'function': {
'name': 'get_places_to_vist',
'description': 'Get places to visit in a city',
'parameters': {
'type': 'object',
'properties': {
'city': {
'type': 'string',
'description': 'The name of the city',
},
},
'required': ['city'],
},
},
},
]
response = ollama.chat(
model='ajindal/llama3.1-storm:8b',
messages=[
{'role': 'system', 'content': 'Do not answer any vulgar questions.'},
{'role': 'user', 'content': 'What is the weather in Toronto and San Francisco?'}
],
tools=tools
)
print(response['message']) # Expected Response: {'role': 'assistant', 'content': "<tool_call>{'tool_name': 'get_current_weather', 'tool_arguments': {'city': 'Toronto'}}</tool_call>"}
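To act on the returned call(s), extract each <tool_call> payload from the message content and route it to your own implementation. The sketch below continues the example above; get_current_weather is a stub you would replace with a real lookup, and the parsing assumes the single-quoted, Python-style dicts shown in the expected response:
import ast
import re

# Hypothetical local implementation the tool call is dispatched to (a stub for illustration).
def get_current_weather(city):
    return f"(stub) current weather for {city}"

available_tools = {"get_current_weather": get_current_weather}

# 'response' comes from the ollama.chat() call above; the tool call arrives as text
# inside the assistant message content.
content = response['message']['content']
for payload in re.findall(r"<tool_call>(.*?)</tool_call>", content, re.DOTALL):
    call = ast.literal_eval(payload.strip())  # the model emits Python-style, single-quoted dicts
    result = available_tools[call["tool_name"]](**call["tool_arguments"])
    print(call["tool_name"], "->", result)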
✨ Key Features
Llama-3.1-Storm-8B Model Strengths
Llama-3.1-Storm-8B is a powerful general-purpose model suited to a wide range of applications. We invite the AI community to explore Llama-3.1-Storm-8B and look forward to seeing it used across projects and applications.
| Model Strength | Relevant Benchmarks |
| --- | --- |
| Improved instruction following | IFEval Strict (+3.93%) |
| Enhanced knowledge-driven QA | GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%) |
| Better reasoning | ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%) |
| Superior agentic capabilities | BFCL Overall Acc (+7.92%), BFCL AST Summary (+12.32%) |
| Reduced hallucinations | TruthfulQA (+9%) |
Model Introduction
Llama-3.1-Storm-8B is built on top of Llama-3.1-8B-Instruct and aims to improve the conversational and function-calling capabilities of the 8B-parameter model class.
As shown in the figure, Llama-3.1-Storm-8B outperforms Meta-Llama-3.1-8B-Instruct across several benchmarks, including instruction following (IFEval), knowledge-driven QA (GPQA, MMLU-Pro), reasoning (ARC-C, MuSR, BBH), truthful answer generation (TruthfulQA), and function calling (BFCL). This improvement matters especially for AI developers and enthusiasts working with limited compute.
We also benchmarked our model against the recently released Hermes-3-Llama-3.1-8B, which is likewise built on Llama-3.1-8B-Instruct. As shown in the figure, Llama-3.1-Storm-8B outperforms Hermes-3-Llama-3.1-8B on 7 of 9 benchmarks; Hermes-3-Llama-3.1-8B surpasses it on MuSR, and the two models perform comparably on BBH.
How the Model Was Built
Our approach consists of three key steps:
- Self-curation: We applied two self-curation methods to select roughly 1 million high-quality examples from a pool of about 2.8 million open-source examples. Our curation criteria focused on educational value and difficulty level, using the same SLM for annotation rather than a larger model (e.g., 70B, 405B).
- Targeted fine-tuning: We performed Spectrum-based targeted fine-tuning of the Llama-3.1-8B-Instruct model. Spectrum accelerates training by selectively targeting layer modules based on their signal-to-noise ratio (SNR) and freezing the rest; in our work, 50% of the layers were frozen (a rough sketch of the freezing mechanic follows this list).
- Model merging: We merged our fine-tuned model with the Llama-Spark model using the SLERP method. Merging produces a blended model whose characteristics are smoothly interpolated from both parents, ensuring the result captures the essence of each (see the SLERP sketch after this list). Llama-3.1-Storm-8B improves on Llama-3.1-8B-Instruct across 10 diverse benchmarks covering instruction following, knowledge-driven QA, reasoning, truthful answer generation, and function calling.
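For illustration only, here is a rough sketch of freezing half of the transformer layers by a crude signal-to-noise proxy. The actual Spectrum method estimates SNR per weight matrix using random-matrix theory, so treat the layer_snr function below purely as a stand-in:
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("akjindal53244/Llama-3.1-Storm-8B", torch_dtype=torch.bfloat16)

def layer_snr(layer):
    # Crude proxy "SNR" for a layer: |mean| / std over all of its weights.
    flat = torch.cat([p.detach().float().flatten() for p in layer.parameters()])
    return (flat.mean().abs() / (flat.std() + 1e-8)).item()

layers = model.model.layers
ranked = sorted(range(len(layers)), key=lambda i: layer_snr(layers[i]))
frozen = set(ranked[: len(layers) // 2])  # freeze the lower-SNR half of the layers

for i, layer in enumerate(layers):
    for p in layer.parameters():
        p.requires_grad = i not in frozen  # train only the higher-SNR half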
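Likewise, here is a minimal sketch of SLERP (spherical linear interpolation) applied to a pair of same-shaped weight tensors. The real merge operates over full checkpoints (typically via a merging toolkit), so this is intuition rather than the exact recipe:
import torch

def slerp(w_a, w_b, t=0.5, eps=1e-8):
    """Spherical linear interpolation between two weight tensors (illustrative sketch)."""
    a, b = w_a.flatten().float(), w_b.flatten().float()
    a_n, b_n = a / (a.norm() + eps), b / (b.norm() + eps)
    omega = torch.arccos(torch.clamp(torch.dot(a_n, b_n), -1.0, 1.0))  # angle between the tensors
    if omega.abs() < eps:
        merged = (1 - t) * a + t * b  # fall back to LERP for (near-)parallel tensors
    else:
        merged = (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)
    return merged.reshape(w_a.shape).to(w_a.dtype)

# Toy usage: interpolate halfway between two same-shaped weight matrices
merged_weight = slerp(torch.randn(8, 8), torch.randn(8, 8), t=0.5)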
🔧 Technical Details
Model Type
The model is an improvement over Llama-3.1-8B-Instruct, obtained through self-curation, targeted fine-tuning, and model merging.
Training Data
Roughly 1 million high-quality examples curated from about 2.8 million open-source examples.
Evaluation Results
| Metric | Value |
| --- | --- |
| Average | 29.84 |
| IFEval (0-Shot) | 80.51 |
| BBH (3-Shot) | 31.49 |
| MATH Lvl 5 (4-Shot) | 16.62 |
| GPQA (0-shot) | 10.18 |
| MuSR (0-shot) | 9.12 |
| MMLU-PRO (5-shot) | 31.15 |
Detailed results can be found here.
📄 License
This model is released under the llama3.1 license.
Cite Our Work
@misc{ashvini_kumar_jindal_2024,
  author    = {Ashvini Kumar Jindal and Pawan Kumar Rajpoot and Ankur Parikh and Akshita Sukhlecha},
  title     = {Llama-3.1-Storm-8B},
  year      = {2024},
  url       = {https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B},
  doi       = {10.57967/hf/2902},
  publisher = {Hugging Face}
}
Support Our Work
Our team of three, spread across three time zones, won the NeurIPS LLM Efficiency Challenge 2023 as well as four other competitions in the finance and Arabic LLM space. We have also released a SOTA mathematical reasoning model.
Llama-3.1-Storm-8B is our most valuable contribution to the open-source community to date. We are committed to building efficient general-purpose LLMs and are looking for compute resources and innovative collaborators to move this initiative forward.
Alignment Note
While Llama-3.1-Storm-8B did not go through an explicit model alignment process, it may still retain some alignment properties inherited from the Meta-Llama-3.1-8B-Instruct model.



