🚀 Llama-3.1-Storm-8B
Llama-3.1-Storm-8B is built on top of Llama-3.1-8B-Instruct with the goal of improving the conversational and function-calling capabilities of the 8B-parameter model class. It significantly outperforms Meta AI's Llama-3.1-8B-Instruct and Hermes-3-Llama-3.1-8B on several benchmarks and is suitable for a wide range of applications.
Authors: Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh, Akshita Sukhlecha
🤗 Hugging Face announcement blog post: https://huggingface.co/blog/akjindal53244/llama31-storm8b
🐏 Ollama: ollama run ajindal/llama3.1-storm:8b
🚀 Getting Started
Installation
pip install --upgrade "transformers>=4.43.2" torch==2.3.1 accelerate vllm==0.5.3.post1
Conversational Use Case
Using 🤗 Transformers
Using the transformers.pipeline() API
import transformers
import torch
model_id = "akjindal53244/Llama-3.1-Storm-8B"
pipeline = transformers.pipeline(
"text-generation",
model=model_id,
model_kwargs={"torch_dtype": torch.bfloat16},
device_map="auto",
)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is 2+2?"}
]
outputs = pipeline(messages, max_new_tokens=128, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
print(outputs[0]["generated_text"][-1]) # Expected Output: {'role': 'assistant', 'content': '2 + 2 = 4'}
Using the model.generate() API
pip install flash_attn==2.6.3
import torch
from transformers import AutoTokenizer, LlamaForCausalLM
# Apply Llama3.1 chat-template
def format_prompt(user_query):
    template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"""
    return template.format(user_query)
model_id = 'akjindal53244/Llama-3.1-Storm-8B'
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
load_in_8bit=False,
load_in_4bit=False,
use_flash_attention_2=True
)
# Build final input prompt after applying chat-template
prompt = format_prompt("What is 2+2?")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=128, temperature=0.01, do_sample=True, eos_token_id=tokenizer.eos_token_id)
response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response) # Expected Output: '2 + 2 = 4'
Using vLLM
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
model_id = "akjindal53244/Llama-3.1-Storm-8B" # FP8 model: "akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic"
num_gpus = 1
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, tensor_parallel_size=num_gpus)
sampling_params = SamplingParams(max_tokens=128, temperature=0.01, top_k=100, top_p=0.95)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is 2+2?"}
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize = False)
print(llm.generate([prompt], sampling_params)[0].outputs[0].text.strip()) # Expected Output: 2 + 2 = 4
Using LitGPT
pip install 'litgpt[all]'
litgpt download akjindal53244/Llama-3.1-Storm-8B --model_name meta-llama/Meta-Llama-3.1-8B
from litgpt import LLM
llm = LLM.load(model="akjindal53244/Llama-3.1-Storm-8B")
llm.generate("What do Llamas eat?")
Function Calling Use Case
Prompt Format for Function Calling
Llama-3.1-Storm-8B was trained with the following specific system prompt for function calling:
You are a function calling AI model. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into function. The user may use the terms function calling or tool use interchangeably.
Here are the available functions:
<tools>LIST_OF_TOOLS</tools>
For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags in the format:
<tool_call>{"tool_name": <function-name>, "tool_arguments": <args-dict>}</tool_call>
The system prompt above should be used with the available tool definitions substituted in for LIST_OF_TOOLS.
Using vLLM
import json
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
model_id = "akjindal53244/Llama-3.1-Storm-8B" # FP8 model: "akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic"
num_gpus = 1
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, tensor_parallel_size=num_gpus)
sampling_params = SamplingParams(max_tokens=128, temperature=0.01, top_k=100, top_p=0.95)
def create_system_prompt(tools_list):
    system_prompt_format = """You are a function calling AI model. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into function. The user may use the terms function calling or tool use interchangeably.
Here are the available functions:
<tools>{}</tools>
For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags in the format:
<tool_call>{{"tool_name": <function-name>, "tool_arguments": <args-dict>}}</tool_call>"""
    # Convert the tools list to a JSON string representation
    tools_str = json.dumps(tools_list, ensure_ascii=False)
    # Format the system prompt with the tools list. The literal braces in the
    # example tool_call above are doubled so str.format() leaves them intact.
    system_prompt = system_prompt_format.format(tools_str)
    return system_prompt
# Example tools list
tools_list = [
{
"name": "peers",
"description": "Retrieves a list of company peers given a stock symbol.",
"parameters": {
"symbol": {
"description": "The stock symbol for the company.",
"type": "str",
"default": ""
}
}
},
{
"name": "web_chain_details",
"description": "python",
"parameters": {
"chain_slug": {
"description": "The slug identifier for the blockchain (e.g., 'ethereum' for Ethereum mainnet).",
"type": "str",
"default": "ethereum"
}
}
}
]
# Create the system prompt with the tools list
system_prompt = create_system_prompt(tools_list)
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": "I need to understand the details of the Ethereum blockchain for my cryptocurrency project. Can you fetch the details for 'ethereum'?"}
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize = False)
print(llm.generate([prompt], sampling_params)[0].outputs[0].text.strip()) # Expected Output: <tool_call>{'tool_name': 'web_chain_details', 'tool_arguments': {'chain_slug': 'ethereum'}}</tool_call>
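The model emits the call as text wrapped in <tool_call> tags, typically as a Python-style dict with single quotes (as in the expected output above). Below is a minimal, hypothetical helper for turning that text back into a structured call; the name parse_tool_call and the ast.literal_eval fallback are illustrative assumptions, not part of the model's official API:
import ast
import json
import re

def parse_tool_call(generated_text):
    """Extract the first <tool_call>...</tool_call> block and parse it into a dict."""
    match = re.search(r"<tool_call>(.*?)</tool_call>", generated_text, re.DOTALL)
    if match is None:
        return None  # the model answered in plain text instead of calling a tool
    payload = match.group(1).strip()
    try:
        return json.loads(payload)        # handles double-quoted JSON
    except json.JSONDecodeError:
        return ast.literal_eval(payload)  # handles single-quoted, Python-style dicts

# Example with the expected output shown above
call = parse_tool_call("<tool_call>{'tool_name': 'web_chain_details', 'tool_arguments': {'chain_slug': 'ethereum'}}</tool_call>")
print(call["tool_name"], call["tool_arguments"])  # web_chain_details {'chain_slug': 'ethereum'}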
Using Ollama
import ollama
tools = [{
'type': 'function',
'function': {
'name': 'get_current_weather',
'description': 'Get the current weather for a city',
'parameters': {
'type': 'object',
'properties': {
'city': {
'type': 'string',
'description': 'The name of the city',
},
},
'required': ['city'],
},
},
},
{
'type': 'function',
'function': {
'name': 'get_places_to_vist',
'description': 'Get places to visit in a city',
'parameters': {
'type': 'object',
'properties': {
'city': {
'type': 'string',
'description': 'The name of the city',
},
},
'required': ['city'],
},
},
},
]
response = ollama.chat(
model='ajindal/llama3.1-storm:8b',
messages=[
{'role': 'system', 'content': 'Do not answer any vulgar questions.'},
{'role': 'user', 'content': 'What is the weather in Toronto and San Francisco?'}
],
tools=tools
)
print(response['message']) # Expected Response: {'role': 'assistant', 'content': "<tool_call>{'tool_name': 'get_current_weather', 'tool_arguments': {'city': 'Toronto'}}</tool_call>"}
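To act on the returned call(s), extract each <tool_call> payload from the message content and route it to your own implementation. The sketch below continues the example above; get_current_weather is a stub you would replace with a real lookup, and the parsing assumes the single-quoted, Python-style dicts shown in the expected response:
import ast
import re

# Hypothetical local implementation the tool call is dispatched to (a stub for illustration).
def get_current_weather(city):
    return f"(stub) current weather for {city}"

available_tools = {"get_current_weather": get_current_weather}

# 'response' comes from the ollama.chat() call above; the tool call arrives as text
# inside the assistant message content.
content = response['message']['content']
for payload in re.findall(r"<tool_call>(.*?)</tool_call>", content, re.DOTALL):
    call = ast.literal_eval(payload.strip())  # the model emits Python-style, single-quoted dicts
    result = available_tools[call["tool_name"]](**call["tool_arguments"])
    print(call["tool_name"], "->", result)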
✨ Key Features
Llama-3.1-Storm-8B Model Strengths
Llama-3.1-Storm-8B is a powerful general-purpose model suited to a wide range of applications. We invite the AI community to explore Llama-3.1-Storm-8B and look forward to seeing it used across projects and applications.
| Model Strength | Relevant Benchmarks |
| --- | --- |
| Improved instruction following | IFEval Strict (+3.93%) |
| Enhanced knowledge-driven QA | GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%) |
| Better reasoning | ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%) |
| Superior agentic capabilities | BFCL Overall Acc (+7.92%), BFCL AST Summary (+12.32%) |
| Reduced hallucinations | TruthfulQA (+9%) |
Model Introduction
Llama-3.1-Storm-8B is built on top of Llama-3.1-8B-Instruct and aims to improve the conversational and function-calling capabilities of the 8B-parameter model class.
As shown in the figure, Llama-3.1-Storm-8B outperforms Meta-Llama-3.1-8B-Instruct across several benchmarks, including instruction following (IFEval), knowledge-driven QA (GPQA, MMLU-Pro), reasoning (ARC-C, MuSR, BBH), truthful answer generation (TruthfulQA), and function calling (BFCL). This improvement matters especially for AI developers and enthusiasts working with limited compute.
We also benchmarked our model against the recently released Hermes-3-Llama-3.1-8B, which is likewise built on Llama-3.1-8B-Instruct. As shown in the figure, Llama-3.1-Storm-8B outperforms Hermes-3-Llama-3.1-8B on 7 of 9 benchmarks; Hermes-3-Llama-3.1-8B surpasses it on MuSR, and the two models perform comparably on BBH.
How the Model Was Built
Our approach consists of three key steps:
- Self-curation: We applied two self-curation methods to select roughly 1 million high-quality examples from a pool of about 2.8 million open-source examples. Our curation criteria focused on educational value and difficulty level, using the same SLM for annotation rather than a larger model (e.g., 70B, 405B).
- Targeted fine-tuning: We performed Spectrum-based targeted fine-tuning of the Llama-3.1-8B-Instruct model. Spectrum accelerates training by selectively targeting layer modules based on their signal-to-noise ratio (SNR) and freezing the rest; in our work, 50% of the layers were frozen (a rough sketch of the freezing mechanic follows this list).
- Model merging: We merged our fine-tuned model with the Llama-Spark model using the SLERP method. Merging produces a blended model whose characteristics are smoothly interpolated from both parents, ensuring the result captures the essence of each (see the SLERP sketch after this list). Llama-3.1-Storm-8B improves on Llama-3.1-8B-Instruct across 10 diverse benchmarks covering instruction following, knowledge-driven QA, reasoning, truthful answer generation, and function calling.
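For illustration only, here is a rough sketch of freezing half of the transformer layers by a crude signal-to-noise proxy. The actual Spectrum method estimates SNR per weight matrix using random-matrix theory, so treat the layer_snr function below purely as a stand-in:
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("akjindal53244/Llama-3.1-Storm-8B", torch_dtype=torch.bfloat16)

def layer_snr(layer):
    # Crude proxy "SNR" for a layer: |mean| / std over all of its weights.
    flat = torch.cat([p.detach().float().flatten() for p in layer.parameters()])
    return (flat.mean().abs() / (flat.std() + 1e-8)).item()

layers = model.model.layers
ranked = sorted(range(len(layers)), key=lambda i: layer_snr(layers[i]))
frozen = set(ranked[: len(layers) // 2])  # freeze the lower-SNR half of the layers

for i, layer in enumerate(layers):
    for p in layer.parameters():
        p.requires_grad = i not in frozen  # train only the higher-SNR half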
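Likewise, here is a minimal sketch of SLERP (spherical linear interpolation) applied to a pair of same-shaped weight tensors. The real merge operates over full checkpoints (typically via a merging toolkit), so this is intuition rather than the exact recipe:
import torch

def slerp(w_a, w_b, t=0.5, eps=1e-8):
    """Spherical linear interpolation between two weight tensors (illustrative sketch)."""
    a, b = w_a.flatten().float(), w_b.flatten().float()
    a_n, b_n = a / (a.norm() + eps), b / (b.norm() + eps)
    omega = torch.arccos(torch.clamp(torch.dot(a_n, b_n), -1.0, 1.0))  # angle between the tensors
    if omega.abs() < eps:
        merged = (1 - t) * a + t * b  # fall back to LERP for (near-)parallel tensors
    else:
        merged = (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)
    return merged.reshape(w_a.shape).to(w_a.dtype)

# Toy usage: interpolate halfway between two same-shaped weight matrices
merged_weight = slerp(torch.randn(8, 8), torch.randn(8, 8), t=0.5)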
🔧 Technical Details
Model Type
The model is an improvement over Llama-3.1-8B-Instruct, obtained through self-curation, targeted fine-tuning, and model merging.
Training Data
Roughly 1 million high-quality examples curated from about 2.8 million open-source examples.
Evaluation Results
| Metric | Value |
| --- | --- |
| Average | 29.84 |
| IFEval (0-Shot) | 80.51 |
| BBH (3-Shot) | 31.49 |
| MATH Lvl 5 (4-Shot) | 16.62 |
| GPQA (0-shot) | 10.18 |
| MuSR (0-shot) | 9.12 |
| MMLU-PRO (5-shot) | 31.15 |
Detailed results can be found here.
📄 License
This model is released under the llama3.1 license.
Cite Our Work
@misc{ashvini_kumar_jindal_2024,
  author    = {Ashvini Kumar Jindal and Pawan Kumar Rajpoot and Ankur Parikh and Akshita Sukhlecha},
  title     = {Llama-3.1-Storm-8B},
  year      = {2024},
  url       = {https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B},
  doi       = {10.57967/hf/2902},
  publisher = {Hugging Face}
}
Support Our Work
Our team of three, spread across three time zones, won the NeurIPS LLM Efficiency Challenge 2023 as well as four other competitions in the finance and Arabic LLM space. We have also released a SOTA mathematical reasoning model.
Llama-3.1-Storm-8B is our most valuable contribution to the open-source community to date. We are committed to building efficient general-purpose LLMs and are looking for compute resources and innovative collaborators to move this initiative forward.
Alignment Note
While Llama-3.1-Storm-8B did not go through an explicit model alignment process, it may still retain some alignment properties inherited from the Meta-Llama-3.1-8B-Instruct model.



