🚀 The xLAM Large Action Model Series
The xLAM series comprises Large Action Models (LAMs) designed to enhance decision-making and translate user intentions into executable actions that interact with the world. As the brains of AI agents, these models autonomously plan and execute tasks to achieve specific goals, holding the potential to automate workflows across diverse domains and making them invaluable in a wide range of applications.
This model release is for research purposes only. A new and more powerful version of xLAM will soon be available exclusively to customers on our platform.
[Homepage] | [Paper] | [Github] | [Discord] | [Blog] | [Community Demo]
📋 Table of Contents
📦 Model Series
We provide a series of xLAM models in different sizes to cater to various application needs, including models optimized for function calling and general agent applications:
| Model Name | Total Params | Context Length | Release Date | Category | Model Download Link | GGUF File Download Link |
|---|---|---|---|---|---|---|
| xLAM-7b-r | 7.24B | 32k | Sep. 5, 2024 | General, Function-calling | 🤗 Link | -- |
| xLAM-8x7b-r | 46.7B | 32k | Sep. 5, 2024 | General, Function-calling | 🤗 Link | -- |
| xLAM-8x22b-r | 141B | 64k | Sep. 5, 2024 | General, Function-calling | 🤗 Link | -- |
| xLAM-1b-fc-r | 1.35B | 16k | July 17, 2024 | Function-calling | 🤗 Link | 🤗 Link |
| xLAM-7b-fc-r | 6.91B | 4k | July 17, 2024 | Function-calling | 🤗 Link | 🤗 Link |
| xLAM-v0.1-r | 46.7B | 32k | Mar. 18, 2024 | General, Function-calling | 🤗 Link | -- |
For our function-calling series of models (more details here), we also provide quantized GGUF files for efficient deployment and execution. GGUF is a file format specifically designed for the efficient storage and loading of large language models, making it ideal for running AI models on local devices with limited resources, enabling offline functionality and enhanced privacy.
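As an illustration, a downloaded GGUF file can be run locally with the llama-cpp-python bindings for llama.cpp. The sketch below is not part of the xLAM release: the repo ID and filename are placeholders for the actual GGUF files linked in the table above, and the context size follows the xLAM-1b-fc-r row.

# A minimal sketch of running a quantized xLAM GGUF file locally.
# Requires: pip install llama-cpp-python huggingface_hub
# The repo_id and filename below are assumptions; substitute the actual
# GGUF files linked in the table above.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="Salesforce/xLAM-1b-fc-r-gguf",   # placeholder repo ID
    filename="xLAM-1b-fc-r.Q4_K_M.gguf",      # placeholder quantization variant
)
llm = Llama(model_path=gguf_path, n_ctx=16384)  # 16k context, per the table
completion = llm("Hello, xLAM!", max_tokens=128, temperature=0.0)
print(completion["choices"][0]["text"])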
🔍 Check Out the Latest Examples of Interacting with xLAM
You can find the latest examples and tokenizers for interacting with the xLAM models here.
📂 Repository Overview
This repository is mainly about the general tool-use series. For more specialized function-calling models, please check our fc series here.
This guide will walk you through the setup, usage, and integration of the model series with HuggingFace.
Framework versions
- Transformers 4.41.0
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
💻 Usage
Basic Usage with Huggingface
To use the model from Huggingface, please first install the `transformers` library:
pip install "transformers>=4.41.0"
Please note that our model works best with our provided prompt format, which allows us to extract JSON output that is similar to the function-calling mode of ChatGPT.
We use the following example to illustrate how to use our model for 1) a single-turn use case and 2) a multi-turn use case.
1. Single-turn use case
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
torch.random.manual_seed(0)
model_name = "Salesforce/xLAM-7b-r"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Please use our provided instruction prompt for best performance
task_instruction = """
Based on the previous context and API request history, generate an API request or a response as an AI assistant.""".strip()
format_instruction = """
The output should be of the JSON format, which specifies a list of generated function calls. The example format is as follows, please make sure the parameter type is correct. If no function call is needed, please make
tool_calls an empty list "[]".
{"thought": "the thought process, or an empty string", "tool_calls": [{"name": "api_name1", "arguments": {"argument1": "value1", "argument2": "value2"}}]}
""".strip()
# Define the input query and available tools
query = "What's the weather like in New York in fahrenheit?"
get_weather_api = {
    "name": "get_weather",
    "description": "Get the current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, New York"
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "The unit of temperature to return"
            }
        },
        "required": ["location"]
    }
}

search_api = {
    "name": "search",
    "description": "Search for information on the internet",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query, e.g. 'latest news on AI'"
            }
        },
        "required": ["query"]
    }
}
openai_format_tools = [get_weather_api, search_api]
# Helper function to convert openai format tools to our more concise xLAM format
def convert_to_xlam_tool(tools):
    """Convert an OpenAI-format tool (or a list of tools) to the concise xLAM format."""
    if isinstance(tools, dict):
        return {
            "name": tools["name"],
            "description": tools["description"],
            "parameters": {k: v for k, v in tools["parameters"].get("properties", {}).items()}
        }
    elif isinstance(tools, list):
        return [convert_to_xlam_tool(tool) for tool in tools]
    else:
        return tools
def build_conversation_history_prompt(conversation_history: list):
    parsed_history = []
    for step_data in conversation_history:
        parsed_history.append({
            "step_id": step_data["step_id"],
            "thought": step_data["thought"],
            "tool_calls": step_data["tool_calls"],
            "next_observation": step_data["next_observation"],
            "user_input": step_data["user_input"]
        })
    history_string = json.dumps(parsed_history)
    return f"""
[BEGIN OF HISTORY STEPS]
{history_string}
[END OF HISTORY STEPS]
"""
# Helper function to build the input prompt for our model
def build_prompt(task_instruction: str, format_instruction: str, tools: list, query: str, conversation_history: list = []):
    prompt = f"[BEGIN OF TASK INSTRUCTION]\n{task_instruction}\n[END OF TASK INSTRUCTION]\n\n"
    prompt += f"[BEGIN OF AVAILABLE TOOLS]\n{json.dumps(tools)}\n[END OF AVAILABLE TOOLS]\n\n"
    prompt += f"[BEGIN OF FORMAT INSTRUCTION]\n{format_instruction}\n[END OF FORMAT INSTRUCTION]\n\n"
    prompt += f"[BEGIN OF QUERY]\n{query}\n[END OF QUERY]\n\n"
    if len(conversation_history) > 0:
        prompt += build_conversation_history_prompt(conversation_history)
    return prompt
# Build the input and start the inference
xlam_format_tools = convert_to_xlam_tool(openai_format_tools)
conversation_history = []
content = build_prompt(task_instruction, format_instruction, xlam_format_tools, query, conversation_history)
messages = [
    {"role": "user", "content": content}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# tokenizer.eos_token_id is the id of <|EOT|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
agent_action = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
Then you should be able to see the following output string in JSON format:
{"thought": "I need to get the current weather for New York in fahrenheit.", "tool_calls": [{"name": "get_weather", "arguments": {"location": "New York", "unit": "fahrenheit"}}]}
2. Multi-turn use case
Our model series also supports multi-turn interaction. Here is an example of the next turn of interaction for the example above:
def parse_agent_action(agent_action: str):
    """
    Given an agent's action, parse it to add to conversation history
    """
    try:
        parsed_agent_action_json = json.loads(agent_action)
    except:
        return "", []

    if "thought" not in parsed_agent_action_json.keys():
        thought = ""
    else:
        thought = parsed_agent_action_json["thought"]

    if "tool_calls" not in parsed_agent_action_json.keys():
        tool_calls = []
    else:
        tool_calls = parsed_agent_action_json["tool_calls"]

    return thought, tool_calls
def update_conversation_history(conversation_history: list, agent_action: str, environment_response: str, user_input: str):
    """
    Update the conversation history list based on the new agent_action, environment_response, and/or user_input
    """
    thought, tool_calls = parse_agent_action(agent_action)
    new_step_data = {
        "step_id": len(conversation_history) + 1,
        "thought": thought,
        "tool_calls": tool_calls,
        "next_observation": environment_response,
        "user_input": user_input,
    }
    conversation_history.append(new_step_data)
def get_environment_response(agent_action: str):
    """
    Get the environment response for the agent_action
    """
    # TODO: add custom implementation here
    error_message, response_message = "", ""
    return {"error": error_message, "response": response_message}
# ------------- before here are the steps to get agent_action from the example above ----------
# 1. Get the next state after the agent's response:
# The next 2 lines are examples of getting the environment response and user_input.
# Depending on the particular usage, we may have either one or both of these.
environment_response = get_environment_response(agent_action)
user_input = "Now, search on the Internet for cute puppies"
# 2. After we get environment_response and/or user_input, we add them to our conversation history
update_conversation_history(conversation_history, agent_action, environment_response, user_input)
# 3. We can now build the prompt
content = build_prompt(task_instruction, format_instruction, xlam_format_tools, query, conversation_history)
# 4. Now, we just retrieve the inputs for the LLM
messages = [
    {"role": "user", "content": content}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# 5. Generate the outputs & decode
# tokenizer.eos_token_id is the id of <|EOT|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
agent_action = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
The corresponding output is as follows:
{"thought": "I need to get the current weather for New York in fahrenheit.", "tool_calls": [{"name": "get_weather", "arguments": {"location": "New York", "unit": "fahrenheit"}}]}
We highly recommend using our provided prompt format and helper functions to yield the best function-calling performance from our model.
Example multi-turn prompt and output
Prompt:
[BEGIN OF TASK INSTRUCTION]
Based on the previous context and API request history, generate an API request or a response as an AI assistant.
[END OF TASK INSTRUCTION]
[BEGIN OF AVAILABLE TOOLS]
[
    {
        "name": "get_fire_info",
        "description": "Query the latest wildfire information",
        "parameters": {
            "location": {
                "type": "string",
                "description": "Location of the wildfire, for example: 'California'",
                "required": true,
                "format": "free"
            },
            "radius": {
                "type": "number",
                "description": "The radius (in miles) around the location where the wildfire is occurring, for example: 10",
                "required": false,
                "format": "free"
            }
        }
    },
    {
        "name": "get_hurricane_info",
        "description": "Query the latest hurricane information",
        "parameters": {
            "name": {
                "type": "string",
                "description": "Name of the hurricane, for example: 'Irma'",
                "required": true,
                "format": "free"
            }
        }
    },
    {
        "name": "get_earthquake_info",
        "description": "Query the latest earthquake information",
        "parameters": {
            "magnitude": {
                "type": "number",
                "description": "The minimum magnitude of the earthquake that needs to be queried.",
                "required": false,
                "format": "free"
            },
            "location": {
                "type": "string",
                "description": "Location of the earthquake, for example: 'California'",
                "required": false,
                "format": "free"
            }
        }
    }
]
[END OF AVAILABLE TOOLS]
[BEGIN OF FORMAT INSTRUCTION]
Your output should be in the JSON format, which specifies a list of function calls. The example format is as follows. Please make sure the parameter type is correct. If no function call is needed, please make tool_calls an empty list '[]'.
```{"thought": "the thought process, or an empty string", "tool_calls": [{"name": "api_name1", "arguments": {"argument1": "value1", "argument2": "value2"}}]}```
[END OF FORMAT INSTRUCTION]
[BEGIN OF QUERY]
User: Can you give me the latest information on the wildfires occurring in California?
[END OF QUERY]
[BEGIN OF HISTORY STEPS]
[
    {
        "thought": "Sure, what is the radius (in miles) around the location of the wildfire?",
        "tool_calls": [],
        "step_id": 1,
        "next_observation": "",
        "user_input": "User: Let me think... 50 miles."
    },
    {
        "thought": "",
        "tool_calls": [
            {
                "name": "get_fire_info",
                "arguments": {
                    "location": "California",
                    "radius": 50
                }
            }
        ],
        "step_id": 2,
        "next_observation": [
            {
                "location": "Los Angeles",
                "acres_burned": 1500,
                "status": "contained"
            },
            {
                "location": "San Diego",
                "acres_burned": 12000,
                "status": "active"
            }
        ]
    },
    {
        "thought": "Based on the latest information, there are wildfires in Los Angeles and San Diego. The wildfire in Los Angeles has burned 1,500 acres and is contained, while the wildfire in San Diego has burned 12,000 acres and is still active.",
        "tool_calls": [],
        "step_id": 3,
        "next_observation": "",
        "user_input": "User: Can you tell me about the latest earthquake?"
    }
]
[END OF HISTORY STEPS]
Output:
{"thought": "", "tool_calls": [{"name": "get_earthquake_info", "arguments": {"location": "California"}}]}
📊 Benchmark Results
Note: Bold and underlined results denote the best and second-best results for success rate, respectively.
Berkeley Function-Calling Leaderboard (BFCL)
Table 1: Performance comparison on the BFCL-v2 leaderboard (cutoff date 09/03/2024). Rankings are based on overall accuracy, a weighted average over the different evaluation categories. "FC" stands for function-calling mode, in contrast to using a customized "prompt" to extract function calls.
Webshop and ToolQuery
Table 2: Testing results on Webshop and ToolQuery. Bold and underlined results denote the best and second-best results for success rate, respectively.
Unified ToolQuery
Table 3: Testing results on ToolQuery-Unified. Bold and underlined results denote the best and second-best results for success rate, respectively. Values in brackets indicate the corresponding performance on ToolQuery.
ToolBench
Table 4: Pass rate on ToolBench in three distinct scenarios. Bold and underlined results denote the best and second-best results for each setting, respectively. The results for xLAM-8x22b-r are unavailable because the ToolBench server was down between 07/28/2024 and our evaluation cutoff date of 09/03/2024.
📄 License
The model is distributed under the CC-BY-NC-4.0 license.
⚠️ Ethical Considerations
This release is for research purposes only, in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend that users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people's lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.
📝 Citation
If you find this repository helpful, please consider citing our paper:
@article{zhang2024xlam,
  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
  author={Zhang, Jianguo and Lan, Tian and Zhu, Ming and Liu, Zuxin and Hoang, Thai and Kokane, Shirley and Yao, Weiran and Tan, Juntao and Prabhakar, Akshara and Chen, Haolin and others},
  journal={arXiv preprint arXiv:2409.03215},
  year={2024}
}
@article{liu2024apigen,
  title={APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets},
  author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and others},
  journal={arXiv preprint arXiv:2406.18518},
  year={2024}
}