xLAM-2-3b-fc-r: Open-Source Large Action Model with Multi-Turn Conversation, Tool Use, and Strong Function Calling

Developed by Salesforce

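The xLAM-2 series is a family of large action models (LAMs) built on an advanced data synthesis and training pipeline. The models focus on multi-turn conversation and tool use, and perform strongly on function-calling and agentic tasks.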

Large Language Model

Transformers

English #Multi-turn function calling #AI agent brain #128k long context

Downloads 353

Release Time : 3/27/2025

Model Overview

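xLAM-2 is a new generation of large action models trained with the APIGen-MT framework, reaching state-of-the-art results on the BFCL and τ-bench benchmarks. The chat template and vLLM integration have been refined to make building advanced AI agents easier.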

Model Features

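Multi-turn conversation

Performs well in complex multi-turn dialogue and keeps context consistent across turns

Advanced function calling

Optimized for tool use and function calling; accurately parses and executes API calls

Long-context handling

Supports context windows of up to 128k tokens (Qwen-2.5-based models default to 32k), suited to complex tasks

vLLM compatibility

Fully compatible with the vLLM inference framework for high-throughput deployment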

Model Capabilities

Natural language understanding

Function calling

Multi-turn conversation

Task planning

Workflow automation

Use Cases

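Intelligent assistants

Weather assistant

Provides real-time weather information by calling a weather API

Accurately parses the user's location and returns formatted weather data

Travel planning

Plans itineraries and booking services through multi-turn interaction

Coordinates multiple APIs to complete complex travel arrangements

Enterprise automation

CRM integration

Integrates with the Salesforce CRM system to handle customer requests

Automates common customer service workflows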

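🚀 xLAM-2 Model Family

Large Action Models (LAMs) are advanced language models designed to enhance decision-making by translating user intentions into executable actions. As the "brains" of AI agents, LAMs autonomously plan and execute tasks to achieve specific goals, which makes them valuable for automating workflows across diverse domains. This model release is for research purposes only.

The new xLAM-2 series is built on an advanced data synthesis, processing, and training pipeline, and marks a significant advance in multi-turn conversation and tool use. The models are trained with the novel APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions. On the BFCL and τ-bench benchmarks, our models achieve leading performance, outperforming frontier models such as GPT-4o and Claude 3.5. Notably, even the smaller models show strong capabilities in multi-turn scenarios while remaining consistent across trials.

We have also refined the chat template and vLLM integration, making it easier to build advanced AI agents. Compared with previous xLAM models, xLAM-2 offers better performance and seamless deployment across applications.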

Model Performance Overview
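Performance comparison of the larger xLAM-2-fc-r models (8B-70B, trained with APIGen-MT data) against state-of-the-art baselines on function calling (BFCL v3, as of April 2, 2025) and agentic capabilities (τ-bench).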

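🚀 Quick Start

Framework Versions

Transformers 4.46.1 (or later)
PyTorch 2.5.1+cu124 (or later)
Datasets 3.1.0 (or later)
Tokenizers 0.20.3 (or later)

Basic Usage

Using the Hugging Face Chat Template

The new xLAM models work seamlessly with the Hugging Face Transformers library and use a natural chat template, which keeps the conversational workflow simple and intuitive. The following example shows how to use them: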

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/xLAM-2-3b-fc-r")
model = AutoModelForCausalLM.from_pretrained("Salesforce/xLAM-2-3b-fc-r", torch_dtype=torch.bfloat16, device_map="auto")

# Example conversation with a tool call
messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "Thanks. I am doing well. How can I help you?"},
    {"role": "user", "content": "What's the weather like in London?"},
]

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
            },
            "required": ["location"]
        }
    }
]

print("====== prompt after applying chat template ======")
print(tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, tokenize=False))

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
input_ids_len = inputs["input_ids"].shape[-1] # Get the length of the input tokens
inputs = {k: v.to(model.device) for k, v in inputs.items()}
print("====== model response ======")
outputs = model.generate(**inputs, max_new_tokens=256)
generated_tokens = outputs[:, input_ids_len:] # Slice the output to get only the newly generated tokens
print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))
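
The decoded response still has to be turned into something your application can act on. The sketch below assumes the model emits tool calls as a JSON list of objects with "name" and "arguments" keys and answers in plain text otherwise; that output shape is not shown in this card, and parse_tool_calls is a hypothetical helper name, so check both against the prompt and response printed above.

```python
import json
from typing import Any, Dict, List, Optional


def parse_tool_calls(text: str) -> Optional[List[Dict[str, Any]]]:
    """Return parsed tool calls, or None when the text is a plain-text reply.

    Assumption: tool calls arrive as JSON, either a list of
    {"name": ..., "arguments": {...}} objects or a single such object.
    """
    try:
        payload = json.loads(text.strip())
    except json.JSONDecodeError:
        return None  # not JSON: treat as an ordinary assistant message

    calls = payload if isinstance(payload, list) else [payload]
    if not calls:
        return None

    parsed = []
    for call in calls:
        # Reject anything that does not look like a tool call.
        if not isinstance(call, dict) or "name" not in call:
            return None
        arguments = call.get("arguments", {})
        if not isinstance(arguments, dict):
            return None
        parsed.append({"name": call["name"], "arguments": arguments})
    return parsed
```

Passing the string from tokenizer.decode(...) to this helper yields either a list of calls to dispatch or None, in which case the text can be shown to the user directly.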

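Inference with vLLM

xLAM models can also be served efficiently with vLLM for high-throughput inference. Use vllm>=0.6.5, because earlier versions cause performance degradation for Qwen-based models.

Installation and Serving

Install vLLM at the required version: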

pip install "vllm>=0.6.5"
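
If vLLM is already present in the environment, it can be worth confirming that the installed version meets the minimum before serving. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the version parsing is deliberately simple (it looks only at the first three numeric components).

```python
from importlib import metadata

MIN_VLLM = (0, 6, 5)  # minimum version stated in this card


def parse_version(version: str) -> tuple:
    """Parse the leading 'major.minor.patch' digits, ignoring suffixes like 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def vllm_is_recent_enough(installed=None) -> bool:
    """Return True if the given (or installed) vLLM version is >= MIN_VLLM."""
    if installed is None:
        try:
            installed = metadata.version("vllm")
        except metadata.PackageNotFoundError:
            return False  # vLLM is not installed at all
    return parse_version(installed) >= MIN_VLLM
```

Calling vllm_is_recent_enough() with no argument inspects the current environment; passing a version string checks that string instead.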

Download the tool parser plugin to a local path:

wget https://huggingface.co/Salesforce/xLAM-2-1b-fc-r/raw/main/xlam_tool_call_parser.py

Start an OpenAI API-compatible endpoint:

vllm serve Salesforce/xLAM-2-1b-fc-r \
  --enable-auto-tool-choice \
  --tool-parser-plugin ./xlam_tool_call_parser.py \
  --tool-call-parser xlam \
  --tensor-parallel-size 1

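Note: Make sure the tool parser plugin file has been downloaded and that the path given to --tool-parser-plugin points to your local copy. All xLAM series models use the same tool call parser, so you only need to download it once.

Testing with the OpenAI API

The following minimal example tests tool use against the served endpoint: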

import openai
import json

# Configure the client to use your local vLLM endpoint
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",  # Default vLLM server URL
    api_key="empty"  # Can be any string
)

# Define a tool/function
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The unit of temperature to return"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Create a chat completion
response = client.chat.completions.create(
    model="Salesforce/xLAM-2-1b-fc-r",  # Must match the model name passed to `vllm serve`
    messages=[
        {"role": "system", "content": "You are a helpful assistant that can use tools."},
        {"role": "user", "content": "What's the weather like in San Francisco?"}
    ],
    tools=tools,
    tool_choice="auto"
)

# Print the response
print("Assistant's response:")
print(json.dumps(response.model_dump(), indent=2))
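
The response above only contains the model's request to call a tool; the application still has to run the tool and send the result back for the next turn. The following is a rough sketch of that step, assuming tool calls in the OpenAI chat-completions shape (an "id" plus a "function" object whose "arguments" field is a JSON string). The get_weather body, TOOL_REGISTRY, and run_tool_calls are hypothetical names introduced for illustration.

```python
import json


def get_weather(location, unit="celsius"):
    """Hypothetical stand-in; a real implementation would query a weather API."""
    return {"location": location, "unit": unit, "temperature": None}


# Maps tool names advertised in `tools` to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls, registry=TOOL_REGISTRY):
    """Execute OpenAI-format tool calls and return `tool` role messages."""
    messages = []
    for call in tool_calls:
        name = call["function"]["name"]
        if name not in registry:
            result = {"error": f"unknown tool: {name}"}
        else:
            try:
                args = json.loads(call["function"]["arguments"] or "{}")
                result = registry[name](**args)
            except (json.JSONDecodeError, TypeError) as exc:
                # Malformed JSON or arguments that do not fit the function.
                result = {"error": str(exc)}
        messages.append(
            {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
        )
    return messages
```

With the client from the example above, the tool calls can be obtained as [tc.model_dump() for tc in response.choices[0].message.tool_calls or []]. Appending the assistant message and the returned tool messages to the conversation, then calling client.chat.completions.create again, lets the model produce its final answer.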

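For more advanced configuration and deployment options, see the vLLM documentation.

✨ Key Features

Improved multi-turn conversation and tool use: built on an advanced data synthesis, processing, and training pipeline, the models make significant progress in multi-turn conversation and tool use.
High performance: outperforms frontier models such as GPT-4o and Claude 3.5 on the BFCL and τ-bench benchmarks.
Easy integration: the refined chat template and vLLM integration make it easier to build advanced AI agents.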

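📚 Detailed Documentation

Model Series

Property	Details
Model type	The xLAM series performs well in many areas, including general tasks and function calling. For the same number of parameters, the models have been fine-tuned across a wide range of agent tasks and scenarios while preserving the capabilities of the original model.
Training data	Salesforce/APIGen-MT-5k, Salesforce/xlam-function-calling-60k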

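Model	Total parameters	Context length	Category	Download model	Download GGUF files
Llama-xLAM-2-70b-fc-r	70B	128k	Multi-turn conversation, function calling	🤗 Link	NA
Llama-xLAM-2-8b-fc-r	8B	128k	Multi-turn conversation, function calling	🤗 Link	🤗 Link
xLAM-2-32b-fc-r	32B	32k (max 128k)*	Multi-turn conversation, function calling	🤗 Link	NA
xLAM-2-3b-fc-r	3B	32k (max 128k)*	Multi-turn conversation, function calling	🤗 Link	🤗 Link
xLAM-2-1b-fc-r	1B	32k (max 128k)*	Multi-turn conversation, function calling	🤗 Link	🤗 Link

*Note: The default context length for Qwen-2.5-based models is 32k, but you can use techniques such as YaRN (Yet another RoPE extensioN method) to reach a maximum context length of 128k. For more details, see here.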
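
As a rough illustration of the note above, the sketch below adds a YaRN rope_scaling entry to a model configuration dictionary. The entry's shape (type, factor, original_max_position_embeddings) follows what the Qwen2.5 documentation describes for long-context use; whether the xLAM-2 checkpoints should use exactly these values is an assumption, and enable_yarn is a hypothetical helper name.

```python
import copy


def enable_yarn(config: dict, target_context: int = 131072) -> dict:
    """Return a copy of `config` with a YaRN rope_scaling entry added.

    Assumption: `config` mirrors a Qwen-2.5-style config.json in which
    max_position_embeddings holds the default (32k) context length.
    """
    updated = copy.deepcopy(config)
    original = config["max_position_embeddings"]
    factor = target_context / original
    if factor <= 1:
        return updated  # the default window already covers the target
    updated["rope_scaling"] = {
        "type": "yarn",
        "factor": factor,  # 131072 / 32768 = 4.0 for the 32k-default models
        "original_max_position_embeddings": original,
    }
    return updated
```

The function leaves the input dictionary untouched, so the result can be compared against the original before writing it back to a local copy of config.json.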

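You can also explore our previous xLAM series here.

The -fc suffix indicates that the models are fine-tuned for function-calling tasks, and the -r suffix marks a research release.

✅ All models are fully compatible with vLLM and Transformers-based inference frameworks.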

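Benchmark Results

Berkeley Function-Calling Leaderboard (BFCL v3)

BFCL Results
Performance comparison of different models on the [BFCL leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html). Rank is based on overall accuracy, a weighted average across the evaluation categories. "FC" denotes function-calling mode, as opposed to using a custom "prompt" to extract function calls.

τ-bench Benchmark

Tau-bench Results
Success rate (pass@1) on the τ-bench benchmark, averaged over at least 5 trials. Our xLAM-2-70b-fc-r model reaches an overall success rate of 56.2% on τ-bench, significantly outperforming the base Llama 3.1 70B Instruct model (38.2%) and other open-source models such as DeepSeek v3 (40.6%). Notably, our best model even surpasses proprietary models such as GPT-4o (52.9%) and approaches the performance of more recent models such as Claude 3.5 Sonnet (new) (60.1%).

Pass^k curves
Pass^k curves measure the probability that all 5 independent trials of a given task succeed, averaged over all tasks in the τ-retail (left) and τ-airline (right) domains. Higher values indicate better model consistency.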
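
To make the metric concrete, here is a small sketch of one common way to estimate pass^k from per-task success counts: for a task with c successes in n trials, the chance that k trials drawn without replacement all succeed is C(c, k) / C(n, k), and the metric averages this over tasks. This estimator follows the τ-bench definition as we understand it and is not taken from this card; the function name and the example counts are made up for illustration.

```python
from math import comb


def pass_hat_k(successes_per_task, num_trials, k):
    """Estimate pass^k: the probability that k sampled trials of a task all succeed.

    successes_per_task: number of successful trials for each task.
    num_trials: trials run per task (n); k must satisfy 1 <= k <= n.
    """
    if not 1 <= k <= num_trials:
        raise ValueError("k must be between 1 and num_trials")
    total = comb(num_trials, k)
    # math.comb(c, k) is 0 when c < k, so tasks with too few successes score 0.
    per_task = [comb(c, k) / total for c in successes_per_task]
    return sum(per_task) / len(per_task)


# Illustrative counts only: three tasks, 5 trials each.
counts = [5, 3, 0]
curve = [pass_hat_k(counts, 5, k) for k in range(1, 6)]
```

With k = 1 the estimate reduces to the average success rate (pass@1), and with k equal to the number of trials it is the fraction of tasks solved in every trial, which is why the curve can only stay flat or fall as k grows.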

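Ethical Considerations

This release is for research purposes only, in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend that users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and follow best practices when selecting use cases, particularly in high-risk scenarios where errors or misuse could significantly affect people's lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.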

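🔧 Technical Details

The model is trained with the novel APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions. For multi-turn conversation and tool use, the model is built on an advanced data synthesis, processing, and training pipeline, which yields leading performance on the BFCL and τ-bench benchmarks.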

📄 License

This model is released under the CC-BY-NC-4.0 license.

Citation

If you use our model or dataset in your work, please cite our paper:

@article{prabhakar2025apigen,
  title={APIGen-MT: Agentic PIpeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay},
  author={Prabhakar, Akshara and Liu, Zuxin and Zhu, Ming and Zhang, Jianguo and Awalgaonkar, Tulika and Wang, Shiyu and Liu, Zhiwei and Chen, Haolin and Hoang, Thai and others},
  journal={arXiv preprint arXiv:2504.03601},
  year={2025}
}

Please also check out our other related work on the xLAM series and consider citing it as well:

@article{zhang2025actionstudio,
  title={ActionStudio: A Lightweight Framework for Data and Training of Action Models},
  author={Zhang, Jianguo and Hoang, Thai and Zhu, Ming and Liu, Zuxin and Wang, Shiyu and Awalgaonkar, Tulika and Prabhakar, Akshara and Chen, Haolin and Yao, Weiran and Liu, Zhiwei and others},
  journal={arXiv preprint arXiv:2503.22673},
  year={2025}
}

@article{zhang2024xlam,
  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
  author={Zhang, Jianguo and Lan, Tian and Zhu, Ming and Liu, Zuxin and Hoang, Thai and Kokane, Shirley and Yao, Weiran and Tan, Juntao and Prabhakar, Akshara and Chen, Haolin and others},
  journal={arXiv preprint arXiv:2409.03215},
  year={2024}
}

@article{liu2024apigen,
  title={Apigen: Automated pipeline for generating verifiable and diverse function-calling datasets},
  author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and RN, Rithesh and others},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={54463--54482},
  year={2024}
}

@article{zhang2024agentohana,
  title={AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning},
  author={Zhang, Jianguo and Lan, Tian and Murthy, Rithesh and Liu, Zhiwei and Yao, Weiran and Tan, Juntao and Hoang, Thai and Yang, Liangwei and Feng, Yihao and Liu, Zuxin and others},
  journal={arXiv preprint arXiv:2402.15506},
  year={2024}
}