xLAM-2-3b-fc-r: an open-source large action model with multi-turn conversation, tool use, and strong function calling

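xLAM-2-3b-fc-r

Developed by Salesforce

The xLAM-2 series is a family of large action models (LAMs) built on an advanced data synthesis and training pipeline. The models focus on multi-turn conversation and tool use, and perform strongly on function-calling and agentic tasks.

Large language model

Transformers

English  #Multi-turn function calling  #AI agent brain  #128k long context

Downloads: 353

Released: 3/27/2025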

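Model Overview

xLAM-2 is a new generation of large action models trained with the APIGen-MT framework, reaching state-of-the-art results on the BFCL and τ-bench benchmarks. The chat template and vLLM integration have been optimized to make it easier to build advanced AI agents.

Model Features

Multi-turn conversation

Performs well in complex multi-turn conversations and keeps context consistent across turns

Advanced function calling

Optimized for tool use and function calling; produces accurate, parseable API calls for the host application to execute

Long-context processing

Handles contexts of up to 128k tokens (32k by default for the Qwen-2.5-based models, extendable with YaRN), which suits complex tasks

vLLM compatibility

Fully compatible with the vLLM inference framework for high-throughput deployment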

Model Capabilities

Natural language understanding

Function calling

Multi-turn conversation

Task planning

Workflow automation

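Use Cases

Intelligent assistants

Weather assistant

Provides current weather information by calling a weather API

Parses the user's location accurately and returns formatted weather data

Travel planning

Plans travel routes and booking services over multiple turns of interaction

Coordinates several APIs to complete complex travel arrangements

Enterprise automation

CRM integration

Integrates with the Salesforce CRM system to handle customer requests

Automates common customer-service workflows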

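🚀 xLAM-2 Model Family

Large Action Models (LAMs) are advanced language models designed to enhance decision-making by translating user intentions into executable actions. As the "brains" of AI agents, LAMs autonomously plan and execute tasks to achieve specific goals, which makes them valuable for automating workflows across diverse domains. This model release is for research purposes only.

The new xLAM-2 series is built on an advanced data synthesis, processing, and training pipeline, and marks a significant advance in multi-turn conversation and tool use. The models are trained with the novel APIGen-MT framework, which generates high-quality training data by simulating interactions between an agent and a human. On the BFCL and τ-bench benchmarks, our models achieve leading performance, outperforming frontier models such as GPT-4o and Claude 3.5. Notably, even the smaller models show strong capability in multi-turn scenarios and remain consistent across repeated trials.

We have also optimized the chat template and vLLM integration, making it easier to build advanced AI agents. Compared with previous xLAM models, xLAM-2 offers better performance and seamless deployment across applications.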

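Figure: Model Performance Overview. Comparison of the larger xLAM-2-fc-r models (8B-70B, trained with APIGen-MT data) against state-of-the-art baselines on function calling (BFCL v3, as of April 2, 2025) and agentic capabilities (τ-bench).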

🚀 Quick Start

Framework versions

Transformers 4.46.1 (or later)
PyTorch 2.5.1+cu124 (or later)
Datasets 3.1.0 (or later)
Tokenizers 0.20.3 (or later)
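
The version requirements above can be checked before loading the model. The following is a minimal sketch, not part of the official instructions: the helper names `version_tuple` and `check_versions` are hypothetical, the package names are assumed to be the usual PyPI distribution names, and the comparison is deliberately rough (build suffixes such as `+cu124` and pre-release tags are ignored).

```python
import re
from importlib import metadata

# Minimum versions taken from the list above.
# Keys are assumed to be the PyPI distribution names.
MINIMUMS = {
    "transformers": "4.46.1",
    "torch": "2.5.1",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def version_tuple(version: str) -> tuple:
    """Turn '2.5.1+cu124' into (2, 5, 1), ignoring local/build suffixes."""
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric pieces such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def check_versions(minimums=MINIMUMS) -> dict:
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "missing or too old"
        print(f"{name}: {installed} ({status})")
```

Running the script prints one line per package; anything reported as missing or too old should be upgraded before trying the examples in this section.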

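Basic usage

Using the Hugging Face chat template

The new xLAM models work seamlessly with the Hugging Face Transformers library and use a natural chat template, which keeps the conversation flow simple and intuitive. The following example shows how to use these models: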

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/xLAM-2-3b-fc-r")
model = AutoModelForCausalLM.from_pretrained("Salesforce/xLAM-2-3b-fc-r", torch_dtype=torch.bfloat16, device_map="auto")

# Example conversation with a tool call
messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "Thanks. I am doing well. How can I help you?"},
    {"role": "user", "content": "What's the weather like in London?"},
]

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
            },
            "required": ["location"]
        }
    }
]

print("====== prompt after applying chat template ======")
print(tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, tokenize=False))

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
input_ids_len = inputs["input_ids"].shape[-1] # Get the length of the input tokens
inputs = {k: v.to(model.device) for k, v in inputs.items()}
print("====== model response ======")
outputs = model.generate(**inputs, max_new_tokens=256)
generated_tokens = outputs[:, input_ids_len:] # Slice the output to get only the newly generated tokens
print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))
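
The decoded response from the example above is plain text, so the calling application still has to turn it into an actual function call. The sketch below shows one way to do that. It assumes the model emits tool calls as a JSON list of objects with `name` and `arguments` keys; that format is an assumption and should be checked against the real output. `parse_tool_calls`, `run_tool_calls`, and the stand-in `get_weather` implementation are hypothetical helpers, not part of the model or the Transformers API.

```python
import json


def parse_tool_calls(text: str) -> list:
    """Return a list of {'name', 'arguments'} dicts, or [] for a plain answer.

    Assumes tool calls arrive as a JSON list such as
    [{"name": "get_weather", "arguments": {"location": "London"}}].
    """
    try:
        data = json.loads(text.strip())
    except json.JSONDecodeError:
        return []  # ordinary natural-language reply, not a tool call
    if isinstance(data, dict):
        data = [data]  # tolerate a single call that is not wrapped in a list
    if not isinstance(data, list):
        return []
    calls = []
    for item in data:
        if isinstance(item, dict) and isinstance(item.get("name"), str):
            arguments = item.get("arguments")
            if not isinstance(arguments, dict):
                arguments = {}
            calls.append({"name": item["name"], "arguments": arguments})
    return calls


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Hypothetical stand-in for a real weather lookup."""
    return {"location": location, "unit": unit, "temperature": None}


# Maps tool names from the `tools` schema to local Python callables.
TOOL_IMPLS = {"get_weather": get_weather}


def run_tool_calls(text: str, impls=TOOL_IMPLS) -> list:
    """Execute every parsed tool call and collect the results."""
    results = []
    for call in parse_tool_calls(text):
        func = impls.get(call["name"])
        if func is None:
            results.append({"name": call["name"], "error": "unknown tool"})
            continue
        results.append({"name": call["name"], "result": func(**call["arguments"])})
    return results
```

In practice the string passed to `run_tool_calls` would be the output of `tokenizer.decode(...)`. Keeping the dispatch table explicit means the model can only trigger functions that were deliberately registered.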

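Inference with vLLM

xLAM models can also be served efficiently with vLLM for high-throughput inference. Use vllm>=0.6.5, because earlier versions cause degraded performance for Qwen-based models.

Installation and serving

Install the required version of vLLM: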

pip install "vllm>=0.6.5"

Download the tool parser plugin to a local path:

wget https://huggingface.co/Salesforce/xLAM-2-1b-fc-r/raw/main/xlam_tool_call_parser.py

Start the OpenAI API-compatible endpoint:

vllm serve Salesforce/xLAM-2-1b-fc-r \
  --enable-auto-tool-choice \
  --tool-parser-plugin ./xlam_tool_call_parser.py \
  --tool-call-parser xlam \
  --tensor-parallel-size 1

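Note: make sure the tool parser plugin file has been downloaded and that the path given to --tool-parser-plugin points to your local copy. All xLAM series models use the same tool call parser, so it only needs to be downloaded once.

Testing with the OpenAI API

The following is a minimal example that tests tool use against the served endpoint: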

import openai
import json

# Configure the client to use your local vLLM endpoint
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",  # Default vLLM server URL
    api_key="empty"  # Can be any string
)

# Define a tool/function
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The unit of temperature to return"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Create a chat completion
response = client.chat.completions.create(
    model="Salesforce/xLAM-2-1b-fc-r",  # Must match the model name served by vLLM (the model path unless --served-model-name is set)
    messages=[
        {"role": "system", "content": "You are a helpful assistant that can use tools."},
        {"role": "user", "content": "What's the weather like in San Francisco?"}
    ],
    tools=tools,
    tool_choice="auto"
)

# Print the response
print("Assistant's response:")
print(json.dumps(response.model_dump(), indent=2))
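
The example above stops at printing the first response. In a multi-turn tool-use loop, the tool calls in that response are executed and their results are sent back as `tool` messages. The sketch below builds that follow-up message list from the dictionary returned by `response.model_dump()`, following the standard OpenAI chat-completions message layout. `build_followup_messages` and the `impls` mapping are hypothetical names introduced for illustration.

```python
import json


def build_followup_messages(messages: list, response: dict, impls: dict) -> list:
    """Append the assistant turn and one tool-result message per tool call.

    `response` is the dict form of a chat completion (response.model_dump()).
    `impls` maps tool names to local Python callables (an assumption of this sketch).
    """
    message = response["choices"][0]["message"]
    tool_calls = message.get("tool_calls") or []

    assistant_turn = {"role": "assistant", "content": message.get("content") or ""}
    if tool_calls:
        assistant_turn["tool_calls"] = tool_calls

    followup = list(messages)  # leave the caller's list untouched
    followup.append(assistant_turn)

    for call in tool_calls:
        name = call["function"]["name"]
        # The OpenAI format carries arguments as a JSON-encoded string.
        arguments = json.loads(call["function"]["arguments"] or "{}")
        result = impls[name](**arguments)
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return followup
```

The returned list can be passed as `messages` in a second `client.chat.completions.create(...)` call, with the same `tools`, so the model can phrase a final answer from the tool output.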

For more advanced configuration and deployment options, refer to the vLLM documentation.

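✨ Key Features

Improved multi-turn conversation and tool use: built on an advanced data synthesis, processing, and training pipeline, with significant gains in multi-turn conversation and tool use.
Strong benchmark performance: outperforms frontier models such as GPT-4o and Claude 3.5 on the BFCL and τ-bench benchmarks.
Easy integration: the optimized chat template and vLLM integration make it easier to build advanced AI agents.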

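📚 Detailed Documentation

Model series

Attribute	Details
Model type	The xLAM series performs well across many areas, including general tasks and function calling. At the same parameter count, the models have been fine-tuned on a wide range of agent tasks and scenarios while preserving the capabilities of the original model.
Training data	Salesforce/APIGen-MT-5k, Salesforce/xlam-function-calling-60k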

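Model	Total parameters	Context length	Category	Model download	GGUF download
Llama-xLAM-2-70b-fc-r	70B	128k	Multi-turn conversation, function calling	🤗 Link	NA
Llama-xLAM-2-8b-fc-r	8B	128k	Multi-turn conversation, function calling	🤗 Link	🤗 Link
xLAM-2-32b-fc-r	32B	32k (max 128k)*	Multi-turn conversation, function calling	🤗 Link	NA
xLAM-2-3b-fc-r	3B	32k (max 128k)*	Multi-turn conversation, function calling	🤗 Link	🤗 Link
xLAM-2-1b-fc-r	1B	32k (max 128k)*	Multi-turn conversation, function calling	🤗 Link	🤗 Link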

*Note: the default context length for the Qwen-2.5-based models is 32k, but techniques such as YaRN (Yet another RoPE extensioN) can extend it to a maximum of 128k. For more details, refer here.
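
As an illustration of the note above, the Qwen2.5 model cards enable YaRN by adding a `rope_scaling` entry to the model's `config.json`. The sketch below derives that entry from the two context lengths in the table. Whether the same entry applies unchanged to the Qwen-2.5-based xLAM-2 checkpoints is an assumption that should be verified before use, and `yarn_rope_scaling` is a hypothetical helper name.

```python
import json

NATIVE_CONTEXT = 32_768    # default context length of the Qwen-2.5-based models (32k)
TARGET_CONTEXT = 131_072   # maximum context length from the table (128k)


def yarn_rope_scaling(native: int = NATIVE_CONTEXT, target: int = TARGET_CONTEXT) -> dict:
    """Build a rope_scaling entry in the form shown in the Qwen2.5 model cards.

    The scaling factor is the ratio of target to native context length.
    """
    return {
        "factor": target / native,
        "original_max_position_embeddings": native,
        "type": "yarn",
    }


# Fragment to merge into config.json (assumed to apply to xLAM-2 as it does to Qwen2.5).
print(json.dumps({"rope_scaling": yarn_rope_scaling()}, indent=2))
```

For 32k to 128k the factor works out to 4.0. The Qwen documentation suggests enabling this only when long inputs are actually needed, since static scaling may affect quality on short texts.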

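You can also explore our previous xLAM series here.

The -fc suffix indicates that the models are fine-tuned for function-calling tasks, and the -r suffix marks a research release.

✅ All models are fully compatible with vLLM and Transformers-based inference frameworks.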

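Benchmark Results

Berkeley Function-Calling Leaderboard (BFCL v3)

Figure: BFCL Results. Performance comparison of different models on the [BFCL leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html). Rankings are based on overall accuracy, which is a weighted average across the evaluation categories. "FC" denotes function-calling mode, as opposed to using a custom "prompt" to extract function calls.

τ-bench Benchmark

Figure: Tau-bench Results. Success rate (pass@1) on the τ-bench benchmark, averaged over at least 5 trials. Our xLAM-2-70b-fc-r model achieves an overall success rate of 56.2% on τ-bench, significantly better than the base Llama 3.1 70B Instruct model (38.2%) and other open-source models such as DeepSeek v3 (40.6%). Notably, our best model even outperforms proprietary models such as GPT-4o (52.9%) and approaches the performance of newer models such as Claude 3.5 Sonnet (new) (60.1%).

Figure: Pass^k curves. Pass^k measures the probability that all 5 independent trials of a given task succeed, averaged over all tasks in the τ-retail (left) and τ-airline (right) domains. Higher values indicate better model consistency.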
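
To make the pass^k metric concrete, the sketch below computes it from per-task success counts. It uses the combinatorial estimator defined for τ-bench, in which a task with c successes out of n trials contributes C(c, k) / C(n, k). The function name `pass_hat_k` and the example counts are hypothetical and are not results from this model.

```python
from math import comb


def pass_hat_k(successes_per_task: list, num_trials: int, k: int) -> float:
    """Estimate pass^k: the chance that k independent trials of a task all succeed.

    successes_per_task[i] is the number of successful trials (out of
    num_trials) for task i. The per-task estimate C(c, k) / C(n, k)
    is averaged over all tasks.
    """
    if not 1 <= k <= num_trials:
        raise ValueError("k must be between 1 and num_trials")
    if not successes_per_task:
        raise ValueError("at least one task is required")
    total = 0.0
    for c in successes_per_task:
        total += comb(c, k) / comb(num_trials, k)
    return total / len(successes_per_task)


# Hypothetical counts for three tasks, each run 5 times.
counts = [5, 3, 0]
for k in range(1, 6):
    print(k, round(pass_hat_k(counts, num_trials=5, k=k), 4))
```

With k = 1 the estimate is the ordinary average success rate. It can only stay flat or fall as k grows, so a curve that declines slowly indicates a model that solves the same tasks reliably rather than occasionally.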

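Ethical Considerations

This release is for research purposes only, in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend that users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and follow best practices when selecting use cases, particularly in high-risk scenarios where errors or misuse could significantly affect people's lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.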

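🔧 Technical Details

The model is trained with the novel APIGen-MT framework, which generates high-quality training data by simulating interactions between an agent and a human. For multi-turn conversation and tool use, the model builds on an advanced data synthesis, processing, and training pipeline, which leads to its strong results on the BFCL and τ-bench benchmarks.

📄 License

This model is released under the CC-BY-NC-4.0 license.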

Citation

If you use our model or dataset in your work, please cite our paper:

@article{prabhakar2025apigen,
  title={APIGen-MT: Agentic PIpeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay},
  author={Prabhakar, Akshara and Liu, Zuxin and Zhu, Ming and Zhang, Jianguo and Awalgaonkar, Tulika and Wang, Shiyu and Liu, Zhiwei and Chen, Haolin and Hoang, Thai and others},
  journal={arXiv preprint arXiv:2504.03601},
  year={2025}
}

In addition, please check out our other related work on the xLAM series and consider citing it as well:

@article{zhang2025actionstudio,
  title={ActionStudio: A Lightweight Framework for Data and Training of Action Models},
  author={Zhang, Jianguo and Hoang, Thai and Zhu, Ming and Liu, Zuxin and Wang, Shiyu and Awalgaonkar, Tulika and Prabhakar, Akshara and Chen, Haolin and Yao, Weiran and Liu, Zhiwei and others},
  journal={arXiv preprint arXiv:2503.22673},
  year={2025}
}

@article{zhang2024xlam,
  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
  author={Zhang, Jianguo and Lan, Tian and Zhu, Ming and Liu, Zuxin and Hoang, Thai and Kokane, Shirley and Yao, Weiran and Tan, Juntao and Prabhakar, Akshara and Chen, Haolin and others},
  journal={arXiv preprint arXiv:2409.03215},
  year={2024}
}

@article{liu2024apigen,
  title={Apigen: Automated pipeline for generating verifiable and diverse function-calling datasets},
  author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and RN, Rithesh and others},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={54463--54482},
  year={2024}
}

@article{zhang2024agentohana,
  title={AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning},
  author={Zhang, Jianguo and Lan, Tian and Murthy, Rithesh and Liu, Zhiwei and Yao, Weiran and Tan, Juntao and Hoang, Thai and Yang, Liangwei and Feng, Yihao and Liu, Zuxin and others},
  journal={arXiv preprint arXiv:2402.15506},
  year={2024}
}