Llama-xLAM-2-8b-fc-r-gguf Open-Source Model: Multi-Turn Conversation Support, Turning Intent into Executable Actions

Llama-xLAM-2-8b-fc-r-gguf

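Developed by Salesforce

xLAM-2 is a family of large action models built on an advanced data synthesis and training pipeline. The models excel at multi-turn conversation and tool use, and translate user intent into executable actions.

Large language model

Transformers

English #Multi-turn conversation optimization #Function calling expert #Workflow automation

Downloads: 1,809

Release date: 3/28/2025

Model overview

The xLAM-2 model series is trained with the APIGen-MT framework, performs strongly in multi-turn conversation and tool use, and can serve as the "brain" of an AI agent that carries out tasks autonomously.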

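Model features

Advanced performance

Outperforms frontier models such as GPT-4o and Claude 3.5 on the BFCL and τ-bench benchmarks

Multi-turn conversation optimization

Tuned for multi-turn scenarios, maintaining strong consistency across repeated trials

Tool use

Trained with the APIGen-MT framework to translate natural-language instructions into API calls

Easy integration

A refined chat template and vLLM integration make it easier to build AI agent systems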

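Model capabilities

Natural language understanding

Function calling

Multi-turn conversation handling

Workflow automation

Tool use

Intent recognition

Use cases

Intelligent assistants

Automated customer service

Handles complex multi-turn customer inquiries and calls the relevant system APIs to resolve them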

The xLAM-2-70b-fc-r model reaches a 56.2% overall success rate on τ-bench

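Business process automation

Airline booking systems

Understands the user's travel needs and automatically completes flight search, booking, and related operations

Performs well on the τ-bench airline domain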

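🚀 xLAM-2 Model Family

The xLAM-2 model family is built on an advanced data synthesis, processing, and training pipeline and performs strongly in multi-turn conversation and tool use. The models translate user intent into executable actions, which supports automated workflows. This model release is for research purposes only.

Project links

xLAM

🚀 Quick Start

This repository provides the GGUF format of the Llama-xLAM-2-8b-fc-r model. The original model is available as Llama-xLAM-2-8b-fc-r.

Large Action Models (LAMs) are advanced language models designed to enhance decision-making by translating user intent into executable actions. As the "brain" of an AI agent, a LAM autonomously plans and executes tasks to achieve specific goals, which makes it valuable for automating workflows across diverse domains.

The new xLAM-2 series, built on our most advanced data synthesis, processing, and training pipelines, marks a significant leap in multi-turn conversation and tool use. The models are trained with our novel APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions. They achieve state-of-the-art performance on the BFCL and τ-bench benchmarks, outperforming frontier models such as GPT-4o and Claude 3.5. Notably, even our smaller models demonstrate superior capabilities in multi-turn scenarios while maintaining exceptional consistency across trials.

We have also refined the chat template and vLLM integration, making it easier to build advanced AI agents. Compared to previous xLAM models, xLAM-2 offers superior performance and seamless deployment across applications.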

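Figure (Model Performance Overview): Performance of the larger xLAM-2-fc-r models (8B to 70B, trained with APIGen-MT data) compared with state-of-the-art baselines on function calling (BFCL v3, as of April 2, 2025) and agentic capabilities (τ-bench).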

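✨ Key Features

Advanced performance: outperforms frontier models such as GPT-4o and Claude 3.5 on the BFCL and τ-bench benchmarks.
Multi-turn conversation: even the smaller models perform well in multi-turn scenarios.
Tool-use optimization: training with the APIGen-MT framework improves how the models use tools to complete tasks.
Easy integration: a refined chat template and vLLM integration simplify building AI agents.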

📦 Installation

Download the GGUF files

Install the Hugging Face CLI:

pip install huggingface-hub

Log in to Hugging Face:

huggingface-cli login

Download the GGUF model:

huggingface-cli download Salesforce/Llama-xLAM-2-8b-fc-r-gguf Llama-xLAM-2-8b-fc-r-gguf --local-dir . --local-dir-use-symlinks False

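💻 Usage Examples

Basic usage

Command line

Install the llama.cpp framework from source.
Run inference as follows. For generation-related parameters, refer to the llama.cpp documentation.

llama-cli -m [PATH-TO-LOCAL-GGUF]

Python framework

Install llama-cpp-python:

pip install llama-cpp-python

Run inference with the high-level API: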

from llama_cpp import Llama

# Load the local GGUF file; replace the placeholder with the actual path.
llm = Llama(
    model_path="[PATH-TO-MODEL]"
)

# Request a chat completion and force a call to the UserDetail function.
output = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that can use tools. You are developed by Salesforce xLAM team."
        },
        {
            "role": "user",
            "content": "Extract Jason is 25 years old"
        }
    ],
    # Tool definitions use a JSON Schema description of the parameters.
    tools=[{
        "type": "function",
        "function": {
            "name": "UserDetail",
            "parameters": {
                "type": "object",
                "title": "UserDetail",
                "properties": {
                    "name": {
                        "title": "Name",
                        "type": "string"
                    },
                    "age": {
                        "title": "Age",
                        "type": "integer"
                    }
                },
                "required": ["name", "age"]
            }
        }
    }],
    # Restrict the model to the UserDetail tool.
    tool_choice={
        "type": "function",
        "function": {
            "name": "UserDetail"
        }
    }
)

# Print the assistant message, including any tool call it contains.
print(output['choices'][0]['message'])
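
The printed message can then be unpacked into a function name and arguments. The following is a minimal sketch, assuming the message follows the OpenAI-style chat completion schema (a `tool_calls` list whose `function.arguments` field is a JSON string). The `sample_message` dictionary is a hand-written stand-in for `output['choices'][0]['message']`, not real model output, and `extract_tool_calls` is a hypothetical helper name.

```python
import json


def extract_tool_calls(message):
    """Return (name, arguments) pairs from an OpenAI-style assistant message."""
    calls = []
    # A plain-text reply has no "tool_calls" key (or a None value).
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # Arguments arrive as a JSON-encoded string, so decode them first.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls


# Hand-written stand-in for output['choices'][0]['message'] (assumed shape).
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "UserDetail",
                "arguments": '{"name": "Jason", "age": 25}',
            },
        }
    ],
}

for name, arguments in extract_tool_calls(sample_message):
    print(name, arguments)
```

Returning plain tuples keeps the helper independent of any particular dispatch mechanism; a caller can map each name to a local Python function as needed.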

Advanced usage

The GGUF model uses the following prompt template:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{TASK_INSTRUCTION}
You have access to a set of tools. When using tools, make calls in a single JSON array: 

[{"name": "tool_call_name", "arguments": {"arg1": "value1", "arg2": "value2"}}, ... (additional parallel tool calls as needed)]

If no tool is suitable, state that explicitly. If the user's input lacks required parameters, ask for clarification. Do not interpret or respond until tool results are returned. Once they are available, process them or make additional calls if needed. For tasks that don't require tools, such as casual conversation or general advice, respond directly in plain text. The available tools are:

{AVAILABLE_TOOLS}

<|eot_id|><|start_header_id|>user<|end_header_id|>

{USER_QUERY}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{ASSISTANT_QUERY}<|eot_id|><|start_header_id|>user<|end_header_id|>

{USER_QUERY}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
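
To make the template concrete, here is a rough sketch of filling it in by hand for an arbitrary number of turns. Several details are assumptions: the exact whitespace between segments, the serialization of `{AVAILABLE_TOOLS}` as indented JSON, and the `render_prompt` and `get_weather` names, which are hypothetical. In practice the chat template embedded in the GGUF file is the authoritative source, so treat this as an illustration of the layout only.

```python
import json

# Fixed instruction text copied from the template above.
TOOL_INSTRUCTIONS = (
    "You have access to a set of tools. When using tools, make calls in a "
    "single JSON array: \n\n"
    '[{"name": "tool_call_name", "arguments": {"arg1": "value1", "arg2": "value2"}}, '
    "... (additional parallel tool calls as needed)]\n\n"
    "If no tool is suitable, state that explicitly. If the user's input lacks "
    "required parameters, ask for clarification. Do not interpret or respond "
    "until tool results are returned. Once they are available, process them or "
    "make additional calls if needed. For tasks that don't require tools, such "
    "as casual conversation or general advice, respond directly in plain text. "
    "The available tools are:"
)


def render_prompt(task_instruction, tools, messages):
    """Fill the template for a list of {"role", "content"} turns."""
    # Assumption: tools are serialized as indented JSON; the template only
    # names the {AVAILABLE_TOOLS} slot and does not fix a serialization.
    tool_text = json.dumps(tools, indent=2)
    parts = [
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n",
        f"{task_instruction}\n{TOOL_INSTRUCTIONS}\n\n{tool_text}\n\n<|eot_id|>",
    ]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # Leave an open assistant header so the model generates the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


# Hypothetical tool definition and single-turn conversation.
prompt = render_prompt(
    "You are a helpful assistant that can use tools.",
    [{"name": "get_weather", "parameters": {"city": {"type": "string"}}}],
    [{"role": "user", "content": "What is the weather in Paris?"}],
)
print(prompt)
```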

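📚 Documentation

Model series

The xLAM series performs well across many areas, including general tasks and function calling. At the same parameter count, each model has been fine-tuned across a wide range of agent tasks and scenarios while preserving the capabilities of the original model.

Model	Total parameters	Context length	Category	Download model	Download GGUF files
Llama-xLAM-2-70b-fc-r	70B	128k	Multi-turn conversation, function calling	🤖 Link	NA
Llama-xLAM-2-8b-fc-r	8B	128k	Multi-turn conversation, function calling	🤖 Link	🤖 Link
xLAM-2-32b-fc-r	32B	32k (max 128k)*	Multi-turn conversation, function calling	🤖 Link	NA
xLAM-2-3b-fc-r	3B	32k (max 128k)*	Multi-turn conversation, function calling	🤖 Link	🤖 Link
xLAM-2-1b-fc-r	1B	32k (max 128k)*	Multi-turn conversation, function calling	🤖 Link	🤖 Link

* Note: The default context length for the Qwen-2.5-based models is 32k, but techniques such as YaRN (Yet another RoPE extensioN method) can extend it to a maximum of 128k. Please refer to the original model card for more details.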
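
As an illustration of the YaRN note, the sketch below adds a `rope_scaling` entry to a model configuration dictionary, using the key names from the upstream Qwen2.5 documentation. Whether the Qwen-based xLAM-2 checkpoints accept exactly this entry is an assumption, and it does not apply to the Llama-based 8B and 70B models, which already list a 128k context. The `with_yarn` helper and the `config` fragment are hypothetical.

```python
import json


def with_yarn(config, target_length, original_length=32768):
    """Return a copy of a model config dict with a YaRN rope_scaling entry."""
    updated = dict(config)
    updated["rope_scaling"] = {
        "type": "yarn",
        # Scaling factor = desired context length / native context length.
        "factor": target_length / original_length,
        "original_max_position_embeddings": original_length,
    }
    return updated


# Hypothetical fragment of a config.json, for illustration only.
config = {"model_type": "qwen2", "max_position_embeddings": 32768}
print(json.dumps(with_yarn(config, target_length=131072), indent=2))
```

Extending from 32,768 to 131,072 tokens corresponds to a scaling factor of 4.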

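You can also explore our previous xLAM series.

The -fc suffix indicates that a model is fine-tuned for function calling tasks, and the -r suffix marks a research release.

✅ All models are fully compatible with vLLM and Transformers-based inference frameworks.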

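Benchmark Results

Berkeley Function-Calling Leaderboard (BFCL v3)

Figure (BFCL Results): Performance comparison of different models on the [BFCL leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html). Rank is based on overall accuracy, a weighted average across the evaluation categories. "FC" denotes function-calling mode, as opposed to using a customized "prompt" to extract function calls.

τ-bench Benchmark

Figure (Tau-bench Results): Success rate (pass@1) on the τ-bench benchmark, averaged across at least 5 trials. Our xLAM-2-70b-fc-r model achieves an overall success rate of 56.2% on τ-bench, significantly outperforming the base Llama 3.1 70B Instruct model (38.2%) and other open-source models such as DeepSeek v3 (40.6%). Notably, our best model even outperforms proprietary models such as GPT-4o (52.9%) and approaches the performance of more recent models such as Claude 3.5 Sonnet (new) (60.1%).

Figure (Pass^k curves): Pass^k curves measure the probability that all 5 independent trials succeed for a given task, averaged across all tasks in the τ-retail (left) and τ-airline (right) domains. Higher values indicate better model consistency.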
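
A consistency metric of this kind can be sketched as follows. The estimator used here, the fraction of k-trial subsets that contain only successes, averaged over tasks, is an assumption based on the usual pass^k definition and is not taken from this model card. The success counts are made up for illustration and are not benchmark data.

```python
from math import comb


def pass_hat_k(successes, trials, k):
    """Estimate the chance that k trials drawn without replacement all succeed."""
    if k > trials:
        raise ValueError("k cannot exceed the number of trials")
    # comb(successes, k) is 0 when successes < k, so the estimate drops to 0.
    return comb(successes, k) / comb(trials, k)


def mean_pass_hat_k(per_task_successes, trials, k):
    """Average the per-task estimate over all tasks."""
    total = sum(pass_hat_k(c, trials, k) for c in per_task_successes)
    return total / len(per_task_successes)


# Made-up success counts out of 5 trials for four tasks (not benchmark data).
counts = [5, 4, 2, 0]
for k in range(1, 6):
    print(k, round(mean_pass_hat_k(counts, trials=5, k=k), 3))
```

For k = 1 the metric reduces to the ordinary average success rate, and it can only stay flat or fall as k grows, which is why a flatter curve indicates a more consistent model.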

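Ethical Considerations

This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend that users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and apply best practices when selecting use cases, particularly in high-risk scenarios where errors or misuse could significantly impact people's lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.

Model License

🔧 Technical Details

The models are trained with the APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions. They are optimized for multi-turn conversation and tool use so that user intent is translated into executable actions more reliably.

📄 License

This project is released under the CC BY-NC 4.0 license.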

🔗 Citation

If you use our model or dataset in your work, please cite our paper:

@article{prabhakar2025apigenmt,
  title={APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay},
  author={Prabhakar, Akshara and Liu, Zuxin and Yao, Weiran and Zhang, Jianguo and Zhu, Ming and Wang, Shiyu and Liu, Zhiwei and Awalgaonkar, Tulika and Chen, Haolin and Hoang, Thai and Niebles, Juan Carlos and Heinecke, Shelby and Wang, Huan and Savarese, Silvio and Xiong, Caiming},
  journal={arXiv preprint arXiv:2504.03601},
  year={2025}
}

Additionally, please check out our other related work on the xLAM series and consider citing it as well:

@article{zhang2025actionstudio,
  title={ActionStudio: A Lightweight Framework for Data and Training of Action Models},
  author={Zhang, Jianguo and Hoang, Thai and Zhu, Ming and Liu, Zuxin and Wang, Shiyu and Awalgaonkar, Tulika and Prabhakar, Akshara and Chen, Haolin and Yao, Weiran and Liu, Zhiwei and others},
  journal={arXiv preprint arXiv:2503.22673},
  year={2025}
}

@article{zhang2024xlam,
  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
  author={Zhang, Jianguo and Lan, Tian and Zhu, Ming and Liu, Zuxin and Hoang, Thai and Kokane, Shirley and Yao, Weiran and Tan, Juntao and Prabhakar, Akshara and Chen, Haolin and others},
  journal={arXiv preprint arXiv:2409.03215},
  year={2024}
}

@article{liu2024apigen,
  title={Apigen: Automated pipeline for generating verifiable and diverse function-calling datasets},
  author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and RN, Rithesh and others},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={54463--54482},
  year={2024}
}

@article{zhang2024agentohana,
  title={AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning},
  author={Zhang, Jianguo and Lan, Tian and Murthy, Rithesh and Liu, Zhiwei and Yao, Weiran and Tan, Juntao and Hoang, Thai and Yang, Liangwei and Feng, Yihao and Liu, Zuxin and others},
  journal={arXiv preprint arXiv:2402.15506},
  year={2024}
}