xLAM-2-3b-fc-r-GGUF: Open-Source Text Generation Model for Function Calling and Tool Use


Xlam 2 3b Fc R GGUF

Developed by Mungert

The xLAM-2-3b-fc-r GGUF model is a text generation model produced with a specific quantization method. It focuses on function calling, LLM agents, and tool use.

Large Language Model

Transformers

English #FunctionCallingOptimization #MultiTurnDialogue #ToolInteraction

Downloads: 1,316

Release date: 6/21/2025

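Model Overview

The model performs well in multi-turn conversation and tool use. It supports function calling and interaction with live network services. It uses a high-precision quantization method to preserve accuracy at low bit depths.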

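Model Features

Multi-turn conversation

Handles complex multi-turn dialogue and performs well in real-world applications.

Function calling

Supports function calling and can interact with live network services.

High-precision quantization

Uses a new quantization approach that improves the model's accuracy at low bit depths.

Broad compatibility

Fully compatible with vLLM and Transformers-based inference frameworks.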

Model Capabilities

Text generation

Function calling

Multi-turn conversation

Tool use

Interaction with live network services

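Use Cases

Dialogue systems

Multi-turn conversation

Handles complex multi-turn dialogue, such as customer-service chatbots.

Performs well in real-world applications.

Tool integration

Weather lookup

Retrieves live weather information through function calls.

Supports interaction with live network services.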

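🚀 xLAM-2-3b-fc-r GGUF Model

The xLAM-2-3b-fc-r GGUF model is a text generation model produced with a specific quantization method. It is intended for function calling, LLM agents, and tool use. It performs well in multi-turn conversation and tool use, and the xLAM-2 family reports strong results on benchmarks including BFCL v3 and τ-bench.

Metadata

Property	Details
Model type	Text generation
Training data	Salesforce/APIGen-MT-5k, Salesforce/xlam-function-calling-60k
Language	en
Tags	function-calling, LLM Agent, tool-use, llama, qwen, pytorch, LLaMA-factory
Library	transformers
License	cc-by-nc-4.0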

🚀 Quick Start

Framework Requirements

Transformers 4.46.1 (or later)
PyTorch 2.5.1+cu124 (or later)
Datasets 3.1.0 (or later)
Tokenizers 0.20.3 (or later)
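You can check an environment against these minimums before running the examples with a small stdlib-only script. This sketch is not part of the original card. It compares only the numeric release components, so "2.5.1+cu124" counts as 2.5.1 and the CUDA build tag is not verified. Pre-releases are treated like their final release.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the framework requirements list (CUDA build tag not checked).
MINIMUMS = {
    "transformers": "4.46.1",
    "torch": "2.5.1",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def numeric_parts(ver: str) -> tuple:
    """Leading numeric release components, e.g. '2.5.1+cu124' -> (2, 5, 1)."""
    public = ver.split("+", 1)[0]  # drop a local suffix such as "+cu124"
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at non-numeric components such as "dev0"
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # e.g. "0rc1": keep the 0, ignore the pre-release tag
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = numeric_parts(installed), numeric_parts(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that (4, 47) compares correctly against (4, 46, 1).
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_requirements(minimums=MINIMUMS) -> dict:
    """Map each package name to (installed_version_or_None, satisfied)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for pkg, (installed, ok) in check_requirements().items():
        status = "OK" if ok else "needs install/upgrade"
        print(f"{pkg:<12} {installed or 'not installed':<16} {status}")
```

Run it as a script. Any package reported as "needs install/upgrade" should be updated with pip before you try the examples below.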

Basic Usage Example (loads the original Salesforce checkpoint with Transformers, not the GGUF files)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/xLAM-2-3b-fc-r")
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained("Salesforce/xLAM-2-3b-fc-r", torch_dtype=torch.bfloat16, device_map="auto")

# Example conversation with a tool definition
messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "Thanks. I am doing well. How can I help you?"},
    {"role": "user", "content": "What's the weather like in London?"},
]

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
            },
            "required": ["location"]
        }
    }
]

print("====== Prompt after applying the chat template ======")
print(tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, tokenize=False))

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
input_ids_len = inputs["input_ids"].shape[-1]  # number of prompt tokens
inputs = {k: v.to(model.device) for k, v in inputs.items()}
print("====== Model response ======")
outputs = model.generate(**inputs, max_new_tokens=256)
generated_tokens = outputs[:, input_ids_len:]  # keep only the newly generated tokens
print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))
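When the model decides to call a tool, the decoded text printed by the basic example should be a structured call rather than prose. The helper below is a minimal sketch that assumes the call is emitted as a JSON list of objects with "name" and "arguments" keys. This card does not state that format, and `parse_tool_calls` is a hypothetical name. Inspect the actual output of the example before relying on this format.

```python
import json
from typing import Any, Dict, List, Optional


def parse_tool_calls(text: str) -> Optional[List[Dict[str, Any]]]:
    """Return [{"name": ..., "arguments": {...}}, ...] or None for a plain-text reply.

    Assumption: tool calls are emitted as a JSON list (or a single JSON object)
    with "name" and "arguments" keys; anything else is treated as ordinary text.
    """
    try:
        data = json.loads(text.strip())
    except json.JSONDecodeError:
        return None  # not JSON: an ordinary conversational reply
    if isinstance(data, dict):
        data = [data]  # tolerate a single call object
    if not isinstance(data, list) or not data:
        return None
    calls = []
    for item in data:
        if not isinstance(item, dict) or "name" not in item:
            return None
        args = item.get("arguments", {})
        if isinstance(args, str):
            # Arguments may arrive as a JSON-encoded string; decode them if so.
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                return None
        calls.append({"name": item["name"], "arguments": args})
    return calls


if __name__ == "__main__":
    sample = '[{"name": "get_weather", "arguments": {"location": "London"}}]'
    print(parse_tool_calls(sample))
    print(parse_tool_calls("Thanks. I am doing well."))
```

For example, `parse_tool_calls(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))` returns `None` for a plain reply, or a list of calls for your application to execute.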

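📦 Installation

Inference with vLLM

Install vLLM 0.6.5 or later:

pip install "vllm>=0.6.5"

Download the tool-call parser plugin to a local path:

wget https://huggingface.co/Salesforce/xLAM-2-1b-fc-r/raw/main/xlam_tool_call_parser.py

Start an OpenAI API-compatible endpoint. The command below serves Salesforce/xLAM-2-1b-fc-r; replace the model ID to serve a different xLAM-2 model: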

vllm serve Salesforce/xLAM-2-1b-fc-r \
  --enable-auto-tool-choice \
  --tool-parser-plugin ./xlam_tool_call_parser.py \
  --tool-call-parser xlam \
  --tensor-parallel-size 1

💻 Usage Examples

Advanced Usage (OpenAI client against the vLLM endpoint)

import openai
import json

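# Configure the client to use the local vLLM endpoint
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",  # default vLLM server URL
    api_key="empty"  # any string works unless the server was started with --api-key
)

# Define the tools/functions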
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The unit of temperature to return"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Create a chat completion
response = client.chat.completions.create(
    model="Salesforce/xLAM-2-1b-fc-r",  # should match the model name served by `vllm serve`
    messages=[
        {"role": "system", "content": "You are a helpful assistant that can use tools."},
        {"role": "user", "content": "What's the weather like in San Francisco?"}
    ],
    tools=tools,
    tool_choice="auto"
)

# Print the response
print("Assistant's response:")
print(json.dumps(response.model_dump(), indent=2))
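The printed response contains the tool calls that vLLM's xlam parser extracted, but nothing executes them yet. A minimal dispatch loop might look like the sketch below. `get_weather` is a hypothetical stub and `run_tool_calls` a hypothetical helper. The response layout follows the OpenAI chat-completions format that the client returns: `choices[0].message.tool_calls`, each with `function.name` and a JSON-encoded `function.arguments` string.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Hypothetical stub; replace with a call to a real weather service."""
    return {"location": location, "unit": unit, "temperature": None}


TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(response_dict: dict, registry=TOOL_REGISTRY) -> List[dict]:
    """Execute each tool call in an OpenAI-style completion and build 'tool' messages."""
    message = response_dict["choices"][0]["message"]
    tool_messages = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        func = registry.get(fn["name"])
        if func is None:
            result = {"error": f"unknown tool {fn['name']}"}
        else:
            # In the OpenAI format, arguments arrive as a JSON-encoded string.
            result = func(**json.loads(fn["arguments"] or "{}"))
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return tool_messages


if __name__ == "__main__":
    # Illustrative shape of response.model_dump() when the model calls a tool.
    fake = {"choices": [{"message": {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_0", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"location": "San Francisco, CA"}'}}]}}]}
    print(run_tool_calls(fake))
```

To continue the conversation, append the returned messages after the assistant message that contains the tool calls. Then send the conversation back with another `client.chat.completions.create` call so the model can answer from the tool results.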

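📚 Detailed Documentation

Model Series

Model	Total Parameters	Context Length	Category	Download Model	Download GGUF
Llama-xLAM-2-70b-fc-r	70B	128k	Multi-turn conversation, function calling	Link	N/A
Llama-xLAM-2-8b-fc-r	8B	128k	Multi-turn conversation, function calling	Link	Link
xLAM-2-32b-fc-r	32B	32k (up to 128k)*	Multi-turn conversation, function calling	Link	N/A
xLAM-2-3b-fc-r	3B	32k (up to 128k)*	Multi-turn conversation, function calling	Link	Link
xLAM-2-1b-fc-r	1B	32k (up to 128k)*	Multi-turn conversation, function calling	Link	Link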

*Note: The Qwen-2.5-based models have a default context length of 32k. Techniques such as YaRN can extend it to a maximum of 128k. See here for more details.
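For the Qwen-2.5-based checkpoints, YaRN is typically enabled by adding a `rope_scaling` entry to the model's `config.json`, following the pattern in the Qwen2.5 documentation. The sketch below assumes that pattern. The factor is the target context divided by the original: 131072 / 32768 = 4.0. It applies to the original Hugging Face checkpoint, not to the GGUF files, whose runtime has its own rope-scaling settings. The Qwen documentation advises enabling static YaRN only when long contexts are needed, since it can affect performance on shorter inputs.

```python
import json
from pathlib import Path

ORIGINAL_CONTEXT = 32768  # default context length of the Qwen-2.5-based xLAM-2 models
TARGET_CONTEXT = 131072   # the 128k maximum mentioned in the note


def add_yarn_scaling(config: dict, original: int = ORIGINAL_CONTEXT,
                     target: int = TARGET_CONTEXT) -> dict:
    """Return a copy of a Hugging Face config dict with a YaRN rope_scaling entry."""
    patched = dict(config)  # shallow copy; the input dict is left unchanged
    patched["rope_scaling"] = {
        "type": "yarn",
        "factor": target / original,  # 131072 / 32768 = 4.0
        "original_max_position_embeddings": original,
    }
    return patched


def patch_config_file(path: str) -> None:
    """Rewrite a local config.json in place; back up the original first if needed."""
    config_path = Path(path)
    config = json.loads(config_path.read_text())
    config_path.write_text(json.dumps(add_yarn_scaling(config), indent=2))
```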

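Benchmark Results

Berkeley Function-Calling Leaderboard (BFCL v3)

Performance comparison of different models on the BFCL leaderboard. Rankings are based on overall accuracy, a weighted average across evaluation categories. "FC" denotes function-calling mode, as opposed to extracting function calls with a custom "prompt".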
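The phrase "weighted average across evaluation categories" can be made concrete. For per-category accuracies a_i and weights w_i, the overall score is sum_i(w_i * a_i) / sum_i(w_i). BFCL's actual categories and weights are not given here, so this sketch takes both as inputs. It illustrates the formula only and does not reproduce leaderboard data.

```python
from typing import Mapping


def overall_accuracy(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of per-category accuracies; the weights need not sum to 1."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("category weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight
```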

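τ-bench Benchmark

Average success rate (pass@1) on τ-bench over at least 5 trials. Our xLAM-2-70b-fc-r model reaches an overall success rate of 56.2% on τ-bench. This significantly outperforms the base Llama 3.1 70B Instruct model (38.2%) and other open-source models such as DeepSeek v3 (40.6%). Notably, our best model even outperforms proprietary models such as GPT-4o (52.9%) and approaches the performance of more recent models such as Claude 3.5 Sonnet (new) (60.1%).

Pass^k curves measure the probability that all 5 independent trials of a given task succeed, averaged over all tasks in the τ-retail (left) and τ-airline (right) domains. Higher values indicate more consistent model behavior.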
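Pass^k can be estimated from recorded trials. This sketch follows the estimator described in the τ-bench paper, as we understand it. For a task with c successes out of n trials, the probability that k trials drawn from those n all succeed is C(c, k) / C(n, k), and the result averages this over tasks. With k = 1 it reduces to the average success rate (pass@1). The per-task success counts are inputs; none from the benchmark are reproduced here.

```python
from math import comb
from typing import Sequence


def pass_hat_k(successes: Sequence[int], n: int, k: int) -> float:
    """Estimate pass^k from per-task success counts.

    successes[i] is the number of successful trials (out of n) for task i.
    For each task, C(c, k) / C(n, k) is the probability that k trials drawn
    without replacement from its n trials all succeed; the result averages this
    over tasks. math.comb returns 0 when k > c, so such tasks contribute 0.
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if not successes:
        raise ValueError("need at least one task")
    return sum(comb(c, k) / comb(n, k) for c in successes) / len(successes)


if __name__ == "__main__":
    # Hypothetical success counts for three tasks, 5 trials each.
    counts = [5, 3, 0]
    for k in range(1, 6):
        print(k, round(pass_hat_k(counts, n=5, k=k), 3))
```

With n = 5 trials per task, evaluating k = 1 through 5 traces a curve like the one described above. A task that succeeds only some of the time loses credit quickly as k grows, which is why the curve reflects consistency.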

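🔧 Technical Details

Model Generation Details

This model was generated with llama.cpp at commit 6adc3c3e.

Quantization Method

A new quantization approach was tested that selectively raises the precision of key layers beyond what the default IMatrix configuration provides. In testing, standard IMatrix quantization performed poorly at low bit depths, especially for Mixture-of-Experts (MoE) models. To address this, important layers were manually bumped to higher precision with the --tensor-type option in llama.cpp. For implementation details, see: Layer bumping with llama.cpp. This increases model file size but significantly improves accuracy at a given quantization level.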
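The sketch below makes the idea concrete as a layer-bumping plan. Regex rules map GGUF-style tensor names to a higher-precision type, every other tensor keeps the base type, and a rough size estimate shows the file-size cost. This is an illustration only, not llama.cpp's `--tensor-type` implementation. The rules, tensor names, and parameter counts are hypothetical. The bits-per-weight figures are approximate values for llama.cpp's Q4_K, Q6_K and Q8_0 block formats; real files also keep some tensors, such as norms, at higher precision.

```python
import re
from typing import Dict, List, Tuple

# Approximate bits per weight of some llama.cpp block formats, including block scales.
BITS_PER_WEIGHT = {"q4_k": 4.5, "q6_k": 6.5625, "q8_0": 8.5}


def assign_types(tensors: Dict[str, int], base_type: str,
                 rules: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each tensor name to a quant type: the first matching regex rule wins, else base_type."""
    plan = {}
    for name in tensors:
        plan[name] = base_type
        for pattern, qtype in rules:
            if re.search(pattern, name):
                plan[name] = qtype  # "bump" this tensor to higher precision
                break
    return plan


def estimated_size_bytes(tensors: Dict[str, int], plan: Dict[str, str]) -> float:
    """Rough payload size: parameter count x bits per weight, summed and converted to bytes."""
    bits = sum(count * BITS_PER_WEIGHT[plan[name]] for name, count in tensors.items())
    return bits / 8


if __name__ == "__main__":
    # Hypothetical tensors (name -> parameter count) for a single transformer block.
    tensors = {
        "blk.0.attn_v.weight": 1024,
        "blk.0.ffn_down.weight": 4096,
        "blk.0.ffn_up.weight": 4096,
    }
    rules = [(r"attn_v", "q8_0"), (r"ffn_down", "q6_k")]
    base = assign_types(tensors, "q4_k", [])
    bumped = assign_types(tensors, "q4_k", rules)
    print(estimated_size_bytes(tensors, base), estimated_size_bytes(tensors, bumped))  # 5184.0 6752.0
```

The toy numbers show the trade-off described above. Bumping one attention tensor and one FFN tensor raises the estimated size by about 30% in exchange for higher precision where it matters most.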

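📄 License

This model is released under the cc-by-nc-4.0 license.

Citation

If you use our model or dataset in your work, please cite our paper: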

@article{prabhakar2025apigen,
  title={APIGen-MT: Agentic PIpeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay},
  author={Prabhakar, Akshara and Liu, Zuxin and Zhu, Ming and Zhang, Jianguo and Awalgaonkar, Tulika and Wang, Shiyu and Liu, Zhiwei and Chen, Haolin and Hoang, Thai and others},
  journal={arXiv preprint arXiv:2504.03601},
  year={2025}
}

Please also see our other related work on the xLAM series and consider citing it as well:

@article{zhang2025actionstudio,
  title={ActionStudio: A Lightweight Framework for Data and Training of Action Models},
  author={Zhang, Jianguo and Hoang, Thai and Zhu, Ming and Liu, Zuxin and Wang, Shiyu and Awalgaonkar, Tulika and Prabhakar, Akshara and Chen, Haolin and Yao, Weiran and Liu, Zhiwei and others},
  journal={arXiv preprint arXiv:2503.22673},
  year={2025}
}

@article{zhang2024xlam,
  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
  author={Zhang, Jianguo and Lan, Tian and Zhu, Ming and Liu, Zuxin and Hoang, Thai and Kokane, Shirley and Yao, Weiran and Tan, Juntao and Prabhakar, Akshara and Chen, Haolin and others},
  journal={arXiv preprint arXiv:2409.03215},
  year={2024}
}

@article{liu2024apigen,
  title={Apigen: Automated pipeline for generating verifiable and diverse function-calling datasets},
  author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and RN, Rithesh and others},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={54463--54482},
  year={2024}
}

@article{zhang2024agentohana,
  title={AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning},
  author={Zhang, Jianguo and Lan, Tian and Murthy, Rithesh and Liu, Zhiwei and Yao, Weiran and Tan, Juntao and Hoang, Thai and Yang, Liangwei and Feng, Yihao and Liu, Zuxin and others},
  journal={arXiv preprint arXiv:2402.15506},
  year={2024}
}

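Other Information

Model Testing Invitation

If you find these models useful, please help test the AI-powered Quantum Network Monitor assistant, which runs quantum-ready security checks: Quantum Network Monitor

The full open-source code for the Quantum Network Monitor service is available in the GitHub repository: Quantum Network Monitor source code. If you want to quantize models yourself, the relevant code is also available: GGUFModelBuilder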

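Testing Instructions

Choose an AI assistant type:
- TurboLLM (GPT-4.1-mini)
- HugLLM (Hugging Face open-source models)
- TestLLM (experimental CPU-only model)
What to test:
- Function calling against live network services
- How well small models handle automated Nmap security scans, quantum-readiness checks, and network monitoring tasks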

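Assistant Characteristics

TestLLM: the current experimental model (llama.cpp running on 2 CPU threads in a Hugging Face Docker space):
- Zero-configuration setup
- Load time of about 30 seconds (slow inference, but no API costs) and no token limits
- Looking for collaborators: if you are interested in AI on edge devices, get in touch!
TurboLLM: uses gpt-4.1-mini:
- Strong performance, but OpenAI charges per token, so token usage is limited
- Creates custom command processors that run .NET code on Quantum Network Monitor agents
- Real-time network diagnostics and monitoring
- Security audits
- Penetration testing (Nmap/Metasploit)
HugLLM: the latest open-source models:
- Runs on the Hugging Face Inference API and performs well with the latest models hosted on Novita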