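xLAM-v0.1-r: an open-source large action model, fine-tuned at the same parameter count for diverse agent scenarios while retaining the original model's capabilities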

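xLAM-v0.1-r

Developed by Salesforce

xLAM-v0.1 is a major upgrade in the Large Action Model series. With the same number of parameters as its base model, it has been fine-tuned across a wide range of agent tasks and scenarios while preserving the original model's capabilities.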

Large Language Model

Transformers

#Multi-task agents #Long-context processing #Function-calling optimization

Downloads: 190

Release date: 3/18/2024

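Model Overview

xLAM-v0.1-r is version 0.1 of the Large Action Model series and is tagged for research use. The model is compatible with the VLLM and FastChat platforms and supports both function calling and general tasks.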

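Model Features

Fine-tuned on a wide range of agent tasks

With the same number of parameters, the model has been fine-tuned across a wide range of agent tasks and scenarios and outperforms the original model.

Compatible with mainstream platforms

Compatible with the VLLM and FastChat platforms, which simplifies deployment and use.

Long-context support

Supports a context length of up to 32k, which suits complex tasks.

Function calling

Strong function-calling ability, suited to automation tasks and agent scenarios.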

Model Capabilities

Text generation

Function calling

Long-context processing

Multi-task agents

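Use Cases

Automation agents

Automated task handling

Uses function calling to automate complex tasks.

Improves the efficiency and accuracy of task handling.

General text generation

Long-form text generation

Generates high-quality long-form text.

Suited to content creation and report generation.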

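🚀 xLAM-v0.1-r Large Action Model

xLAM-v0.1-r is version 0.1 of the Large Action Model series and a significant upgrade over Mixtral. With the same number of parameters, it has been fine-tuned across a wide range of agent tasks and scenarios while preserving the original model's capabilities, and it is compatible with the VLLM and FastChat platforms.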

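[AgentOhana Paper] | [Github] | [Discord] | [Homepage] | [Community Demo]

🚀 Quick Start

If you already know Mixtral, xLAM-v0.1 is a significant upgrade that performs better in many respects. With the same number of parameters, the model has been fine-tuned across a wide range of agent tasks and scenarios while preserving the original model's capabilities.

xLAM-v0.1-r is version 0.1 of the Large Action Model series; the "-r" suffix marks it as a research release. The model is compatible with the VLLM and FastChat platforms.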

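Model	Total Params	Context Length	Release Date	Category	Download Model	Download GGUF Files
xLAM-7b-r	7.24B	32k	Sep 5, 2024	General, Function-calling	🤗 Link	--
xLAM-8x7b-r	46.7B	32k	Sep 5, 2024	General, Function-calling	🤗 Link	--
xLAM-8x22b-r	141B	64k	Sep 5, 2024	General, Function-calling	🤗 Link	--
xLAM-1b-fc-r	1.35B	16k	Jul 17, 2024	Function-calling	🤗 Link	🤗 Link
xLAM-7b-fc-r	6.91B	4k	Jul 17, 2024	Function-calling	🤗 Link	🤗 Link
xLAM-v0.1-r	46.7B	32k	Mar 18, 2024	General, Function-calling	🤗 Link	--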

💻 Usage Examples

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; device_map="auto" places the weights on the available devices.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/xLAM-v0.1-r")
model = AutoModelForCausalLM.from_pretrained("Salesforce/xLAM-v0.1-r", device_map="auto")

# Conversation history, ending with the user turn the model should answer.
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Render the conversation with the model's chat template and move the token IDs to the model's device.
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

# Generate up to 512 new tokens and print the decoded text (prompt included).
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
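
Because the model is described as compatible with VLLM, one possible alternative to loading it in-process is to query a server that exposes an OpenAI-style chat completions endpoint. The following is a minimal sketch under stated assumptions, not an official client: it assumes such a server is already running and serving this model, and the URL, the helper names `build_chat_request` and `send_chat_request`, and the default parameter values are illustrative placeholders rather than anything specified by the model card.

```python
import json
import urllib.request

# Assumption: an OpenAI-compatible server (for example one started with vLLM)
# is already running and serving this model. The URL is a placeholder.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "Salesforce/xLAM-v0.1-r"


def build_chat_request(messages, temperature=0.0, max_tokens=512):
    """Build the JSON payload for an OpenAI-style chat completion request."""
    return {
        "model": MODEL_NAME,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def send_chat_request(payload, url=API_URL, timeout=60):
    """POST the payload and return the text of the first returned choice."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

With a server running, `send_chat_request(build_chat_request([{"role": "user", "content": "Do you have mayonnaise recipes?"}]))` would return the reply text. Keeping payload construction separate from the network call makes the request easy to inspect before anything is sent.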

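Advanced Usage

You may need to adjust the temperature setting for different applications. Lower temperatures generally help with tasks that require deterministic results. For tasks that must follow a specific output format or make function calls, we recommend including explicit format instructions in the prompt.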
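
The two recommendations above can be sketched as a pair of small helpers. This is an illustration under assumptions, not a format the model card prescribes: the tool description `EXAMPLE_TOOL`, the JSON reply format in `FORMAT_INSTRUCTIONS`, the helper names, and the sampling values 0.7 and 0.9 are all hypothetical. The keyword names `do_sample`, `temperature`, `top_p`, and `max_new_tokens` are standard arguments of `model.generate` in Transformers.

```python
import json

# Hypothetical tool description and reply format, for illustration only;
# the model card does not prescribe a specific schema.
EXAMPLE_TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {"city": "string"},
}

FORMAT_INSTRUCTIONS = (
    "Reply with a single JSON object of the form "
    '{"name": <tool name>, "arguments": {<argument name>: <value>}} '
    "and no other text."
)


def build_function_call_messages(question, tools):
    """Put the tool list and explicit format instructions in the user turn.

    Everything stays in one user message so the sketch does not depend on
    the chat template supporting a system role.
    """
    prompt = (
        f"Available tools:\n{json.dumps(tools, indent=2)}\n\n"
        f"{FORMAT_INSTRUCTIONS}\n\nQuestion: {question}"
    )
    return [{"role": "user", "content": prompt}]


def generation_kwargs(deterministic, max_new_tokens=512):
    """Pick keyword arguments for `model.generate` based on the task type."""
    if deterministic:
        # Greedy decoding: no sampling randomness is introduced.
        return {"do_sample": False, "max_new_tokens": max_new_tokens}
    # Sampling with a moderate temperature for more varied, open-ended text.
    return {
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
        "max_new_tokens": max_new_tokens,
    }
```

The messages can be passed to `tokenizer.apply_chat_template` as in the basic example, and the settings applied with `model.generate(inputs, **generation_kwargs(deterministic=True))`.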

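⚠️ Ethical Considerations

This release is for research purposes only, in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend that users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and follow best practices when selecting use cases, particularly in high-risk scenarios where errors or misuse could significantly affect people's lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.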

📊 Benchmarks

BOLAA

Webshop

LLM Name	Zero-shot (ZS)	Zero-shot with chain of thought (ZST)	ReAct	PlanAct	PlanReAct	BOLAA
Llama-2-70B-chat	0.0089	0.0102	0.4273	0.2809	0.3966	0.4986
Vicuna-33B	0.1527	0.2122	0.1971	0.3766	0.4032	0.5618
Mixtral-8x7B-Instruct-v0.1	0.4634	0.4592	0.5638	0.4738	0.3339	0.5342
GPT-3.5-Turbo	0.4851	0.5058	0.5047	0.4930	0.5436	0.6354
GPT-3.5-Turbo-Instruct	0.3785	0.4195	0.4377	0.3604	0.4851	0.5811
GPT-4-0613	0.5002	0.4783	0.4616	0.7950	0.4635	0.6129
xLAM-v0.1-r	0.5201	0.5268	0.6486	0.6573	0.6611	0.6556

HotpotQA

LLM Name	Zero-shot (ZS)	Zero-shot with chain of thought (ZST)	ReAct	PlanAct	PlanReAct
Mixtral-8x7B-Instruct-v0.1	0.3912	0.3971	0.3714	0.3195	0.3039
GPT-3.5-Turbo	0.4196	0.3937	0.3868	0.4182	0.3960
GPT-4-0613	0.5801	0.5709	0.6129	0.5778	0.5716
xLAM-v0.1-r	0.5492	0.4776	0.5020	0.5583	0.5030

AgentLite

Please note: all prompts provided by AgentLite are "unseen prompts" for xLAM-v0.1-r, meaning the model was not trained on data related to these prompts.

Webshop

LLM Name	Act	ReAct	BOLAA
GPT-3.5-Turbo-16k	0.6158	0.6005	0.6652
GPT-4-0613	0.6989	0.6732	0.7154
xLAM-v0.1-r	0.6563	0.6640	0.6854

HotpotQA

LLM Name	Easy F1 Score	Easy Accuracy	Medium F1 Score	Medium Accuracy	Hard F1 Score	Hard Accuracy
GPT-3.5-Turbo-16k-0613	0.410	0.350	0.330	0.25	0.283	0.20
GPT-4-0613	0.611	0.47	0.610	0.480	0.527	0.38
xLAM-v0.1-r	0.532	0.45	0.547	0.46	0.455	0.36

ToolBench

LLM Name	Unseen Instructions & Same Tool Set	Unseen Tools & Seen Category	Unseen Tools & Unseen Category
TooLlama V2	0.4385	0.4300	0.4350
GPT-3.5-Turbo-0125	0.5000	0.5150	0.4900
GPT-4-0125-preview	0.5462	0.5450	0.5050
xLAM-v0.1-r	0.5077	0.5650	0.5200

MINT-BENCH

LLM Name	1-step	2-step	3-step	4-step	5-step
GPT-4-0613	-	-	-	-	69.45
Claude-Instant-1	12.12	32.25	39.25	44.37	45.90
xLAM-v0.1-r	4.10	28.50	36.01	42.66	43.96
Claude-2	26.45	35.49	36.01	39.76	39.93
Lemur-70b-Chat-v1	3.75	26.96	35.67	37.54	37.03
GPT-3.5-Turbo-0613	2.73	16.89	24.06	31.74	36.18
AgentLM-70b	6.48	17.75	24.91	28.16	28.67
CodeLlama-34b	0.17	16.21	23.04	25.94	28.16

Tool-Query

LLM Name	Success Rate	Progress Rate
xLAM-v0.1-r	0.533	0.766
DeepSeek-67B	0.400	0.714
GPT-3.5-Turbo-0613	0.367	0.627
GPT-3.5-Turbo-16k	0.317	0.591
Lemur-70B	0.283	0.720
CodeLlama-13B	0.250	0.525
CodeLlama-34B	0.133	0.600
Mistral-7B	0.033	0.510
Vicuna-13B-16K	0.033	0.343
Llama-2-70B	0.000	0.483

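📄 License

This code is licensed under Apache 2.0. For models based on the deepseek model, you must follow the use-based restrictions in the linked deepseek license. This is a research-only project.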

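🙏 Acknowledgement

We thank the authors of the work that has contributed to our paper and to the agent research community. If you find our work useful, please consider citing: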

@article{zhang2024agentohana,
  title={AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning},
  author={Zhang, Jianguo and Lan, Tian and Murthy, Rithesh and Liu, Zhiwei and Yao, Weiran and Tan, Juntao and Hoang, Thai and Yang, Liangwei and Feng, Yihao and Liu, Zuxin and others},
  journal={arXiv preprint arXiv:2402.15506},
  year={2024}
}

@article{liu2024apigen,
  title={APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets},
  author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Kokane, Shirley and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and others},
  journal={arXiv preprint arXiv:2406.18518},
  year={2024}
}

@article{zhang2024xlamfamilylargeaction,
  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
  author={Zhang, Jianguo and Lan, Tian and Zhu, Ming and Liu, Zuxin and Hoang, Thai and Kokane, Shirley and Yao, Weiran and Tan, Juntao and Prabhakar, Akshara and Chen, Haolin and Liu, Zhiwei and Feng, Yihao and Awalgaonkar, Tulika and Murthy, Rithesh and Hu, Eric and Chen, Zeyuan and Xu, Ran and Niebles, Juan Carlos and Heinecke, Shelby and Wang, Huan and Savarese, Silvio and Xiong, Caiming},
  journal={arXiv preprint arXiv:2409.03215},
  year={2024}
}