xLAM-v0.1-r Open-Source Large Action Model: Fine-Tuned at the Same Parameter Count to Retain the Original Model's Capabilities Across Diverse Scenarios


Developed by Salesforce

xLAM-v0.1 is a major upgrade in the Large Action Model series, fine-tuned across a wide range of agent tasks and scenarios while maintaining the original model's capabilities with the same parameter count.

Large Language Model

Transformers

#Multi-task Agent #Long Context Processing #Function Calling Optimization


Release Time : 3/18/2024

Model Overview

xLAM-v0.1-r is version 0.1 of the Large Action Model series, tagged for research use. The model is compatible with the vLLM and FastChat platforms and supports both function calling and general tasks.

Model Features

Extensive Agent Task Fine-Tuning

Fine-tuned across a wide range of agent tasks and scenarios at the same parameter count as the original model (Mixtral), and reported to outperform it in many areas while preserving its capabilities.

Compatibility with Mainstream Platforms

Compatible with the vLLM and FastChat platforms for easy deployment and use.
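Both vLLM and FastChat can run an OpenAI-compatible HTTP server, so one way to query a deployed xLAM-v0.1-r is a plain Chat Completions request. The minimal sketch below builds that request with the standard library only; the base URL `http://localhost:8000/v1` and the served model name are assumptions about a local deployment, not values given in this model card.

```python
import json
import urllib.request

# Assumptions: a local OpenAI-compatible server (vLLM or FastChat) on port 8000,
# serving the model under its Hugging Face repository name.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "Salesforce/xLAM-v0.1-r"


def build_chat_request(messages, temperature=0.0, max_tokens=512,
                       base_url=BASE_URL, model=MODEL_NAME):
    """Build (but do not send) an OpenAI-style Chat Completions request."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    req = build_chat_request([{"role": "user", "content": "Do you have mayonnaise recipes?"}])
    print(req.full_url)
    # print(send_chat_request(req))  # requires a running server
```

Separating request construction from sending keeps the payload easy to inspect and reuse with other HTTP clients.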

Long Context Support

Supports context lengths of up to 32k tokens, which makes it suitable for long, complex tasks.
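The context window has to hold both the prompt and the tokens to be generated. A minimal sketch, assuming "32k" means 32,768 tokens and that the caller supplies a token-counting function, drops the oldest conversation turns until the rest fits. It also keeps the history starting on a user turn, since some chat templates (Mixtral-style ones among them) expect strict user/assistant alternation.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

# Assumption: "32k" is interpreted as 32,768 tokens.
CONTEXT_WINDOW = 32_768


def trim_history(messages: List[Message],
                 count_tokens: Callable[[List[Message]], int],
                 max_new_tokens: int = 512,
                 context_window: int = CONTEXT_WINDOW) -> List[Message]:
    """Drop the oldest messages until prompt + generation budget fits.

    The most recent message is always kept. `count_tokens` is a caller-supplied
    function returning the prompt length in tokens for a list of messages.
    """
    budget = context_window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    trimmed = list(messages)
    while len(trimmed) > 1 and count_tokens(trimmed) > budget:
        trimmed.pop(0)  # remove the oldest turn first
        # Keep the history starting on a user turn for role alternation.
        while len(trimmed) > 1 and trimmed[0]["role"] != "user":
            trimmed.pop(0)
    if count_tokens(trimmed) > budget:
        raise ValueError("the latest message alone exceeds the context window")
    return trimmed
```

With a Hugging Face tokenizer, one possible counter is `lambda msgs: len(tokenizer.apply_chat_template(msgs, add_generation_prompt=True))`, since `apply_chat_template` returns token IDs by default.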

Function Calling Capability

Equipped with robust function calling capabilities, ideal for automation tasks and agent scenarios.
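In function calling, the model emits a structured call that the application then executes. The sketch below parses and dispatches such a call; the output format (a JSON object with `name` and `arguments` keys) is an assumption for illustration, not the documented xLAM-v0.1-r format, and `get_weather` is a hypothetical tool that returns canned data.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str, unit: str = "celsius") -> Dict[str, Any]:
    """Hypothetical tool: returns canned data instead of calling a real API."""
    return {"city": city, "temperature": 21, "unit": unit}


# Registry mapping tool names the model may call to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(model_output: str) -> Any:
    """Parse a JSON function call emitted by the model and execute it.

    Assumed format: {"name": "<tool>", "arguments": {...}}
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("function call must be a JSON object")
    name = call.get("name")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    arguments = call.get("arguments", {})
    if not isinstance(arguments, dict):
        raise ValueError("'arguments' must be a JSON object")
    return TOOLS[name](**arguments)


if __name__ == "__main__":
    print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

Restricting execution to an explicit registry means a malformed or unexpected call fails with a clear error instead of running arbitrary code.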

Model Capabilities

Text Generation

Function Calling

Long Context Processing

Multi-task Agent

Use Cases

Automation Agents

Automated Task Processing

Utilizes function calling capabilities to automate complex tasks.

Improves task processing efficiency and accuracy.

General Text Generation

Long Text Generation

Generates high-quality long-form content.

Suitable for content creation and report generation.

🚀 xLAM - Large Action Model

xLAM is a significant upgrade to existing models such as Mixtral. It has been fine-tuned across a wide range of agent tasks and scenarios while preserving the capabilities of the original model. The xLAM-v0.1-r version is tagged for research and is compatible with the vLLM and FastChat platforms.


[AgentOhana Paper] | [Github] | [Discord] | [Homepage] | [Community Demo]

🚀 Quick Start

If you already know Mixtral, xLAM-v0.1 is a significant upgrade and better at many things. At the same number of parameters, the model has been fine-tuned across a wide range of agent tasks and scenarios, all while preserving the capabilities of the original model.

xLAM-v0.1-r is version 0.1 of the Large Action Model series; the "-r" suffix indicates that it is tagged for research. This model is compatible with the vLLM and FastChat platforms.

Property	Details
Model Type	xLAM-v0.1-r is a large action model, with different variants like xLAM-7b-r, xLAM-8x7b-r, etc.
Training Data	Not specified in the original document.

💻 Usage Examples

Basic Usage

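from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/xLAM-v0.1-r"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" spreads the weights over the available GPUs/CPU;
# torch_dtype="auto" keeps the precision stored in the checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Format the conversation with the model's chat template and move it to the
# model's device instead of hard-coding "cuda".
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))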

Advanced Usage

You may need to tune the temperature for different applications. A lower temperature typically helps with tasks that require deterministic outcomes. For tasks that must follow specific formats or make function calls, explicitly including formatting instructions in the prompt is advisable.
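As a rough sketch of this advice, the helpers below choose decoding settings per task and spell out the expected output format in the prompt. `do_sample`, `temperature`, `top_p`, and `max_new_tokens` are standard Hugging Face `generate` arguments; the task categories, the specific values, and the instruction wording are illustrative assumptions, not settings recommended by the xLAM authors. The instructions go into the user turn because Mixtral-style chat templates may not accept a separate system message.

```python
import json
from typing import Any, Dict, List

# Illustrative task categories that benefit from deterministic output (assumption).
DETERMINISTIC_TASKS = {"function_call", "extraction", "classification"}


def generation_kwargs(task: str, max_new_tokens: int = 512) -> Dict[str, Any]:
    """Return keyword arguments for transformers' model.generate()."""
    if task in DETERMINISTIC_TASKS:
        # Greedy decoding: no sampling, so temperature plays no role.
        return {"do_sample": False, "max_new_tokens": max_new_tokens}
    # Open-ended tasks: sample with a moderate temperature.
    return {"do_sample": True, "temperature": 0.7, "top_p": 0.9,
            "max_new_tokens": max_new_tokens}


def with_format_instructions(query: str, tools: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Build a single user message that states the required output format."""
    instructions = (
        "You can call the following tools:\n"
        + json.dumps(tools, indent=2)
        + "\nRespond with ONLY a JSON object of the form "
        '{"name": <tool name>, "arguments": {<argument>: <value>}}.'
    )
    return [{"role": "user", "content": f"{instructions}\n\nQuery: {query}"}]


# Example usage with the model and tokenizer from "Basic Usage":
# tools = [...]  # your tool schemas
# messages = with_format_instructions("What's the weather in Paris?", tools)
# inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
#                                        return_tensors="pt").to(model.device)
# outputs = model.generate(inputs, **generation_kwargs("function_call"))
```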

📚 Documentation

Ethical Considerations

This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.

Benchmarks

BOLAA

Webshop

LLM Name	ZS	ZST	ReAct	PlanAct	PlanReAct	BOLAA
Llama-2-70B-chat	0.0089	0.0102	0.4273	0.2809	0.3966	0.4986
Vicuna-33B	0.1527	0.2122	0.1971	0.3766	0.4032	0.5618
Mixtral-8x7B-Instruct-v0.1	0.4634	0.4592	0.5638	0.4738	0.3339	0.5342
GPT-3.5-Turbo	0.4851	0.5058	0.5047	0.4930	0.5436	0.6354
GPT-3.5-Turbo-Instruct	0.3785	0.4195	0.4377	0.3604	0.4851	0.5811
GPT-4-0613	0.5002	0.4783	0.4616	0.7950	0.4635	0.6129
xLAM-v0.1-r	0.5201	0.5268	0.6486	0.6573	0.6611	0.6556

HotpotQA

LLM Name	ZS	ZST	ReAct	PlanAct	PlanReAct
Mixtral-8x7B-Instruct-v0.1	0.3912	0.3971	0.3714	0.3195	0.3039
GPT-3.5-Turbo	0.4196	0.3937	0.3868	0.4182	0.3960
GPT-4-0613	0.5801	0.5709	0.6129	0.5778	0.5716
xLAM-v0.1-r	0.5492	0.4776	0.5020	0.5583	0.5030

AgentLite

Please note: All prompts provided by AgentLite are considered "unseen prompts" for xLAM-v0.1-r, meaning the model has not been trained with data related to these prompts.

Webshop

LLM Name	Act	ReAct	BOLAA
GPT-3.5-Turbo-16k	0.6158	0.6005	0.6652
GPT-4-0613	0.6989	0.6732	0.7154
xLAM-v0.1-r	0.6563	0.6640	0.6854

HotpotQA

LLM Name	Easy F1 Score	Easy Accuracy	Medium F1 Score	Medium Accuracy	Hard F1 Score	Hard Accuracy
GPT-3.5-Turbo-16k-0613	0.410	0.350	0.330	0.25	0.283	0.20
GPT-4-0613	0.611	0.47	0.610	0.480	0.527	0.38
xLAM-v0.1-r	0.532	0.45	0.547	0.46	0.455	0.36

ToolBench

LLM Name	Unseen Insts & Same Set	Unseen Tools & Seen Cat	Unseen Tools & Unseen Cat
ToolLLaMA V2	0.4385	0.4300	0.4350
GPT-3.5-Turbo-0125	0.5000	0.5150	0.4900
GPT-4-0125-preview	0.5462	0.5450	0.5050
xLAM-v0.1-r	0.5077	0.5650	0.5200

MINT-BENCH

LLM Name	1-step	2-step	3-step	4-step	5-step
GPT-4-0613	-	-	-	-	69.45
Claude-Instant-1	12.12	32.25	39.25	44.37	45.90
xLAM-v0.1-r	4.10	28.50	36.01	42.66	43.96
Claude-2	26.45	35.49	36.01	39.76	39.93
Lemur-70b-Chat-v1	3.75	26.96	35.67	37.54	37.03
GPT-3.5-Turbo-0613	2.73	16.89	24.06	31.74	36.18
AgentLM-70b	6.48	17.75	24.91	28.16	28.67
CodeLlama-34b	0.17	16.21	23.04	25.94	28.16
Llama-2-70b-chat	4.27	14.33	15.70	16.55	17.92

Tool-Query

LLM Name	Success Rate	Progress Rate
xLAM-v0.1-r	0.533	0.766
DeepSeek-67B	0.400	0.714
GPT-3.5-Turbo-0613	0.367	0.627
GPT-3.5-Turbo-16k	0.317	0.591
Lemur-70B	0.283	0.720
CodeLlama-13B	0.250	0.525
CodeLlama-34B	0.133	0.600
Mistral-7B	0.033	0.510
Vicuna-13B-16K	0.033	0.343
Llama-2-70B	0.000	0.483

📄 License

This code is licensed under Apache 2.0. Models based on the DeepSeek model additionally require you to follow the use-based restrictions in the linked DeepSeek license. This is a research-only project.

Acknowledgement

We want to acknowledge the works that have contributed to our paper and to the agent research community! If you find our work useful, please consider citing:

@article{zhang2024agentohana,
  title={AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning},
  author={Zhang, Jianguo and Lan, Tian and Murthy, Rithesh and Liu, Zhiwei and Yao, Weiran and Tan, Juntao and Hoang, Thai and Yang, Liangwei and Feng, Yihao and Liu, Zuxin and others},
  journal={arXiv preprint arXiv:2402.15506},
  year={2024}
}

@article{liu2024apigen,
  title={APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets},
  author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Kokane, Shirley and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and others},
  journal={arXiv preprint arXiv:2406.18518},
  year={2024}
}

@article{zhang2024xlamfamilylargeaction,
  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems}, 
  author={Zhang, Jianguo  and Lan, Tian  and Zhu, Ming  and Liu, Zuxin and Hoang, Thai and Kokane, Sh
