Executable Code Actions Elicit Better LLM Agents
This project proposes using executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions based on new observations.
Quick Start
The project provides several resources to explore, detailed in the sections below: the CodeAct framework, the CodeActInstruct dataset, and the CodeActAgent models.
Features
CodeAct Concept
We propose to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations (e.g., code execution results) through multi-turn interactions.
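For intuition, below is a minimal, self-contained sketch of such an interaction loop. Everything in it is illustrative rather than the project's actual implementation: `query_llm` is a hypothetical stand-in for a real model call, and a production agent would sandbox code execution instead of calling `exec` in-process.

```python
import contextlib
import io

def query_llm(history):
    # Hypothetical stand-in: a real agent prompts an LLM with the full
    # interaction history and receives a Python code action back.
    return "print(sum(range(1, 11)))"

def execute_action(code, namespace):
    """Run one code action and capture its stdout as the observation."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # sandbox this in any real deployment
        return buffer.getvalue() or "(no output)"
    except Exception as exc:
        # Errors are returned as observations so the LLM can self-correct.
        return f"Error: {exc}"

history, namespace = [], {}
for turn in range(3):  # multi-turn interaction budget
    action = query_llm(history)
    observation = execute_action(action, namespace)
    history.append({"action": action, "observation": observation})
```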

Why CodeAct?
Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark, M3ToolEval, shows that CodeAct outperforms widely used alternatives such as Text and JSON actions (up to a 20% higher success rate).
Figure: Comparison between CodeAct and Text/JSON as action formats.
Figure: Quantitative results comparing CodeAct and {Text, JSON} on M3ToolEval.
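To make the comparison concrete, consider one illustrative turn. The tool names below are invented for this sketch and are not drawn from M3ToolEval: a JSON action encodes a single structured call per turn, while a CodeAct action can compose calls, branch, and reuse intermediate results within the same turn.

```python
# Hypothetical tool stubs, defined here so the sketch runs standalone.
def search(query):
    return [f"result for {query!r}"]

def summarize(items):
    return " | ".join(items)

# A JSON-style agent emits one structured call per turn, e.g.:
#   {"tool": "search", "arguments": {"query": "weather in Paris"}}
# and must wait for the result before deciding its next call.
#
# An equivalent CodeAct action composes tool calls, control flow, and
# variable reuse in a single turn:
results = search("weather in Paris")
if not results:
    results = search("Paris forecast")   # fallback without an extra turn
print(summarize(results[:3]))            # printed output becomes the observation
```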
CodeActInstruct
We collect an instruction-tuning dataset, CodeActInstruct, consisting of 7k multi-turn interactions using CodeAct. The dataset is released as a Hugging Face dataset 🤗.
Dataset statistics. Token statistics are computed using the Llama-2 tokenizer.
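A minimal loading sketch with the `datasets` library, using the Hub ID from this card's metadata; the `"train"` split name and record fields are assumptions to verify against the dataset card.

```python
# Requires `pip install datasets`. The Hub ID comes from this card's
# metadata; the "train" split name is an assumption.
from datasets import load_dataset

ds = load_dataset("xingyaoww/code-act", split="train")
print(ds)        # number of rows and column names
print(ds[0])     # one multi-turn CodeAct interaction
```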
CodeActAgent
Trained on CodeActInstruct and general conversations, CodeActAgent excels at out-of-domain agent tasks compared to open-source models of the same size, while not sacrificing generic performance (e.g., knowledge, dialog). We release two variants of CodeActAgent (a usage sketch follows the list):
- CodeActAgent-Mistral-7b-v0.1 (recommended, model link): uses Mistral-7b-v0.1 as the base model with a 32k context window.
- CodeActAgent-Llama-7b (model link): uses Llama-2-7b as the base model with a 4k context window.
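As a sketch of basic usage with `transformers`, assuming the checkpoint is hosted on the Hugging Face Hub under the repository ID below (an assumption; confirm it against the model links above):

```python
# Requires `pip install transformers accelerate`. The repository ID is
# an assumption, not confirmed by this card; check the model link above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xingyaoww/CodeActAgent-Mistral-7b-v0.1"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Write Python code that prints the 10th Fibonacci number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```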
Evaluation results for CodeActAgent. ID and OD stand for in-domain and out-of-domain evaluation, respectively. Overall averaged performance normalizes the MT-Bench score to be consistent with other tasks and excludes in-domain tasks for a fair comparison.
Documentation
Please check out our paper and code for more details about data collection, model training, and evaluation.
License
The project is released under the Apache-2.0 license.
Citation
@misc{wang2024executable,
title={Executable Code Actions Elicit Better LLM Agents},
author={Xingyao Wang and Yangyi Chen and Lifan Yuan and Yizhe Zhang and Yunzhu Li and Hao Peng and Heng Ji},
year={2024},
eprint={2402.01030},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
| Property | Details |
| --- | --- |
| Pipeline Tag | Text Generation |
| Tags | LLM-Agent |
| License | Apache-2.0 |
| Datasets | xingyaoww/code-act |