SciWorld-MPO Open-source Intelligent Model - Free Deployment to Enhance the Planning and Decision-making Abilities of Agents

Sciworld MPO

Developed by xwm

A reinforcement learning model fine-tuned based on Llama-3.1-8B-Instruct, utilizing meta plan optimization technology to enhance agent planning capabilities

Large Language Model

Transformers

EnglishOpen Source License:Apache-2.0 #Meta Plan Optimization #Agent Planning #Task Execution Feedback

Downloads 96

Release Time : 2/17/2025

Model Overview

This model provides high-level general guidance through meta-planning and continuously optimizes based on feedback from agent task execution, demonstrating excellent performance in ALFWorld and SciWorld benchmarks

Model Features

Meta Plan Optimization Technology

Utilizes MPO technology to enhance the planning capabilities of large language model agents

High-Performance Benchmarking

Achieves an average accuracy of 83.1% in ALFWorld and SciWorld benchmarks

Feedback-Driven Optimization

Continuously optimizes based on feedback from agent task execution

Model Capabilities

Agent Planning Optimization

Meta Plan Generation

Task Execution Feedback Analysis

Reinforcement Learning Decision Making

Use Cases

Agent Development

Virtual Assistant Planning Optimization

Enhances the planning capabilities of virtual assistants in complex tasks

Demonstrates excellent performance in ALFWorld benchmarks

Scientific Experiment Planning

Optimizes the planning process for scientific experiment steps

Achieves high accuracy in SciWorld benchmarks

🚀 SciWorld-MPO

This model is a fine - tuned version of Llama - 3.1 - 8B - Instruct, aiming to enhance the planning capabilities of LLM agents through Meta Plan Optimization (MPO).

🚀 Quick Start

This model is a fine - tuned version of Llama-3.1-8B-Instruct on the sciworld-metaplan-preference-pairs dataset. It achieves the following results on the evaluation set:

Loss: 1.5017
Rewards/chosen: -3.8774
Rewards/rejected: -5.1594
Rewards/accuracies: 0.6419
Rewards/margins: 1.2820
Logps/chosen: -92.4593
Logps/rejected: -109.6343
Logits/chosen: 0.5212
Logits/rejected: 0.5151

See the original paper for more details: MPO: Boosting LLM Agents with Meta Plan Optimization.

Code: https://github.com/WeiminXiong/MPO

✨ Features

This model uses Meta Plan Optimization (MPO) to improve the planning capabilities of LLM agents. It leverages high - level general guidance through meta plans and enables continuous optimization based on feedback from the agent's task execution. It achieves state - of - the - art performance on ALFWorld and SciWorld, with an average accuracy of 83.1.

📚 Documentation

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained on the sciworld-metaplan-preference-pairs dataset, part of the Meta_Plan_Optimization dataset.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 1
seed: 42
distributed_type: multi - GPU
num_devices: 4
gradient_accumulation_steps: 4
total_train_batch_size: 32
total_eval_batch_size: 4
optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon = 1e-08 and optimizer_args = No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.03
num_epochs: 3.0

Training results

Framework versions

Transformers 4.46.1
Pytorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.20.3

📄 License

This project is licensed under the Apache-2.0 license.

Property	Details
Library Name	transformers
Pipeline Tag	reinforcement - learning
Datasets	xwm/Meta_Plan_Optimization
Base Model	meta - llama/Llama - 3.1 - 8B - Instruct
Metrics	accuracy
Tags	nlp, agent

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご