# 🚀 Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
This repository contains the OpenVLA-OFT checkpoint for LIBERO-Goal, as described in the paper *Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success*. OpenVLA-OFT achieves significant improvements over the base OpenVLA model by applying an optimized fine-tuning recipe.
Project page: https://openvla-oft.github.io/

Code repository: https://github.com/openvla-oft/openvla-oft

Other OpenVLA-OFT checkpoints: https://huggingface.co/moojink?search_models=oft
## 🚀 Quick Start
This example shows how to generate an action chunk with a pretrained OpenVLA-OFT checkpoint. Make sure you have set up the conda environment following the instructions in the GitHub README.
### Basic Usage
```python
import pickle
from experiments.robot.libero.run_libero_eval import GenerateConfig
from experiments.robot.openvla_utils import get_action_head, get_processor, get_proprio_projector, get_vla, get_vla_action
from prismatic.vla.constants import NUM_ACTIONS_CHUNK, PROPRIO_DIM

# Instantiate config (see GenerateConfig in experiments/robot/libero/run_libero_eval.py for definitions)
cfg = GenerateConfig(
    pretrained_checkpoint="moojink/openvla-7b-oft-finetuned-libero-spatial",
    use_l1_regression=True,
    use_diffusion=False,
    use_film=False,
    num_images_in_input=2,
    use_proprio=True,
    load_in_8bit=False,
    load_in_4bit=False,
    center_crop=True,
    num_open_loop_steps=NUM_ACTIONS_CHUNK,
    unnorm_key="libero_spatial_no_noops",
)

# Load the OpenVLA-OFT policy and its input processor
vla = get_vla(cfg)
processor = get_processor(cfg)

# Load the MLP action head, which generates continuous actions (via L1 regression)
action_head = get_action_head(cfg, llm_dim=vla.llm_dim)

# Load the projector that maps proprioceptive state into the language embedding space
proprio_projector = get_proprio_projector(cfg, llm_dim=vla.llm_dim, proprio_dim=PROPRIO_DIM)

# Load a sample LIBERO-Spatial observation (a dict containing camera image(s),
# proprioceptive state, and the task description)
with open("experiments/robot/libero/sample_libero_spatial_observation.pkl", "rb") as file:
    observation = pickle.load(file)

# Generate a chunk of NUM_ACTIONS_CHUNK future robot actions
actions = get_vla_action(cfg, vla, processor, observation, observation["task_description"], action_head, proprio_projector)
print("Generated action chunk:")
for act in actions:
    print(act)
```
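The config above sets `num_open_loop_steps=NUM_ACTIONS_CHUNK`, meaning the policy executes an entire predicted action chunk before querying the model again. The sketch below shows what that open-loop pattern might look like in a control loop. It is only an illustration: `env` (a Gym-style environment) and `build_observation()` (a helper that packs the current camera images, proprioceptive state, and task description into the observation dict shown above) are hypothetical placeholders, not part of this repository's documented API.

```python
import numpy as np

def run_episode(env, max_steps=300):
    """Minimal open-loop rollout sketch (hypothetical env and helper; see note above)."""
    obs = env.reset()
    for _ in range(max_steps // NUM_ACTIONS_CHUNK):
        # build_observation() is a hypothetical helper returning the same dict
        # format used above (images, proprio state, task description).
        observation = build_observation(obs)
        actions = get_vla_action(
            cfg, vla, processor, observation, observation["task_description"],
            action_head, proprio_projector,
        )
        # Execute all NUM_ACTIONS_CHUNK actions before re-querying the model,
        # matching num_open_loop_steps=NUM_ACTIONS_CHUNK in the config.
        for act in actions:
            obs, reward, done, info = env.step(np.asarray(act))
            if done:
                return info
    return None
```

Querying the model only once per chunk amortizes inference cost across many control steps, which is part of how OpenVLA-OFT improves control speed relative to single-step action prediction.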
## 📄 License

This project is released under the MIT license.
## 📚 Citation
```bibtex
@article{kim2025fine,
  title={Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success},
  author={Kim, Moo Jin and Finn, Chelsea and Liang, Percy},
  journal={arXiv preprint arXiv:2502.19645},
  year={2025}
}
```
| Property | Details |
| --- | --- |
| Model type | Vision-Language-Action model |
| Tags | Robotics |
| Library name | transformers |
| License | MIT |