flower_libero_10 Open-Source Model: Empowering Robotic Manipulation Tasks with a Small Parameter Count and Big Impact!

Flower Libero 10

Developed by mbreuss

FlowerVLA is a pre-trained vision-language-action (VLA) flow policy for robotic manipulation with only about 1 billion parameters. This checkpoint was trained on the LIBERO 10 dataset.

Multimodal Fusion

Safetensors

English | Open Source License: MIT | #Robot Operation Control #Vision-Language-Action Flow #LIBERO Fine-tuning

Downloads 14

Release Time: 3/17/2025

Model Overview

FlowerVLA uses half of the Florence-2 model for multimodal vision-language encoding and a novel Transformer-based flow matching architecture for action generation, yielding an efficient and versatile VLA policy with approximately 1 billion parameters.

Model Features

Efficient Multimodal Encoding

Uses half of the Florence-2 model for multimodal vision-language encoding

Flow Matching Architecture

Employs a novel Transformer-based flow matching architecture

Efficient Parameter Scale

Contains only about 1 billion parameters, providing an efficient and versatile VLA policy

High Performance

Achieves an average success rate of about 94.4% across the ten LIBERO 10 tasks

Model Capabilities

Vision-Language-Action Model

Robotic Manipulation Tasks

Multimodal Encoding

Flow Matching

Use Cases

Robotic Manipulation

Place Items into Basket

Living Room Scene 2: Put both the alphabet soup and the tomato sauce in the basket

Success Rate 0.9791666666666666

Turn on Stove and Place Moka Pot

Kitchen Scene 3: Turn on the stove and put the moka pot on it

Success Rate 0.9791666666666666

Place Black Bowl into Bottom Cabinet Drawer and Close

Kitchen Scene 4: Put the black bowl in the bottom drawer of the cabinet and close it

Success Rate 1.0

🚀 FlowerVLA - Vision-Language-Action Flow Model Fine-tuned on LIBERO 10

This is a pre-trained FlowerVLA model for robotic manipulation, trained on the LIBERO 10 dataset. Flower is an efficient Vision-Language-Action flow policy for robot learning with only about 1B parameters, which makes it a practical choice for robotic tasks.

🚀 Quick Start

Check out our full model implementation on GitHub (todo) and follow the instructions in the README to test the model in one of the environments.

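# Assumes `model` is a loaded FlowerVLA policy and `static_image` / `gripper_image`
# are the current RGB camera frames; see the GitHub implementation for how to obtain them.
obs = {
    "rgb_obs": {
        "rgb_static": static_image,    # fixed (static) camera view
        "rgb_gripper": gripper_image   # gripper-mounted camera view
    }
}
goal = {"lang_text": "pick up the blue cube"}  # natural-language task instruction
action = model.step(obs, goal)  # predict the next action for this observation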
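The Quick Start call takes two dictionaries: nested camera observations and a language goal. A small helper such as the hypothetical `make_step_inputs` sketched here can pack frames and an instruction into that shape and reject malformed images early. The (H, W, 3) RGB frame shape, the 128x128 resolution, the `DummyPolicy` stand-in, and its 7-dimensional zero action are all assumptions for illustration, not part of the FlowerVLA API.

```python
import numpy as np


def make_step_inputs(static_image, gripper_image, instruction):
    """Pack two camera frames and a language instruction into the (obs, goal)
    dictionaries used by the Quick Start call model.step(obs, goal).

    Hypothetical helper; assumes each frame is an (H, W, 3) RGB array.
    """
    for name, image in (("static", static_image), ("gripper", gripper_image)):
        if image.ndim != 3 or image.shape[-1] != 3:
            raise ValueError(f"{name} image must have shape (H, W, 3), got {image.shape}")
    obs = {"rgb_obs": {"rgb_static": static_image, "rgb_gripper": gripper_image}}
    goal = {"lang_text": instruction}
    return obs, goal


class DummyPolicy:
    """Stand-in for a loaded FlowerVLA model so the loop runs without weights.

    The 7-D zero action is an assumption, not the real model's output format.
    """

    def step(self, obs, goal):
        return np.zeros(7)


# Hypothetical control loop: in practice fresh frames come from the simulator each step.
model = DummyPolicy()
for _ in range(3):
    static_image = np.zeros((128, 128, 3), dtype=np.uint8)
    gripper_image = np.zeros((128, 128, 3), dtype=np.uint8)
    obs, goal = make_step_inputs(static_image, gripper_image, "pick up the blue cube")
    action = model.step(obs, goal)
print(action.shape)  # (7,)
```

Keeping the packing logic in one function means the nested key names ("rgb_obs", "rgb_static", "rgb_gripper", "lang_text") are written once, so swapping the dummy for the real model only changes how `model` is created.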

✨ Features

FlowerVLA is a novel architecture that:

Uses half of Florence-2 for multimodal vision-language encoding
Employs a novel transformer-based flow matching architecture
Provides an efficient, versatile VLA policy with only ~1B parameters
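To make the flow matching idea in the feature list more concrete, here is a generic NumPy sketch of flow matching itself, not FlowerVLA's actual network or training code. It shows the conditional flow matching loss on straight noise-to-data paths and Euler-step sampling of an action chunk. The `velocity_fn` callable stands in for the transformer; the (batch, horizon, action_dim) shapes, the convention of noise at t = 0 and actions at t = 1, and the step count are assumptions chosen for illustration.

```python
import numpy as np


def flow_matching_loss(velocity_fn, actions, rng):
    """Conditional flow matching loss on straight-line paths.

    actions: array of shape (batch, horizon, action_dim), the data end (t = 1).
    velocity_fn(x, t): predicts dx/dt for x of the same shape and t of shape (batch,).
    """
    noise = rng.standard_normal(actions.shape)       # x0 ~ N(0, I) at t = 0
    t = rng.random(actions.shape[0])                 # one time per sample, t in [0, 1)
    t_b = t[:, None, None]                           # broadcast over horizon and action_dim
    x_t = (1.0 - t_b) * noise + t_b * actions        # point on the noise-to-data line
    target_velocity = actions - noise                # constant velocity along that line
    predicted = velocity_fn(x_t, t)
    return float(np.mean((predicted - target_velocity) ** 2))


def sample_actions(velocity_fn, shape, steps, rng):
    """Generate an action chunk by Euler-integrating dx/dt = v(x, t) from t = 0 to t = 1."""
    x = rng.standard_normal(shape)                   # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full(shape[0], k * dt)
        x = x + dt * velocity_fn(x, t)
    return x


# Toy check with an oracle velocity field for known targets (hypothetical shapes:
# 4 chunks of 10 steps with 7-D actions). A trained network would replace it.
rng = np.random.default_rng(0)
target = rng.standard_normal((4, 10, 7))


def oracle_velocity(x, t):
    # Points from x straight at the target, arriving exactly at t = 1.
    return (target - x) / (1.0 - t[:, None, None])


loss = flow_matching_loss(oracle_velocity, target, rng)
chunk = sample_actions(oracle_velocity, target.shape, steps=10, rng=rng)
print(np.allclose(chunk, target))  # True
```

The oracle drives the loss to (numerically) zero and lands exactly on the targets, which checks that the interpolation, target velocity, and integrator agree with each other; in a real policy the velocity network would also be conditioned on the vision-language embedding.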

📚 Documentation

Model Performance

This checkpoint contains weights for the LIBERO 10 challenge and achieves these results:

eval_lh/avg_seq_len (average success rate over the ten tasks): 0.9440705180168152

Per-task success rates:
eval_lh/sr_LIVING_ROOM_SCENE2_put_both_the_alphabet_soup_and_the_tomato_sauce_in_the_basket: 0.9791666666666666
eval_lh/sr_LIVING_ROOM_SCENE2_put_both_the_cream_cheese_box_and_the_butter_in_the_basket: 1.0
eval_lh/sr_KITCHEN_SCENE3_turn_on_the_stove_and_put_the_moka_pot_on_it: 0.9791666666666666
eval_lh/sr_KITCHEN_SCENE4_put_the_black_bowl_in_the_bottom_drawer_of_the_cabinet_and_close_it: 1.0
eval_lh/sr_LIVING_ROOM_SCENE5_put_the_white_mug_on_the_left_plate_and_put_the_yellow_and_white_mug_on_the_right_plate: 0.9407051282051282
eval_lh/sr_STUDY_SCENE1_pick_up_the_book_and_place_it_in_the_back_compartment_of_the_caddy: 1.0
eval_lh/sr_LIVING_ROOM_SCENE6_put_the_white_mug_on_the_plate_and_put_the_chocolate_pudding_to_the_right_of_the_plate: 0.8990384615384616
eval_lh/sr_LIVING_ROOM_SCENE1_put_both_the_alphabet_soup_and_the_cream_cheese_box_in_the_basket: 1.0
eval_lh/sr_KITCHEN_SCENE8_put_both_moka_pots_on_the_stove: 0.7403846153846154
eval_lh/sr_KITCHEN_SCENE6_put_the_yellow_and_white_mug_in_the_microwave_and_close_it: 0.9022435897435898
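As a quick sanity check on the results above, the reported eval_lh/avg_seq_len value agrees, to about eight decimal places, with the unweighted mean of the ten per-task success rates. This short snippet uses only the numbers listed in the results to recompute that mean and pick out the weakest task; the dictionary keys simply drop the "eval_lh/sr_" prefix.

```python
# Per-task success rates copied from the LIBERO 10 results above.
success_rates = {
    "LIVING_ROOM_SCENE2_put_both_the_alphabet_soup_and_the_tomato_sauce_in_the_basket": 0.9791666666666666,
    "LIVING_ROOM_SCENE2_put_both_the_cream_cheese_box_and_the_butter_in_the_basket": 1.0,
    "KITCHEN_SCENE3_turn_on_the_stove_and_put_the_moka_pot_on_it": 0.9791666666666666,
    "KITCHEN_SCENE4_put_the_black_bowl_in_the_bottom_drawer_of_the_cabinet_and_close_it": 1.0,
    "LIVING_ROOM_SCENE5_put_the_white_mug_on_the_left_plate_and_put_the_yellow_and_white_mug_on_the_right_plate": 0.9407051282051282,
    "STUDY_SCENE1_pick_up_the_book_and_place_it_in_the_back_compartment_of_the_caddy": 1.0,
    "LIVING_ROOM_SCENE6_put_the_white_mug_on_the_plate_and_put_the_chocolate_pudding_to_the_right_of_the_plate": 0.8990384615384616,
    "LIVING_ROOM_SCENE1_put_both_the_alphabet_soup_and_the_cream_cheese_box_in_the_basket": 1.0,
    "KITCHEN_SCENE8_put_both_moka_pots_on_the_stove": 0.7403846153846154,
    "KITCHEN_SCENE6_put_the_yellow_and_white_mug_in_the_microwave_and_close_it": 0.9022435897435898,
}
REPORTED_AVERAGE = 0.9440705180168152  # eval_lh/avg_seq_len

# Unweighted mean: every task counts equally, regardless of how many rollouts it had.
mean_rate = sum(success_rates.values()) / len(success_rates)
weakest_task = min(success_rates, key=success_rates.get)

print(f"tasks: {len(success_rates)}")          # tasks: 10
print(f"mean success rate: {mean_rate:.4f}")   # mean success rate: 0.9441
print(f"weakest task: {weakest_task}")         # weakest task: KITCHEN_SCENE8_put_both_moka_pots_on_the_stove
```

The remaining gap between the recomputed mean and the reported value is below 1e-8, consistent with rounding in the logged metric rather than a different averaging scheme.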