Flower Calvin D
Developed by mbreuss
FlowerVLA is a vision-language-action (VLA) flow policy pre-trained on the CALVIN D dataset. It uses an efficient flow-matching architecture to deliver a general-purpose robot manipulation policy with only about 1 billion parameters.
Downloads 16
Release Time: 3/16/2025
Model Overview
FlowerVLA is a vision-language-action flow policy for robotic manipulation tasks. Given visual inputs and a language instruction, it generates the corresponding robot actions.
Model Features
Efficient Architecture
Uses a novel Transformer-based flow-matching architecture, yielding an efficient, general-purpose VLA policy with only about 1 billion parameters.
Multimodal Encoding
Uses half of the Florence-2 modules for multimodal vision-language encoding, fusing visual and linguistic information.
High Performance
Ranked first in the CALVIN D challenge.
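The flow-matching idea named under Efficient Architecture can be illustrated with a minimal sketch. It is not FlowerVLA's implementation: the names toy_velocity and sample_action are hypothetical, the 3-dimensional action is an arbitrary choice, and the closed-form velocity field stands in for a learned Transformer that would be conditioned on image and language features. The sketch only shows the sampling loop: start from Gaussian noise and integrate a velocity field from t = 0 to t = 1 with a few Euler steps to obtain an action.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that carries the current point x to the target is
    (target - x) / (1 - t). A real policy would predict this quantity
    from observations instead of being handed the target.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_action(target, num_steps=4, seed=0):
    """Integrate the velocity field with Euler steps, starting from noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # initial noise sample
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so 1 - t is never zero
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Assumed toy "expert" action; the values are illustrative only.
    print(sample_action([0.1, -0.2, 0.3]))
```

Because the toy field is exact, a handful of Euler steps lands on the target action; with a learned field the number of steps trades off accuracy against inference cost.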
Model Capabilities
Vision-Language-Action Mapping
Robot Manipulation Control
Multimodal Information Processing
Use Cases
Robotics
Object Grasping
Identify and grasp specific objects based on language instructions
Achieves high success rates on the CALVIN D dataset
Task Sequence Execution
Execute complex multi-step manipulation tasks
Completes long task sequences, reaching an average completed sequence length of 4.36.
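An average sequence length of this kind is typically computed by counting, for each evaluation rollout, how many consecutive tasks were completed before the first failure, and then averaging those counts. The sketch below assumes chains of 5 tasks, as in the standard CALVIN long-horizon evaluation. The function names are hypothetical and the rollout counts are invented for illustration; they are not FlowerVLA results.

```python
def average_sequence_length(completed_counts):
    """Mean number of consecutive tasks completed per rollout."""
    if not completed_counts:
        raise ValueError("need at least one rollout")
    return sum(completed_counts) / len(completed_counts)


def chain_success_rates(completed_counts, chain_length=5):
    """Fraction of rollouts that completed at least k tasks, for k = 1..chain_length."""
    n = len(completed_counts)
    return [
        sum(1 for c in completed_counts if c >= k) / n
        for k in range(1, chain_length + 1)
    ]


if __name__ == "__main__":
    # Invented example: tasks completed in each of five rollouts.
    rollouts = [5, 4, 5, 3, 5]
    print(average_sequence_length(rollouts))
    print(chain_success_rates(rollouts))
```

The per-k success rates sum to the average sequence length, so either form can be reported from the same rollout data.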