Flower Calvin D

Developed by mbreuss
FlowerVLA is a vision-language-action (VLA) flow model pre-trained on the CALVIN D dataset. It uses an efficient flow-matching architecture to learn a general-purpose robot manipulation policy with only about 1 billion parameters.
Downloads 16
Release Time: 3/16/2025

Model Overview

FlowerVLA is a vision-language-action flow policy model for robotic manipulation. Given visual observations and a language instruction, it generates the corresponding robot actions.

Model Features

Efficient Architecture
Uses a Transformer-based flow-matching architecture that yields an efficient, general-purpose VLA policy with only about 1 billion parameters
Multimodal Encoding
Uses half of the Florence-2 model for multimodal vision-language encoding, fusing visual and linguistic information
High Performance
Ranked first on the CALVIN D benchmark
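
To make the flow-matching idea concrete, here is a minimal, generic sketch of how a flow-matching policy turns noise into an action: it starts from a Gaussian sample and integrates a velocity field from t = 0 to t = 1 with Euler steps. This is not FlowerVLA's implementation. The names toy_velocity and sample_action, the 7-dimensional action, and the step count are all assumptions for illustration, and the toy velocity field stands in for a learned network that would also be conditioned on the image and the language instruction.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For a straight-line path from noise to `target`, the velocity at
    time t that reaches the target at t = 1 is (target - x) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_action(velocity_fn, dim, num_steps=4, seed=0):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so 1 - t is never zero
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Assumed 7-DoF action (e.g. end-effector delta plus gripper), illustration only.
target = [0.1, -0.2, 0.3, 0.0, 0.0, 0.0, 1.0]
action = sample_action(lambda x, t: toy_velocity(x, t, target), dim=len(target))
print([round(a, 6) for a in action])
```

In a real policy, velocity_fn would be a neural network call taking the encoded observation and instruction as extra inputs, and the number of integration steps trades inference speed against accuracy.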

Model Capabilities

Vision-Language-Action Mapping
Robot Manipulation Control
Multimodal Information Processing

Use Cases

Robotics
Object Grasping
Identify and grasp specific objects based on language instructions
Achieves high success rates on the CALVIN D dataset
Task Sequence Execution
Execute complex multi-step manipulation tasks
Completes an average of 4.36 consecutive tasks per instruction chain in CALVIN's long-horizon evaluation, where each chain contains 5 tasks
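
The average-length figure is the standard CALVIN long-horizon metric: the policy receives a chain of instructions, a chain ends at the first failed task, and the score is the mean number of tasks completed in a row. The sketch below shows that arithmetic under those assumptions. The function name and the rollout data are hypothetical and are not FlowerVLA results.

```python
def average_sequence_length(rollouts):
    """Mean number of consecutively completed tasks per instruction chain.

    `rollouts` is a list of chains; each chain is a list of booleans giving
    per-task success in execution order. Counting stops at the first failure.
    """
    total = 0
    for chain in rollouts:
        completed = 0
        for success in chain:
            if not success:
                break
            completed += 1
        total += completed
    return total / len(rollouts)


# Hypothetical outcomes for four 5-task chains (illustration only).
rollouts = [
    [True, True, True, True, True],       # 5 tasks in a row
    [True, True, True, False, False],     # 3 tasks, then a failure
    [True, True, True, True, False],      # 4 tasks, then a failure
    [False, False, False, False, False],  # failed on the first task
]
print(average_sequence_length(rollouts))  # 3.0
```

With 5 tasks per chain the maximum possible score is 5.0, so a reported 4.36 means most chains are completed in full or nearly in full.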