FlowerVLA: Open-Source Vision-Language-Action Model for Robot Manipulation, Trained on CALVIN ABC

Flower CALVIN ABC

Developed by mbreuss

FlowerVLA is a pre-trained vision-language-action model for robotic manipulation tasks, trained on the CALVIN ABC dataset, utilizing an efficient flow matching architecture with approximately 1 billion parameters.

Category: Multimodal Fusion
Format: Safetensors
Language: English
License: MIT (open source)
Tags: Robot Operation Control, Multimodal Flow Matching, Efficient with Small Parameters
Downloads: 20
Release date: March 16, 2025

Model Overview

FlowerVLA is an efficient vision-language-action flow policy model specifically designed for robotic manipulation tasks, combining multimodal vision-language encoding and a novel Transformer architecture.

Model Features

Efficient Multimodal Encoding

Uses half of the Florence-2 model for multimodal vision-language encoding, which keeps vision-language fusion efficient.

Flow Matching Architecture

Uses a novel Transformer-based flow matching architecture to generate actions.
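To make the idea of flow matching concrete, here is a minimal, generic sketch of how a flow matching policy can turn noise into an action chunk at inference time: start from Gaussian noise and integrate a velocity field from t = 0 to t = 1 with Euler steps. This is not FlowerVLA's implementation. The function names, the step count, the 4 x 7 chunk shape, and the closed-form `toy_velocity` (which stands in for the learned Transformer) are all assumptions made for illustration.

```python
import numpy as np

def toy_velocity(x, t, target):
    """Hypothetical stand-in for the learned network.

    Returns the velocity of the straight-line path that reaches `target` at t = 1.
    A real policy would predict this from images, language, and the noisy actions.
    """
    return (target - x) / (1.0 - t)

def sample_actions(velocity_fn, shape, num_steps=10, seed=0):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-size Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)      # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                      # t stays below 1, so no division by zero
        x = x + dt * velocity_fn(x, t)  # one Euler step along the flow
    return x

# Assumed action chunk: 4 future steps of a 7-dimensional action.
target = np.tile(np.linspace(-1.0, 1.0, 7), (4, 1))
actions = sample_actions(lambda x, t: toy_velocity(x, t, target), target.shape)
```

With this toy field the final Euler step lands on `target` regardless of the starting noise, which makes the sketch easy to check. In a trained policy the velocity comes from a network, so the number of integration steps trades accuracy against inference speed.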

Lightweight Design

At about 1 billion parameters, the model is small enough to serve as an efficient, versatile vision-language-action policy for real-time robot operation.

Model Capabilities

Vision-Language-Action Fusion

Execution of Robotic Manipulation Tasks

Multimodal Input Processing

Action Space Prediction

Use Cases

Robotics

CALVIN ABC Challenge

Performing complex robotic manipulation tasks in the CALVIN ABC challenge

Reported as ranked first, with an average completed sequence length of 4.54 out of 5 chained tasks

Object Grasping

Grasping specific objects based on language instructions

High reported success rate (no grasping-specific figure is given)

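Success rate for completing 1 to 5 instructions in a row, and average completed sequence length:

Train→Test    Method      1 task   2 tasks   3 tasks   4 tasks   5 tasks   Avg. Len.
CALVIN ABC    FlowerVLA   99.3%    95.9%     90.5%     84.8%     77.5%     4.54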
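The Avg. Len. column can be related to the five success rates with a little arithmetic. Assuming the columns follow the usual CALVIN convention, where column k is the fraction of evaluation chains in which at least k instructions in a row were completed (this reading is an assumption, since the source does not define the columns), the expected number of completed tasks is the sum of the five rates. For the row above, the sum comes to 4.48, slightly below the reported 4.54. The source does not explain the gap, which might come from rounding or from figures averaged over different runs.

```python
# Success rates copied from the results table (chains of 1..5 tasks in a row).
success_rates = [0.993, 0.959, 0.905, 0.848, 0.775]

def average_chain_length(rates):
    """Expected number of consecutively completed tasks.

    Assumes rates[k - 1] is the fraction of chains with at least k tasks
    completed, so the expectation is the sum of those fractions.
    """
    return sum(rates)

avg_len = average_chain_length(success_rates)
print(f"{avg_len:.2f}")  # prints 4.48
```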

Property	Details
Base Model	microsoft/Florence-2-large, mbreuss/flower_vla_pret
Pipeline Tag	robotics
Tags	robotics, VLA
