OpenVLA 7B Fine-Tuned on LIBERO-Spatial
OpenVLA 7B vision-language-action model fine-tuned with LoRA on the LIBERO-Spatial dataset
Downloads: 4,009
Release Date: 9/3/2024
Model Overview
This is a multimodal vision-language-action model for robotics: it processes an image observation together with a text instruction and generates the corresponding robot action command.
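For reference, inference typically follows the pattern below, a minimal sketch based on the usage documented for OpenVLA checkpoints. The model ID and the unnorm_key value are assumptions inferred from the Hub naming convention, not confirmed by this page:

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# Load the processor and model (model ID assumed from HuggingFace Hub naming)
model_id = "openvla/openvla-7b-finetuned-libero-spatial"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda:0")

# Build the OpenVLA prompt around a free-form language instruction
instruction = "pick up the black bowl next to the plate and place it on the plate"
prompt = f"In: What action should the robot take to {instruction}?\nOut:"

# One forward pass: camera image + prompt in, low-level action out.
# unnorm_key selects the action de-normalization statistics; the exact
# key name for this checkpoint is an assumption.
image = Image.open("observation.png")  # frame from the robot or simulator
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="libero_spatial", do_sample=False)
```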
Model Features
LIBERO-Spatial Dataset Fine-Tuning
Model performance optimized specifically for robotic manipulation tasks involving spatial relationships between objects, the focus of the LIBERO-Spatial suite
LoRA Efficient Fine-Tuning
Parameter-efficient fine-tuning using LoRA with rank=32, adapting the model to new tasks while preserving the base model's capabilities (see the configuration sketch after this list)
Multimodal Processing Capability
Capable of processing both visual and language inputs to output action commands
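As a rough illustration of what a rank-32 LoRA setup looks like with the peft library: the rank matches the value stated above, while the alpha, dropout, and target-module choices are assumptions rather than the exact training recipe:

```python
from peft import LoraConfig, get_peft_model

# Rank-32 LoRA adapters over the model's linear layers; only the small
# adapter weights are trained, so the base 7B parameters stay frozen.
lora_config = LoraConfig(
    r=32,                          # LoRA rank, as stated for this model
    lora_alpha=16,                 # scaling factor (assumed value)
    lora_dropout=0.0,              # dropout on the adapter path (assumed value)
    target_modules="all-linear",   # adapt all linear layers (assumption)
    init_lora_weights="gaussian",
)
vla = get_peft_model(vla, lora_config)
vla.print_trainable_parameters()   # prints the small trainable fraction
```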
Model Capabilities
Vision-Language Understanding
Robotic Action Generation
Multimodal Reasoning
Spatial Task Processing
Use Cases
Robotic Control
Spatial Reasoning Task
Generates robot actions from visual input and text instructions that require reasoning about object positions
Performs well on the LIBERO-Spatial benchmark
Object Manipulation Task
Completes object grasping and placement tasks by combining visual and language inputs
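In a closed-loop setting, each predicted action is executed and a new observation is fed back into the model. The loop below is a hypothetical sketch: `env`, `max_steps`, and the observation key "agentview_image" are illustrative stand-ins for a LIBERO-style simulator, not a specific API, and `processor`, `prompt`, and `vla` reuse the assumptions from the inference example above:

```python
# Hypothetical control loop around the model's predict_action call.
obs = env.reset()
for _ in range(max_steps):
    image = Image.fromarray(obs["agentview_image"])
    inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
    # 7-D action: [dx, dy, dz, droll, dpitch, dyaw, gripper]
    action = vla.predict_action(**inputs, unnorm_key="libero_spatial", do_sample=False)
    obs, reward, done, info = env.step(action.tolist())
    if done:
        break
```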