Open-source Theia (theia-base-patch16-224-cddsv) Model - A Visual Representation Marvel for Robot Learning

Theia Base Patch16 224 Cddsv

Developed by theaiinstitute

Theia is a vision foundation model for robot learning, enriched with visual representation capabilities through the distillation of multiple vision foundation models

3D Vision

Transformers

Open Source License: Other #Robot Vision #Multi-task Distillation #Lightweight Backbone

Downloads 5,404

Release Time: 9/30/2024

Model Overview

Theia is a specialized vision model for robot learning, distilled from multiple vision foundation models to improve performance on downstream robot learning tasks. Experiments in the accompanying paper show that it outperforms its teacher models and prior robot learning models while using less training data and a smaller model size.

Model Features

Multi-model Distillation

Simultaneously distills knowledge from five vision foundation models: CLIP, Depth Anything, DINOv2, Segment Anything, and ViT
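A minimal sketch of what distilling several teachers at once can look like, in plain Python. Everything here is an illustrative assumption rather than Theia's training code: the student produces a single feature vector, a hypothetical per-teacher linear head (`heads`) maps it into each teacher's feature space, and each teacher's target is matched with an MSE term plus a cosine-distance term, averaged over teachers. Real training would compute such losses over full spatial feature maps in a deep learning framework.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]  # rows = output dimensions


def project(weights: Matrix, x: Vector) -> Vector:
    """Apply a hypothetical linear head to a student feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_distance(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """1 - cosine similarity; eps avoids division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)


def multi_teacher_loss(
    student: Vector, heads: Dict[str, Matrix], teachers: Dict[str, Vector]
) -> float:
    """Average over teachers of (MSE + cosine distance) between the projected
    student feature and that teacher's target feature."""
    total = 0.0
    for name, target in teachers.items():
        pred = project(heads[name], student)  # map into this teacher's space
        total += mse(pred, target) + cosine_distance(pred, target)
    return total / len(teachers)
```

Because each teacher gets its own head, one shared student feature can be pulled toward all teacher representations at once; the loss approaches zero only when every projected feature matches its teacher's target.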

Efficient Learning

Outperforms teacher models with less training data and a smaller model size

Diverse Visual Representations

Encodes rich visual knowledge suitable for various robot learning tasks

Model Capabilities

Visual Feature Extraction

Depth Estimation

Image Segmentation

Visual Representation Learning
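Feature extraction with a ViT-style backbone typically yields a sequence of token vectors, which a downstream robot policy often needs in compact form. The sketch below shows two common reductions: taking the CLS token, or averaging the patch tokens. It assumes the CLS token sits at index 0, as in DeiT/ViT; treat that token layout as an assumption for Theia's actual output. The function name `reduce_tokens` is hypothetical.

```python
from typing import List

Vector = List[float]


def reduce_tokens(tokens: List[Vector], method: str = "mean") -> Vector:
    """Collapse a ViT-style token sequence into one feature vector.

    Assumes tokens[0] is the CLS token and tokens[1:] are patch tokens.
    """
    if len(tokens) < 2:
        raise ValueError("expected a CLS token followed by at least one patch token")
    if method == "cls":
        return list(tokens[0])  # use the CLS token as the image summary
    if method == "mean":
        patches = tokens[1:]  # skip CLS, average the patch tokens per dimension
        dim = len(patches[0])
        return [sum(p[i] for p in patches) / len(patches) for i in range(dim)]
    raise ValueError(f"unknown reduction method: {method}")
```

Mean pooling keeps information from every image region, while the CLS token relies on the backbone having learned a global summary; which works better for a given robot task is an empirical question.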

Use Cases

Robot Learning

Robot Visual Navigation

Utilizes rich visual representations to assist robots in environmental understanding and navigation

Achieves better performance than traditional models with limited training data

Object Recognition and Manipulation

Combines various visual knowledge for object recognition and manipulation tasks

🚀 Theia


Theia is a vision foundation model for robot learning that distills multiple off-the-shelf vision foundation models trained on varied vision tasks. Theia’s rich visual representations encode diverse visual knowledge, enhancing downstream robot learning. It was introduced in the paper Theia: Distilling Diverse Vision Foundation Models for Robot Learning, which also includes experiments demonstrating that Theia outperforms its teacher models and prior robot learning models using less training data and smaller model sizes. Demo videos can be found on the project page.

✨ Features

The theia-tiny-patch16-224-cddsv model uses [DeiT-Tiny](https://huggingface.co/facebook/deit-tiny-patch16-224) as a backbone and simultaneously distills CLIP, [Depth Anything](https://github.com/LiheYoung/Depth-Anything), DINOv2, [Segment Anything](https://github.com/facebookresearch/segment-anything) and [ViT](https://github.com/google-research/vision_transformer). For more information on usage, please visit the Theia repository.
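The `patch16-224` part of the model name follows the DeiT/ViT naming convention: 16×16-pixel patches on a 224×224 input, which gives a 14×14 grid of 196 patch tokens. The helper below is a hypothetical illustration (not part of the Theia repository) that reads those two numbers from a model name and derives the grid; it assumes the name contains a `patch<P>-<R>` pattern.

```python
import re
from typing import Dict, Tuple, Union


def patch_grid(model_name: str) -> Dict[str, Union[int, Tuple[int, int]]]:
    """Derive the patch-token grid from a name containing 'patch<P>-<R>'."""
    match = re.search(r"patch(\d+)-(\d+)", model_name)
    if match is None:
        raise ValueError(f"no 'patch<P>-<R>' pattern in {model_name!r}")
    patch, resolution = int(match.group(1)), int(match.group(2))
    if resolution % patch:
        raise ValueError("input resolution must be divisible by the patch size")
    side = resolution // patch  # patches per image side
    return {
        "patch_size": patch,
        "resolution": resolution,
        "grid": (side, side),
        "num_patches": side * side,
    }


print(patch_grid("theia-base-patch16-224-cddsv"))
# {'patch_size': 16, 'resolution': 224, 'grid': (14, 14), 'num_patches': 196}
```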

📚 Documentation

Citation

If you use Theia in your research, please use the following BibTeX entry:

@article{shang2024theia,
  author    = {Shang, Jinghuan and Schmeckpeper, Karl and May, Brandon B. and Minniti, Maria Vittoria and Kelestemur, Tarik and Watkins, David and Herlant, Laura},
  title     = {Theia: Distilling Diverse Vision Foundation Models for Robot Learning},
  journal   = {arXiv},
  year      = {2024},
}

Usage

The pre-trained model weights and code released with Theia are available for use under The AI Institute License, reproduced in full below:

Copyright (c) 2024 Boston Dynamics AI Institute LLC

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the copyright notice included
with the software, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the copyright notice, this
list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.
3. Modified versions of the software must be conspicuously marked as such.
4. The software may only be used for non-commercial research purposes.
For profit enterprises may use the software, subject to this limitation.

THIS SOFTWARE IS PROVIDED BY THE AI INSTITUTE AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, NON-
INFRINGEMENT, TITLE, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AI INSTITUTE OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, PUNITIVE OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, DAMAGES ARISING OUT OF CLAIMS OF
INTELLECTUAL PROPERTY RIGHTS INFRINGEMENT; PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

📄 License

The model is listed for the transformers library under the license category "other": the pre-trained model weights and code of Theia are available for use under The AI Institute License.
