# SAM 2: Segment Anything in Images and Videos

This repository is for SAM 2, a foundation model developed by FAIR for promptable visual segmentation in both images and videos. For more information, refer to the [SAM 2 paper](https://arxiv.org/abs/2408.00714).

The official code is publicly available at https://github.com/facebookresearch/sam2.
## Quick Start
This section provides a quick guide on using SAM 2 for image and video prediction.
## Features
- **Promptable Visual Segmentation**: SAM 2 segments objects in images and videos from user prompts such as points, boxes, or masks.
- **Foundation Model**: a single model that generalizes across a wide range of visual segmentation tasks.
## Installation
The original document does not include installation steps; they are maintained in the official repository (https://github.com/facebookresearch/sam2). At the time of writing, the repository can be installed by cloning it and running `pip install -e .` (Python ≥ 3.10 and a recent PyTorch are required).
## Usage Examples
### Basic Usage

#### Image Prediction
```python
import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-small")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)
```
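For concreteness, the sketch below fills in the placeholders with a single point prompt. The image path, the click coordinates, and the choice of checkpoint are assumptions made for this example, and a CUDA device is assumed for `torch.autocast`.

```python
import numpy as np
import torch
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-small")

# Hypothetical input image; set_image accepts an RGB array (H, W, 3) or a PIL image.
image = np.array(Image.open("example.jpg").convert("RGB"))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # A single foreground click at pixel (x=500, y=375); label 1 = foreground, 0 = background.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
        multimask_output=True,  # return several candidate masks, each with a quality score
    )

best_mask = masks[np.argmax(scores)]  # (H, W) mask for the clicked object
```

With `multimask_output=True`, `predict` returns several candidate masks together with quality scores, so the highest-scoring one can be selected as above; box prompts are passed analogously via the `box=` argument.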
#### Video Prediction
```python
import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2-hiera-small")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...
```
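As with the image case, here is one way the placeholders might be filled in. It is a minimal sketch assuming the video has been extracted to a directory of JPEG frames; the path, frame index, object id, and click coordinates are made up for the example.

```python
import numpy as np
import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2-hiera-small")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # Hypothetical input: a directory of JPEG frames named 00000.jpg, 00001.jpg, ...
    state = predictor.init_state("./video_frames")

    # Prompt object 1 with a single foreground click on frame 0.
    _, object_ids, mask_logits = predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),  # 1 = foreground, 0 = background
    )

    # Propagate the prompt through the video and collect per-frame binary masks.
    video_segments = {}
    for frame_idx, object_ids, mask_logits in predictor.propagate_in_video(state):
        video_segments[frame_idx] = {
            obj_id: (mask_logits[i] > 0.0).cpu().numpy()
            for i, obj_id in enumerate(object_ids)
        }
```

Each iteration of `propagate_in_video` yields the frame index, the tracked object ids, and mask logits, which are thresholded at 0 above to obtain binary masks.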
Refer to the [demo notebooks](https://github.com/facebookresearch/sam2/tree/main/notebooks) for more detailed usage.
## License
This project is licensed under the Apache-2.0 license.
## Citation
To cite the paper, model, or software, please use the following BibTeX entry:
```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint arXiv:2408.00714},
  url={https://arxiv.org/abs/2408.00714},
  year={2024}
}
```