Segment Anything Model (SAM) - ViT Huge (ViT-H) version
The Segment Anything Model (SAM) can generate high-quality object masks from input prompts like points or boxes. It can generate masks for all objects in an image and shows strong zero-shot performance on various segmentation tasks.
🚀 Quick Start
The main steps to use the SAM model are as follows:
- Install the necessary libraries.
- Load the model and processor.
- Prepare input data such as images and prompts.
- Generate masks using the model.
✨ Features
- High-Quality Mask Generation: Produces high-quality object masks from input prompts.
- Zero-Shot Performance: Demonstrates strong zero-shot performance on a variety of segmentation tasks.
- Large-Scale Training: Trained on a dataset of 11 million images and 1.1 billion masks.
📦 Installation
The installation steps are not explicitly provided in the original README, but you need Python libraries such as `transformers`, `Pillow` (which provides `PIL`), `requests`, and `torch`. You can install them with:

```bash
pip install transformers torch pillow requests
```
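
To sanity-check the environment, a quick import test can help (a hypothetical snippet, not from the original card):

```python
# Confirm the required libraries import and print their versions.
import PIL
import requests
import torch
import transformers

print(transformers.__version__, torch.__version__, PIL.__version__, requests.__version__)
```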
💻 Usage Examples
Basic Usage
```python
import torch
from PIL import Image
import requests
from transformers import SamModel, SamProcessor

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamModel.from_pretrained("facebook/sam-vit-huge").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]]  # 2D location of a point on the object of interest

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Resize the predicted masks back to the original image resolution.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores
```
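
The returned `masks` is a list with one tensor per input image, each holding several candidate masks, and `scores` gives the model's predicted IoU for each candidate. As a follow-up sketch (the indexing assumes the single image and single point prompt above), you can keep the highest-scoring mask:

```python
# Continues from the snippet above: select the best of the (typically three)
# candidate masks and save it as a grayscale image.
best_idx = scores.squeeze().argmax().item()
best_mask = masks[0][0, best_idx].numpy()  # boolean (H, W) array
Image.fromarray((best_mask * 255).astype("uint8")).save("best_mask.png")
```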
Advanced Usage
Automatic Mask Generation
The model can generate segmentation masks in a "zero-shot" fashion. The following is an example of automatic mask generation:
```python
from transformers import pipeline

generator = pipeline("mask-generation", model="facebook/sam-vit-huge", device=0, points_per_batch=256)
image_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
outputs = generator(image_url, points_per_batch=256)
```
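
The pipeline returns a dictionary of masks and scores; a quick way to inspect it (key names per the transformers mask-generation pipeline):

```python
# `outputs["masks"]` is a list of binary (H, W) masks, one per detected object,
# and `outputs["scores"]` holds the corresponding quality scores.
print(len(outputs["masks"]))
print(outputs["scores"][:3])
```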
To display the image with masks:
```python
import matplotlib.pyplot as plt
from PIL import Image
import requests
import numpy as np


def show_mask(mask, ax, random_color=False):
    # Overlay a single binary mask on a matplotlib axis.
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)


# The pipeline was given a URL, so fetch the same image for display.
raw_image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

plt.imshow(np.array(raw_image))
ax = plt.gca()
for mask in outputs["masks"]:
    show_mask(mask, ax=ax, random_color=True)
plt.axis("off")
plt.show()
```
📚 Documentation
Model Details
The SAM model consists of the following main modules:

| Property | Details |
|----------|---------|
| VisionEncoder | A ViT-based image encoder that computes image embeddings using attention on image patches, with relative positional embeddings. |
| PromptEncoder | Generates embeddings for points and bounding boxes. |
| MaskDecoder | A two-way transformer that performs cross-attention between image embeddings and point embeddings, and vice versa. |
| Neck | Predicts output masks based on the contextualized masks produced by the MaskDecoder. |
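
These modules are exposed as submodules of `SamModel` in transformers (attribute names per that implementation); a minimal sketch for inspecting them:

```python
from transformers import SamModel

model = SamModel.from_pretrained("facebook/sam-vit-huge")

# The modules from the table above are regular PyTorch submodules.
print(type(model.vision_encoder).__name__)
print(type(model.prompt_encoder).__name__)
print(type(model.mask_decoder).__name__)
```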
Usage Notes
Among other arguments for generating masks, you can pass 2D locations giving the approximate position of your object of interest, a bounding box wrapping the object of interest (the format is the x, y coordinates of the top-left and bottom-right corners of the bounding box), or a segmentation mask. As of now, passing text as input is not supported by the official model, according to the official repository.
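
For example, a bounding-box prompt can be passed through the processor as follows (a sketch continuing from the Basic Usage example above; the box coordinates are illustrative, not from the original card):

```python
# [x_min, y_min, x_max, y_max] of a region of interest (illustrative values).
input_boxes = [[[75, 275, 1725, 850]]]
inputs = processor(raw_image, input_boxes=input_boxes, return_tensors="pt").to(device)
outputs = model(**inputs)
```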
📄 License
This model is licensed under the Apache-2.0 license.
📖 Citation
If you use this model, please use the following BibTeX entry:
```bibtex
@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}
```