Model Card for Segment Anything Model (SAM) - ViT Base (ViT-B) version
The Segment Anything Model (SAM) is a powerful tool for image segmentation. It can generate high-quality object masks from input prompts such as points or boxes, and it can generate masks for all objects in an image. Trained on a large dataset, it shows strong zero-shot performance on a variety of segmentation tasks.
Quick Start
The Segment Anything Model (SAM) can be used for image segmentation tasks right away. Follow the usage examples below to generate masks for your images.
Features
- High-Quality Mask Generation: Produces high-quality object masks from input prompts such as points or boxes.
- Zero-Shot Performance: Demonstrates strong zero-shot performance on a variety of segmentation tasks.
- Large-Scale Training: Trained on a dataset of 11 million images and 1.1 billion masks.
Installation
No specific installation steps are provided in the original document.
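The usage examples below assume a working Python environment with the Hugging Face `transformers` library and PyTorch installed (for example via `pip install transformers torch`), along with `Pillow`, `requests`, `matplotlib`, and `numpy` for the snippets shown.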
Usage Examples
Basic Usage
Prompted-Mask-Generation
```python
from PIL import Image
import requests
import torch
from transformers import SamModel, SamProcessor

# Load the model and processor; move the model to the GPU to match the inputs below.
model = SamModel.from_pretrained("facebook/sam-vit-base").to("cuda")
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

# A single 2D point prompt (x, y) marking the object of interest.
input_points = [[[450, 600]]]
inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)

masks = processor.image_processor.post_process_masks(outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu())
scores = outputs.iou_scores
```
Along with other arguments for generating masks, you can pass 2D point locations indicating the approximate position of your object of interest, a bounding box around the object of interest (formatted as the x, y coordinates of the top-left and bottom-right corners of the box), or a segmentation mask. At the time of writing, passing text as input is not supported by the official model, according to the official repository.
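For example, a bounding box prompt can be passed via the processor's `input_boxes` argument in the same way as point prompts. The sketch below reuses the `model`, `processor`, and `raw_image` objects from the snippet above, and the box coordinates are illustrative placeholders rather than values tuned for this image:

```python
# One bounding box per image, in [x_min, y_min, x_max, y_max] format
# (top-left and bottom-right corners); the numbers below are placeholders.
input_boxes = [[[75, 275, 1725, 850]]]

inputs = processor(raw_image, input_boxes=input_boxes, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)

masks = processor.image_processor.post_process_masks(outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu())
```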
For more details, refer to this notebook, which shows a walkthrough of how to use the model, with a visual example!
Automatic-Mask-Generation
```python
from transformers import pipeline

# Mask-generation pipeline backed by this checkpoint; device=0 selects the first GPU.
generator = pipeline("mask-generation", model="facebook/sam-vit-base", device=0, points_per_batch=256)
image_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
outputs = generator(image_url, points_per_batch=256)
```
```python
import matplotlib.pyplot as plt
import numpy as np
import requests
from PIL import Image

def show_mask(mask, ax, random_color=False):
    """Overlay a single binary mask on a matplotlib axis."""
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

# Load the original image so the masks can be drawn on top of it.
raw_image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

plt.imshow(np.array(raw_image))
ax = plt.gca()
for mask in outputs["masks"]:
    show_mask(mask, ax=ax, random_color=True)
plt.axis("off")
plt.show()
```
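The pipeline returns its results as a dictionary; the "masks" entry, iterated over in the loop above, holds one binary mask per detected object. As a quick sanity check you can inspect the output directly (a sketch only; the exact set of keys may vary between transformers versions):

```python
# Inspect the pipeline output; "masks" is what the plotting loop above uses, and a
# per-mask quality score entry is typically present as well (assumption, may vary).
print(list(outputs.keys()))
print(f"{len(outputs['masks'])} masks generated")
```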
Documentation
Model Details
The SAM model is made up of the following modules:
- The `VisionEncoder`: a ViT-based image encoder. It computes the image embeddings using attention over patches of the image; relative positional embeddings are used.
- The `PromptEncoder`: generates embeddings for points and bounding boxes.
- The `MaskDecoder`: a two-way transformer that performs cross-attention between the image embedding and the point embeddings, and between the point embeddings and the image embedding. Its outputs are fed to the `Neck`.
- The `Neck`: predicts the output masks based on the contextualized masks produced by the `MaskDecoder`.
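As a rough way to see how these modules map onto the transformers implementation, you can list the top-level submodules of the loaded model. This is only an inspection sketch: the attribute names printed come from the library and are not guaranteed to match the conceptual names above one-to-one (the neck, for instance, may live inside the vision encoder):

```python
from transformers import SamModel

model = SamModel.from_pretrained("facebook/sam-vit-base")

# Print the top-level submodules; names are determined by the transformers
# implementation and may differ slightly from the names used in this card.
for name, module in model.named_children():
    print(f"{name}: {module.__class__.__name__}")
```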
Citation
If you use this model, please use the following BibTeX entry.
```
@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}
```
License
This model is licensed under the Apache-2.0 license.