Segment Anything Model (SAM) - ViT Huge (ViT-H) version
The Segment Anything Model (SAM) can generate high-quality object masks from input prompts like points or boxes. It can generate masks for all objects in an image and shows strong zero-shot performance on various segmentation tasks.
🚀 Quick Start
The main steps to use the SAM model are as follows:
- Install the necessary libraries.
- Load the model and processor.
- Prepare input data such as images and prompts.
- Generate masks using the model.
✨ Features
- High-Quality Mask Generation: Produces high-quality object masks from input prompts.
- Zero-Shot Performance: Demonstrates strong zero-shot performance on a variety of segmentation tasks.
- Large-Scale Training: Trained on a dataset of 11 million images and 1.1 billion masks.
📦 Installation
The installation steps are not explicitly provided in the original README, but you need Python libraries such as `transformers`, `Pillow` (which provides `PIL`), `requests`, and `torch`. You can install them with:

```bash
pip install transformers torch pillow requests
```
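
To sanity-check the environment, a quick import test can help (a hypothetical snippet, not from the original card):

```python
# Confirm the required libraries import and print their versions.
import PIL
import requests
import torch
import transformers

print(transformers.__version__, torch.__version__, PIL.__version__, requests.__version__)
```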
💻 Usage Examples
Basic Usage
```python
import torch
from PIL import Image
import requests
from transformers import SamModel, SamProcessor

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamModel.from_pretrained("facebook/sam-vit-huge").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]]  # 2D location of a point on the object of interest

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Resize the predicted masks back to the original image resolution.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores
```
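
The returned `masks` is a list with one tensor per input image, each holding several candidate masks, and `scores` gives the model's predicted IoU for each candidate. As a follow-up sketch (the indexing assumes the single image and single point prompt above), you can keep the highest-scoring mask:

```python
# Continues from the snippet above: select the best of the (typically three)
# candidate masks and save it as a grayscale image.
best_idx = scores.squeeze().argmax().item()
best_mask = masks[0][0, best_idx].numpy()  # boolean (H, W) array
Image.fromarray((best_mask * 255).astype("uint8")).save("best_mask.png")
```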
Advanced Usage
Automatic Mask Generation
The model can generate segmentation masks in a "zero-shot" fashion. The following is an example of automatic mask generation:
```python
from transformers import pipeline

generator = pipeline("mask-generation", model="facebook/sam-vit-huge", device=0, points_per_batch=256)
image_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
outputs = generator(image_url, points_per_batch=256)
```
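
The pipeline returns a dictionary of masks and scores; a quick way to inspect it (key names per the transformers mask-generation pipeline):

```python
# `outputs["masks"]` is a list of binary (H, W) masks, one per detected object,
# and `outputs["scores"]` holds the corresponding quality scores.
print(len(outputs["masks"]))
print(outputs["scores"][:3])
```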
To display the image with masks:
```python
import matplotlib.pyplot as plt
from PIL import Image
import requests
import numpy as np


def show_mask(mask, ax, random_color=False):
    # Overlay a single binary mask on a matplotlib axis.
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)


# The pipeline was given a URL, so fetch the same image for display.
raw_image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

plt.imshow(np.array(raw_image))
ax = plt.gca()
for mask in outputs["masks"]:
    show_mask(mask, ax=ax, random_color=True)
plt.axis("off")
plt.show()
```
📚 Documentation
Model Details
The SAM model consists of the following main modules:

| Property | Details |
|----------|---------|
| VisionEncoder | A ViT-based image encoder that computes image embeddings using attention on image patches, with relative positional embeddings. |
| PromptEncoder | Generates embeddings for points and bounding boxes. |
| MaskDecoder | A two-way transformer that performs cross-attention between image embeddings and point embeddings, and vice versa. |
| Neck | Predicts output masks based on the contextualized masks produced by the MaskDecoder. |
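
These modules are exposed as submodules of `SamModel` in transformers (attribute names per that implementation); a minimal sketch for inspecting them:

```python
from transformers import SamModel

model = SamModel.from_pretrained("facebook/sam-vit-huge")

# The modules from the table above are regular PyTorch submodules.
print(type(model.vision_encoder).__name__)
print(type(model.prompt_encoder).__name__)
print(type(model.mask_decoder).__name__)
```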
Usage Notes
Among other arguments for generating masks, you can pass 2D locations giving the approximate position of your object of interest, a bounding box wrapping the object of interest (the format is the x, y coordinates of the top-left and bottom-right corners of the bounding box), or a segmentation mask. As of now, passing text as input is not supported by the official model, according to the official repository.
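
For example, a bounding-box prompt can be passed through the processor as follows (a sketch continuing from the Basic Usage example above; the box coordinates are illustrative, not from the original card):

```python
# [x_min, y_min, x_max, y_max] of a region of interest (illustrative values).
input_boxes = [[[75, 275, 1725, 850]]]
inputs = processor(raw_image, input_boxes=input_boxes, return_tensors="pt").to(device)
outputs = model(**inputs)
```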
📄 License
This model is licensed under the Apache-2.0 license.
📖 Citation
If you use this model, please use the following BibTeX entry:
```bibtex
@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}
```