# Model Card for SlimSAM (compressed version of SAM = Segment Anything)
SlimSAM is a compressed version of the Segment Anything (SAM) model. It can generate high-quality object masks from input prompts like points or boxes, offering a more efficient alternative for resource-constrained devices.
## Quick Start
SlimSAM is a compressed (pruned) version of the Segment Anything (SAM) model. It can produce high-quality object masks from input prompts such as points or boxes.

As the paper's abstract notes, SAM's large model size and high computational requirements make it difficult to deploy on resource-constrained devices. SlimSAM is a novel SAM compression method that achieves strong performance at low training cost through a unified pruning-distillation framework.
Link to original repository
Disclaimer: Content from this model card has been written by the Hugging Face team, and parts of it were copy-pasted from the original SAM model card.
## Features
- Compression Advantage: SlimSAM is a pruned version of SAM that significantly reduces training cost while maintaining good performance.
- High-Quality Mask Generation: it generates high-quality object masks from input prompts such as points or boxes.
## Installation
SlimSAM is used through the Hugging Face `transformers` library; installing `transformers` with its PyTorch backend (for example, `pip install transformers torch`), plus `Pillow`, `requests`, and `matplotlib` for the examples below, is sufficient.
## Usage Examples
### Basic Usage
#### Prompted-Mask-Generation
```python
from PIL import Image
import requests
import torch
from transformers import SamModel, SamProcessor

# Load the pruned SlimSAM checkpoint and its processor, and move the model to the GPU
model = SamModel.from_pretrained("nielsr/slimsam-77-uniform").to("cuda")
processor = SamProcessor.from_pretrained("nielsr/slimsam-77-uniform")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]]  # 2D point prompt: approximate location of the object of interest

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)

# Resize the predicted masks back to the original image resolution
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores
```
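SAM-style checkpoints return several candidate masks per prompt together with a predicted IoU score for each. As a minimal sketch for keeping only the highest-scoring candidate (reusing the `masks` and `scores` variables from the block above; the exact shapes, with three candidates per prompt by default, are an assumption and not stated in the original card):

```python
# scores has shape (batch, num_prompts, num_candidates);
# masks[0] has shape (num_prompts, num_candidates, H, W)
best_idx = scores[0, 0].argmax().item()
best_mask = masks[0][0, best_idx]  # boolean mask of the highest-scoring candidate
print(f"best candidate: {best_idx}, predicted IoU: {float(scores[0, 0, best_idx]):.3f}")
```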
Among other arguments accepted when generating masks, you can pass 2D locations giving the approximate position of your object of interest, a bounding box wrapping the object of interest (formatted as the x, y coordinates of its top-left and bottom-right corners), or a segmentation mask. At the time of writing, passing text as input is not supported by the official model, according to the official repository.
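As a sketch of the bounding-box variant (reusing `model`, `processor`, and `raw_image` from the example above; the box coordinates below are made up for illustration), box prompts are passed through the processor's `input_boxes` argument:

```python
# Hypothetical box roughly around the car window; coordinates are illustrative only
input_boxes = [[[425, 600, 700, 875]]]  # one image, one box: [x_min, y_min, x_max, y_max]

inputs = processor(raw_image, input_boxes=input_boxes, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)

masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores
```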
For more details, refer to this notebook, which shows a walk-through of how to use the model, with a visual example!
#### Automatic-Mask-Generation
```python
from transformers import pipeline

# Build a mask-generation pipeline around SlimSAM (device=0 selects the first GPU)
generator = pipeline(task="mask-generation", model="nielsr/slimsam-77-uniform", device=0, points_per_batch=256)

image_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
outputs = generator(image_url, points_per_batch=256)
```
```python
import matplotlib.pyplot as plt
import numpy as np
import requests
from PIL import Image

def show_mask(mask, ax, random_color=False):
    # Overlay a single binary mask on a matplotlib axis as a translucent color
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

# Load the image locally so it can be displayed under the generated masks
raw_image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

plt.imshow(np.array(raw_image))
ax = plt.gca()
for mask in outputs["masks"]:
    show_mask(mask, ax=ax, random_color=True)
plt.axis("off")
plt.show()
```
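Besides the masks themselves, the pipeline output should also expose the predicted IoU score of each mask (a `scores` entry in recent versions of transformers; treat the exact keys as an assumption). A minimal sketch, reusing `outputs` from the pipeline call above:

```python
# Each generated mask comes with a predicted IoU score
for i, (mask, score) in enumerate(zip(outputs["masks"], outputs["scores"])):
    print(f"mask {i}: shape={mask.shape}, predicted IoU={float(score):.3f}")
```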
## Documentation
### Model Details
The SAM model is made up of the following modules:
- `VisionEncoder`: a ViT-based image encoder that computes the image embeddings using attention over patches of the image; relative positional embeddings are used.
- `PromptEncoder`: generates embeddings for points and bounding boxes.
- `MaskDecoder`: a two-way transformer that performs cross-attention between the image embedding and the point embeddings, and between the point embeddings and the image embedding. Its outputs are fed to the `Neck`.
- `Neck`: predicts the output masks based on the contextualized masks produced by the `MaskDecoder`.
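To relate this description to the loaded checkpoint, one way to inspect the submodules and see the effect of pruning is to list the model's top-level children with their parameter counts (a minimal sketch; the attribute names and exact split of modules depend on the transformers implementation):

```python
from transformers import SamModel

model = SamModel.from_pretrained("nielsr/slimsam-77-uniform")

# Print each top-level submodule (e.g. vision encoder, prompt encoder, mask decoder)
# together with its parameter count
for name, module in model.named_children():
    n_params = sum(p.numel() for p in module.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```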
## License
The model is licensed under the Apache 2.0 license.
## Citation

If you use this model, please cite it using the following BibTeX entries.
```bibtex
@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@misc{chen202301,
  title={0.1% Data Makes Segment Anything Slim},
  author={Zigeng Chen and Gongfan Fang and Xinyin Ma and Xinchao Wang},
  year={2023},
  eprint={2312.05284},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```