# Model Card for SlimSAM (compressed version of SAM = Segment Anything)
SlimSAM is a compressed version of the Segment Anything (SAM) model. It can generate high-quality object masks from input prompts like points or boxes, offering a more efficient alternative for resource-constrained devices.
## Quick Start
SlimSAM is a compressed (pruned) version of the Segment Anything (SAM) model. It can produce high-quality object masks from input prompts such as points or boxes.

As the paper's abstract notes, SAM's large model size and high computational requirements make it difficult to deploy on resource-constrained devices. SlimSAM is a novel SAM compression method that achieves strong performance at low training cost through a unified pruning-distillation framework.
Link to original repository
Disclaimer: Content from this model card has been written by the Hugging Face team, and parts of it were copy-pasted from the original SAM model card.
## Features
- Compression Advantage: SlimSAM is a pruned version of SAM that significantly reduces training cost while maintaining good performance.
- High-Quality Mask Generation: it generates high-quality object masks from input prompts such as points or boxes.
## Installation
SlimSAM is used through the Hugging Face `transformers` library; installing `transformers` with its PyTorch backend (for example, `pip install transformers torch`), plus `Pillow`, `requests`, and `matplotlib` for the examples below, is sufficient.
## Usage Examples
### Basic Usage
#### Prompted-Mask-Generation
```python
from PIL import Image
import requests
import torch
from transformers import SamModel, SamProcessor

# Load the pruned SlimSAM checkpoint and its processor, and move the model to the GPU
model = SamModel.from_pretrained("nielsr/slimsam-77-uniform").to("cuda")
processor = SamProcessor.from_pretrained("nielsr/slimsam-77-uniform")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]]  # 2D point prompt: approximate location of the object of interest

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)

# Resize the predicted masks back to the original image resolution
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores
```
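SAM-style checkpoints return several candidate masks per prompt together with a predicted IoU score for each. As a minimal sketch for keeping only the highest-scoring candidate (reusing the `masks` and `scores` variables from the block above; the exact shapes, with three candidates per prompt by default, are an assumption and not stated in the original card):

```python
# scores has shape (batch, num_prompts, num_candidates);
# masks[0] has shape (num_prompts, num_candidates, H, W)
best_idx = scores[0, 0].argmax().item()
best_mask = masks[0][0, best_idx]  # boolean mask of the highest-scoring candidate
print(f"best candidate: {best_idx}, predicted IoU: {float(scores[0, 0, best_idx]):.3f}")
```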
Among other arguments accepted when generating masks, you can pass 2D locations giving the approximate position of your object of interest, a bounding box wrapping the object of interest (formatted as the x, y coordinates of its top-left and bottom-right corners), or a segmentation mask. At the time of writing, passing text as input is not supported by the official model, according to the official repository.
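As a sketch of the bounding-box variant (reusing `model`, `processor`, and `raw_image` from the example above; the box coordinates below are made up for illustration), box prompts are passed through the processor's `input_boxes` argument:

```python
# Hypothetical box roughly around the car window; coordinates are illustrative only
input_boxes = [[[425, 600, 700, 875]]]  # one image, one box: [x_min, y_min, x_max, y_max]

inputs = processor(raw_image, input_boxes=input_boxes, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)

masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores
```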
For more details, refer to this notebook, which shows a walk-through of how to use the model, with a visual example!
#### Automatic-Mask-Generation
```python
from transformers import pipeline

# Build a mask-generation pipeline around SlimSAM (device=0 selects the first GPU)
generator = pipeline(task="mask-generation", model="nielsr/slimsam-77-uniform", device=0, points_per_batch=256)

image_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
outputs = generator(image_url, points_per_batch=256)
```
```python
import matplotlib.pyplot as plt
import numpy as np
import requests
from PIL import Image

def show_mask(mask, ax, random_color=False):
    # Overlay a single binary mask on a matplotlib axis as a translucent color
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

# Load the image locally so it can be displayed under the generated masks
raw_image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

plt.imshow(np.array(raw_image))
ax = plt.gca()
for mask in outputs["masks"]:
    show_mask(mask, ax=ax, random_color=True)
plt.axis("off")
plt.show()
```
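Besides the masks themselves, the pipeline output should also expose the predicted IoU score of each mask (a `scores` entry in recent versions of transformers; treat the exact keys as an assumption). A minimal sketch, reusing `outputs` from the pipeline call above:

```python
# Each generated mask comes with a predicted IoU score
for i, (mask, score) in enumerate(zip(outputs["masks"], outputs["scores"])):
    print(f"mask {i}: shape={mask.shape}, predicted IoU={float(score):.3f}")
```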
## Documentation
### Model Details
The SAM model is made up of the following modules:
- `VisionEncoder`: a ViT-based image encoder that computes the image embeddings using attention over patches of the image; relative positional embeddings are used.
- `PromptEncoder`: generates embeddings for points and bounding boxes.
- `MaskDecoder`: a two-way transformer that performs cross-attention between the image embedding and the point embeddings, and between the point embeddings and the image embedding. Its outputs are fed to the `Neck`.
- `Neck`: predicts the output masks based on the contextualized masks produced by the `MaskDecoder`.
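To relate this description to the loaded checkpoint, one way to inspect the submodules and see the effect of pruning is to list the model's top-level children with their parameter counts (a minimal sketch; the attribute names and exact split of modules depend on the transformers implementation):

```python
from transformers import SamModel

model = SamModel.from_pretrained("nielsr/slimsam-77-uniform")

# Print each top-level submodule (e.g. vision encoder, prompt encoder, mask decoder)
# together with its parameter count
for name, module in model.named_children():
    n_params = sum(p.numel() for p in module.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```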
## License
The model is licensed under the Apache 2.0 license.
## Citation

If you use this model, please cite it using the following BibTeX entries.
```bibtex
@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@misc{chen202301,
  title={0.1% Data Makes Segment Anything Slim},
  author={Zigeng Chen and Gongfan Fang and Xinyin Ma and Xinchao Wang},
  year={2023},
  eprint={2312.05284},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```