Samvit_base_patch16.sa1b open-source image feature model - Free implementation of feature extraction and fine-tuning

Samvit Base Patch16.sa1b

Developed by timm

Segment-Anything Vision Transformer (SAM ViT) image feature model, intended for feature extraction and fine-tuning only; the segmentation head is not included.

Image Segmentation

Transformers

Open Source License: Apache-2.0 #Image Segmentation Feature Extraction #SA-1B Pretraining #ViT Backbone Network

Downloads 2,756

Release Time: 5/18/2023

Model Overview

This model is an image feature backbone based on the Vision Transformer (ViT) architecture, intended for feature extraction and as a starting point for fine-tuning. The paper authors pretrained it for segmentation on the SA-1B dataset, starting from MAE weights; the segmentation head itself is not included.

Model Features

Efficient Feature Extraction

This model focuses on image feature extraction and is suitable for various downstream vision tasks.

Vision Transformer Architecture

Uses a Vision Transformer (ViT) backbone with 16x16 patches and takes 1024 x 1024 input images.

Large-Scale Pretraining

Pretrained for segmentation on the SA-1B dataset, starting from MAE weights.

Model Capabilities

Image Feature Extraction

Image Classification

Image Embedding Generation

Use Cases

Computer Vision

Image Classification

Can serve as a backbone for image classification after fine-tuning a classification head on top of its features.

Feature Extraction

Can be used to extract image features for downstream tasks.

🚀 samvit_base_patch16.sa1b Model Card

This is a Segment-Anything Vision Transformer (SAM ViT) image feature model (NOTE: for features and fine-tuning; segmentation head not included). It was pretrained on SA-1B for segmentation by the paper authors, with initialization from MAE weights.

🚀 Quick Start

The samvit_base_patch16.sa1b model can be used for image classification and extracting image embeddings. You can follow the usage examples below to get started.

✨ Features

Pretrained on SA-1B for segmentation.
Can be used for image classification and feature extraction.

💻 Usage Examples

Basic Usage

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('samvit_base_patch16.sa1b', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

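# NOTE: the card describes this checkpoint as a feature / fine-tune backbone with no
# segmentation head, so read these top-5 values as class probabilities only after
# fine-tuning a classification head.
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)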

Advanced Usage

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'samvit_base_patch16.sa1b',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 256, 64, 64) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
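Once pooled embeddings are available, a common downstream step is to compare two images by the cosine similarity of their feature vectors. The sketch below is not part of the original model card: it uses plain Python lists as stand-ins for two pooled embeddings, and the helper name `cosine_similarity` and the sample values are illustrative assumptions. With real tensors, `torch.nn.functional.cosine_similarity` would do the same job.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in embeddings (assumed values); real ones would come from the pooled
# model output, for example output[0].tolist().
emb_query = [0.5, 1.0, -0.25, 0.0]
emb_same = [1.0, 2.0, -0.5, 0.0]    # emb_query scaled by 2, so same direction
emb_other = [2.0, -1.0, 0.0, 0.0]   # dot product with emb_query is zero

print(cosine_similarity(emb_query, emb_same))   # close to 1: same direction
print(cosine_similarity(emb_query, emb_other))  # close to 0: orthogonal
```

Cosine similarity ignores vector length, which is why the scaled copy scores the same as an identical vector; whether that is the right notion of similarity depends on the downstream task.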

📚 Documentation

Model Details

Property	Details
Model Type	Image classification / feature backbone
Params (M)	89.7
GMACs	486.4
Activations (M)	1343.3
Image size	1024 x 1024
Papers	Segment Anything, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Original	https://github.com/facebookresearch/segment-anything
Pretrain Dataset	SA-1B
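Two figures follow directly from the table above and may help when sizing hardware. This is a back-of-the-envelope sketch, not a measurement: it assumes the 16-pixel patch size implied by `patch16` in the model name and 4 bytes per parameter (fp32 weights). Activations, optimizer state, and framework overhead are not counted.

```python
IMAGE_SIZE = 1024        # pixels per side, from the "Image size" row
PATCH_SIZE = 16          # assumption: taken from "patch16" in the model name
PARAMS_MILLIONS = 89.7   # from the "Params (M)" row
BYTES_PER_PARAM = 4      # assumption: weights stored as 32-bit floats

# Number of non-overlapping patches along one side, and in total.
grid_side = IMAGE_SIZE // PATCH_SIZE
num_patches = grid_side * grid_side

# Storage for the weights alone, in megabytes (10^6 bytes).
weight_megabytes = PARAMS_MILLIONS * 1e6 * BYTES_PER_PARAM / 1e6

print(f"patch grid: {grid_side} x {grid_side} = {num_patches} patches")
print(f"fp32 weights: about {weight_megabytes:.1f} MB")
```

The 64 x 64 grid agrees with the spatial size of the `(1, 256, 64, 64)` tensor noted in the usage example.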

Model Comparison

Explore the dataset and runtime metrics of this model in timm model results.

📄 License

This model is released under the Apache-2.0 license.

📖 Citation

@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
