
Samvit Huge Patch16.sa1b

Developed by timm

Segment-Anything Vision Transformer (SAM ViT) image feature model, intended for feature extraction and fine-tuning. The segmentation head is not included.

Image Segmentation

Transformers

Open Source License: Apache-2.0 #SA-1B pre-training #ViT backbone network #1024 high resolution


Release Time: 5/18/2023

Model Overview

This model is a Vision Transformer pre-trained on the SA-1B dataset, primarily used for image feature extraction and fine-tuning, especially suitable for segmentation tasks.

Model Features

Large-scale pre-training

Pre-trained for segmentation on the SA-1B dataset by the Segment Anything authors, starting from MAE weights

High-resolution architecture

Uses a Vision Transformer (ViT) backbone with 16x16 patches to process 1024x1024 input images
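The 1024x1024 input size and 16x16 patch size together determine how many tokens the transformer processes. The following minimal stdlib sketch works through that arithmetic. The helper name `patch_grid` is illustrative and is not part of timm. The resulting 64x64 grid matches the spatial size of the unpooled `forward_features` output, `(1, 256, 64, 64)`, that the usage examples report.

```python
from typing import Tuple

# Values from the model card: 1024x1024 input, 16x16 patches.
IMAGE_SIZE = 1024  # input height and width in pixels
PATCH_SIZE = 16    # each patch covers PATCH_SIZE x PATCH_SIZE pixels


def patch_grid(image_size: int, patch_size: int) -> Tuple[int, int]:
    """Return (patches per side, total patch tokens) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by patch size")
    side = image_size // patch_size
    return side, side * side


side, num_tokens = patch_grid(IMAGE_SIZE, PATCH_SIZE)
print(f"{side}x{side} grid, {num_tokens} patch tokens")  # 64x64 grid, 4096 patch tokens
```

The token count grows quadratically with resolution. At 224x224, for comparison, the same patch size would give only 196 tokens.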

Versatile applications

Can serve as a feature extraction backbone network, or be fine-tuned for image classification

Model Capabilities

Image feature extraction

Image classification

Image embedding generation

Use Cases

Computer vision

Image classification

Fine-tune this model for image classification tasks

Feature extraction

Serve as a backbone network to extract image features for downstream tasks
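A common downstream use of pooled embeddings is similarity search. This sketch ranks a small gallery by cosine similarity to a query embedding. It is a stdlib illustration that assumes the embeddings have already been extracted as plain lists of floats. The 4-dimensional vectors and the `img_*` names are hypothetical stand-ins; real embeddings from this model have `num_features` entries.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings (stand-ins for rows of the (batch, num_features) output).
query = [0.2, 0.1, 0.0, 0.7]
gallery = {
    "img_a": [0.2, 0.1, 0.0, 0.7],
    "img_b": [0.0, 1.0, 0.0, 0.0],
    "img_c": [0.1, 0.0, 0.0, 0.9],
}

# Most similar gallery items first.
ranked = sorted(gallery, key=lambda name: cosine_similarity(query, gallery[name]), reverse=True)
print(ranked)  # ['img_a', 'img_c', 'img_b']
```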

🚀 Model card for samvit_huge_patch16.sa1b

A Segment-Anything Vision Transformer (SAM ViT) image feature model (NOTE: intended for features and fine-tuning; the segmentation head is not included). Pretrained on SA-1B for segmentation by the paper authors, with initialization from MAE weights.

🚀 Quick Start

This is a Segment-Anything Vision Transformer (SAM ViT) image feature model. It is designed for feature extraction and fine-tuning and does not include a segmentation head. It has been pretrained on the SA-1B dataset for segmentation.

✨ Features

Model Type: Image classification / feature backbone
Model Stats:
- Params (M): 637.0
- GMACs: 2982.2
- Activations (M): 3428.2
- Image size: 1024 x 1024
Papers:
- Segment Anything
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Original: https://github.com/facebookresearch/segment-anything
Pretrain Dataset: SA-1B
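The model stats can be converted into rough sizing figures. The sketch below makes two assumptions. First, weight memory is estimated as parameters × bytes per parameter, ignoring buffers, activations and optimizer state. Second, it uses the common approximation that one multiply-accumulate (MAC) equals two floating-point operations. Both results are estimates, not measured values.

```python
# Rough sizing from the model-card stats (estimates, not measurements).
# Assumptions: weights only (no buffers, activations or optimizer state),
# and 1 multiply-accumulate (MAC) ~= 2 floating-point operations.
PARAMS_M = 637.0  # "Params (M): 637.0"
GMACS = 2982.2    # "GMACs: 2982.2" for one 1024x1024 image

BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2}


def weight_gib(params_millions: float, bytes_per_param: int) -> float:
    """Approximate weight storage in GiB (2**30 bytes)."""
    return params_millions * 1e6 * bytes_per_param / 2**30


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype} weights: ~{weight_gib(PARAMS_M, nbytes):.2f} GiB")
# float32 weights: ~2.37 GiB
# float16/bfloat16 weights: ~1.19 GiB

gflops = 2 * GMACS
print(f"~{gflops:.1f} GFLOPs per forward pass")  # ~5964.4 GFLOPs per forward pass
```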

💻 Usage Examples

Basic Usage

Image Classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.no_grad and torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('samvit_huge_patch16.sa1b', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

with torch.no_grad():  # inference only, no gradient tracking
    output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

# NOTE: this checkpoint is intended for features and fine-tuning, so class scores are only
# meaningful once a classifier has been fine-tuned on top (e.g. via num_classes in create_model)
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
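The last line of the classification example converts each row of scores into percentages and keeps the five largest. The following dependency-free sketch shows the same computation for a single row of hypothetical logits. It is an illustration of softmax followed by top-k, not PyTorch's implementation, and tie-breaking between equal values may differ from `torch.topk`.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk(values: List[float], k: int) -> Tuple[List[float], List[int]]:
    """Return the k largest values and their indices, in descending order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


logits = [2.0, 0.5, 1.0, -1.0, 3.0, 0.0]  # hypothetical scores for 6 classes
probs_pct = [p * 100 for p in softmax(logits)]  # percentages summing to 100
top5_probabilities, top5_class_indices = topk(probs_pct, k=5)
print(top5_class_indices)  # [4, 0, 2, 1, 5]
```

Subtracting the maximum logit leaves the result unchanged but avoids overflow in `math.exp` for large scores.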

Advanced Usage

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'samvit_huge_patch16.sa1b',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 256, 64, 64) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
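Going from the unpooled `(1, 256, 64, 64)` map to a `(1, num_features)` vector requires a spatial pooling step. The sketch below assumes that step is global average pooling over the 64x64 grid; verify this against the timm source for this model. It uses a toy 2-channel, 2x2 map as nested lists so the arithmetic is easy to follow. The helper `global_avg_pool` is hypothetical and not a timm function.

```python
from typing import List

# Indexed as [channel][row][col]; the real map is 256 x 64 x 64 per image.
FeatureMap = List[List[List[float]]]


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """Average every channel over all spatial positions -> one value per channel."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy 2-channel, 2x2 feature map.
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
print(global_avg_pool(fmap))  # [2.5, 2.0]
```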

📚 Documentation

Explore the dataset and runtime metrics of this model in timm model results.

📄 License

This model is licensed under the Apache-2.0 license.

📚 Citation

@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
