samvit_large_patch16.sa1b: Open-Source Image Feature Model for Feature Extraction and Fine-Tuning

Samvit Large Patch16.sa1b

Published by timm (weights pretrained by the Segment Anything paper authors)

Segment-Anything Vision Transformer (SAM ViT) image feature model. It provides the SAM image encoder for feature extraction and fine-tuning only; the segmentation head is not included.

Image Segmentation

Transformers

Open Source License: Apache-2.0 · Tags: large-scale image feature extraction, SA-1B pre-training, segmentation task adaptation

Downloads 124

Release Time: 5/18/2023

Model Overview

This model is a Vision Transformer pretrained on the SA-1B dataset and intended mainly for image feature extraction and fine-tuning. Its weights were initialized from MAE pre-trained weights before SA-1B training.

Model Features

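16x16 patch embedding

Splits each 1024x1024 input image into 16x16-pixel patches, giving a 64x64 grid of patch tokens. The "large" in the model name refers to the ViT-Large model size, not to the patch size.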
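As a quick check of these numbers, the token count follows directly from the input size and patch size. The helper below is a hypothetical illustration written for this card, not part of timm.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int, int]:
    """Return (rows, cols, num_tokens) for a square image split into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size  # number of patches along each spatial axis
    return side, side, side * side


rows, cols, tokens = patch_grid(1024, 16)
print(rows, cols, tokens)  # 64 64 4096
```

The same 64 x 64 grid is the spatial size of the unpooled feature map that forward_features returns for a 1024 x 1024 input.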

MAE pre-training initialization

The backbone weights were initialized from MAE (Masked Autoencoder) pre-trained weights before SAM training on SA-1B.

Computational cost

A forward pass at 1024x1024 costs about 1493.9 GMACs and produces about 2553.8 million activations. This is a heavy backbone, so large-scale image processing needs correspondingly large accelerator time and memory budgets.
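To put the GMACs figure in perspective, the sketch below converts MACs to FLOPs using the common convention of 2 FLOPs per multiply-accumulate and scales the result to a batch of images. Both that convention and the helper name forward_tflops are assumptions for illustration; real runtime also depends on hardware, numeric precision, and implementation.

```python
GMACS_PER_IMAGE = 1493.9  # from the model stats, for one 1024 x 1024 image
FLOPS_PER_MAC = 2         # assumption: one multiply plus one add per MAC


def forward_tflops(num_images: int, gmacs_per_image: float = GMACS_PER_IMAGE) -> float:
    """Approximate forward-pass compute, in TFLOPs, for num_images images."""
    gflops = gmacs_per_image * FLOPS_PER_MAC * num_images
    return gflops / 1000.0


print(round(forward_tflops(1), 4))     # 2.9878
print(round(forward_tflops(1000), 1))  # 2987.8
```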

Model Capabilities

Image feature extraction

Image classification (after fine-tuning or training a classifier head)

Image embedding representation

Use Cases

Computer vision

Image classification

Can be used for image classification by extracting image features and training a classifier on them, or by fine-tuning the backbone together with a new classification head.
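One lightweight way to classify with frozen features is a nearest-centroid classifier over pooled embeddings. The sketch below is a hypothetical illustration in plain Python: the toy 3-dimensional vectors and labels stand in for real pooled embeddings from the model, and a trained linear head or full fine-tuning are common, usually stronger alternatives.

```python
import math
from collections import defaultdict


def centroids(features, labels):
    """Average the feature vectors of each class into one centroid per label."""
    grouped = defaultdict(list)
    for vec, label in zip(features, labels):
        grouped[label].append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)] for label, vecs in grouped.items()}


def predict(vec, class_centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(class_centroids, key=lambda label: math.dist(vec, class_centroids[label]))


# Hypothetical stand-ins for pooled image embeddings
train_feats = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
train_labels = ["cat", "cat", "dog", "dog"]

class_centroids = centroids(train_feats, train_labels)
print(predict([0.8, 0.2, 0.0], class_centroids))  # cat
```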

Image retrieval

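Supports similar-image retrieval: extract an embedding for each image, then rank candidate images by their similarity to a query embedding (for example, cosine similarity).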
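A minimal retrieval sketch, assuming embeddings have already been extracted for each image: it ranks a small gallery by cosine similarity to a query vector. The image names and 3-dimensional vectors are hypothetical placeholders for real model embeddings, and a production system would typically use a vector index instead of a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, gallery, k=2):
    """Return the k (name, score) pairs in gallery most similar to query."""
    scored = [(name, cosine(query, vec)) for name, vec in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical embeddings keyed by image name
gallery = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.0, 1.0, 0.2],
    "coast.jpg": [0.8, 0.3, 0.1],
}
print([name for name, _ in top_k([1.0, 0.2, 0.0], gallery)])  # ['beach.jpg', 'coast.jpg']
```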

🚀 samvit_large_patch16.sa1b

A Segment-Anything Vision Transformer (SAM ViT) image feature model (NOTE: intended for feature extraction and fine-tuning; the segmentation head is not included). Pretrained on SA-1B for segmentation by the paper authors, with initialization from MAE weights.

🚀 Quick Start

This is a Segment-Anything Vision Transformer (SAM ViT) image feature model. It's designed for feature extraction and fine-tuning, without a segmentation head. It was pretrained on the SA-1B dataset by the paper authors, initialized with MAE weights.

✨ Features

Model Type: Image classification / feature backbone
Model Stats:
- Params (M): 308.3
- GMACs: 1493.9
- Activations (M): 2553.8
- Image size: 1024 x 1024
Papers:
- Segment Anything
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Original: https://github.com/facebookresearch/segment-anything
Pretrain Dataset: SA-1B
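The parameter count gives a rough lower bound on memory. The sketch below estimates the size of the weights alone for a few storage dtypes; the helper is hypothetical, and activations, optimizer state (when fine-tuning), and framework overhead come on top of this figure.

```python
PARAMS_M = 308.3  # parameter count in millions, from the model stats

# Bytes per parameter for common storage dtypes
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(params_m: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return params_m * 1e6 * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "float16"):
    print(dtype, round(weight_memory_gb(PARAMS_M, dtype), 2))
# float32 1.23
# float16 0.62
```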

📦 Installation

The examples use the timm library, which is available on PyPI and pulls in PyTorch as a dependency: pip install timm

💻 Usage Examples

Basic Usage

Image Classification

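from urllib.request import urlopen
from PIL import Image
import torch
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
)).convert('RGB')  # ensure a 3-channel RGB input

# NOTE: the sa1b weights ship without a trained classifier head, so the class
# probabilities below are only meaningful after fine-tuning a head, e.g. by
# creating the model with num_classes=<your number of classes>
model = timm.create_model('samvit_large_patch16.sa1b', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

with torch.no_grad():  # inference only, no gradients needed
    output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)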

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'samvit_large_patch16.sa1b',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 256, 64, 64) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
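Because the unpooled forward_features output keeps a 64 x 64 spatial grid, each cell corresponds to one 16 x 16-pixel patch position of the 1024 x 1024 model input. The hypothetical helper below maps a pixel coordinate in that transformed 1024 x 1024 frame (that is, after the resize and crop applied by the timm transforms) to a row and column of the feature map; coordinates in the original image would first need to be rescaled to that frame.

```python
PATCH_SIZE = 16  # pixels per patch side
GRID_SIZE = 64   # 1024 // 16 feature-map cells per side


def pixel_to_cell(
    x: int, y: int, patch_size: int = PATCH_SIZE, grid_size: int = GRID_SIZE
) -> tuple[int, int]:
    """Map a pixel (x, y) in the 1024 x 1024 model input to a (row, col) feature-map cell."""
    row, col = y // patch_size, x // patch_size
    if not (0 <= row < grid_size and 0 <= col < grid_size):
        raise ValueError("pixel lies outside the 1024 x 1024 model input")
    return row, col


print(pixel_to_cell(0, 0))        # (0, 0)
print(pixel_to_cell(1023, 1023))  # (63, 63)
print(pixel_to_cell(40, 500))     # (31, 2)
```

With the (1, 256, 64, 64) tensor from forward_features, indexing it as output[0, :, row, col] then yields the 256-dimensional feature vector for that location.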

📚 Documentation

Explore the dataset and runtime metrics of this model in the timm model results.

📄 License

This project is licensed under the Apache-2.0 license.

📖 Citation

@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
