
Samvit Large Patch16.sa1b

Developed by timm
Segment Anything Vision Transformer (SAM ViT) image feature model. It supports feature extraction and fine-tuning only and does not include the segmentation head.
Downloads 124
Release Time : 5/18/2023

Model Overview

This model is a Vision Transformer (ViT) image encoder pre-trained on the SA-1B dataset, with its weights initialized from MAE (Masked Autoencoder) pre-trained weights. It is intended primarily for image feature extraction and as a backbone for fine-tuning.

Model Features

16x16 patch processing
Splits each 1024x1024 input image into 16x16-pixel patches, which gives a 64x64 grid of 4096 patches per image.
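The patch arithmetic can be checked with a few lines of Python. This is only an illustrative sketch; the helper name patch_grid is hypothetical and is not part of timm.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patches) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# 1024x1024 image, 16x16 patches, as described for this model.
per_side, total = patch_grid(1024, 16)
print(per_side, total)  # 64 4096
```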
MAE pre-training initialization
Weights were initialized from MAE (Masked Autoencoder) pre-trained weights before training on SA-1B.
Computational cost
The model requires 1493.9 GMACs and 2553.8 million activations per 1024x1024 image, a substantial cost to budget for in large-scale image processing.

Model Capabilities

Image feature extraction
Image classification
Image embedding representation
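A minimal sketch of how these capabilities might be used through timm follows. It assumes the timm identifier samvit_large_patch16.sa1b, that timm, torch, and Pillow are installed, and that the pretrained weights can be downloaded. The function name extract_embedding and the image path argument are hypothetical.

```python
MODEL_NAME = "samvit_large_patch16.sa1b"  # assumed timm model identifier


def extract_embedding(image_path: str):
    """Return a pooled embedding tensor for one image (illustrative sketch)."""
    # Heavy dependencies are imported lazily, only when the function runs.
    import timm
    import torch
    from PIL import Image

    # num_classes=0 asks timm for a model without a classifier layer.
    model = timm.create_model(MODEL_NAME, pretrained=True, num_classes=0)
    model.eval()

    # Build the preprocessing pipeline from the model's own data config.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0)  # add a batch dimension

    with torch.no_grad():
        features = model.forward_features(batch)  # unpooled feature map
        embedding = model.forward_head(features, pre_logits=True)  # pooled
    return embedding
```

Calling extract_embedding("example.jpg") with a real image path would load the weights on every call. In practice the model and transform would be created once and reused.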

Use Cases

Computer vision
Image classification
Can be used for image classification tasks by extracting image features and then classifying them.
Image retrieval
Enables similar image retrieval by extracting image embedding features.
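Both use cases can be sketched on top of precomputed embeddings. The example below uses tiny made-up 3-dimensional vectors in place of real model embeddings, and the function names and file names are hypothetical. It ranks a gallery by cosine similarity for retrieval and reuses the nearest match's label as a simple stand-in for a classifier.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, gallery):
    """Return gallery keys sorted from most to least similar to the query."""
    return sorted(
        gallery,
        key=lambda name: cosine_similarity(query, gallery[name]),
        reverse=True,
    )


def classify_nearest(query, gallery, labels):
    """Assign the label of the single most similar gallery item."""
    return labels[rank_by_similarity(query, gallery)[0]]


# Toy embeddings (made up for illustration, not real model outputs).
gallery = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.2, 0.9],
}
labels = {"cat_1.jpg": "cat", "cat_2.jpg": "cat", "car_1.jpg": "car"}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, gallery)
predicted = classify_nearest(query, gallery, labels)
print(ranking, predicted)
```

For a real classification task, a trained linear layer or a fine-tuned model would normally replace the nearest-neighbor shortcut shown here.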