Swin Tiny Patch4 Window7 224

Developed by microsoft

Swin Transformer is a hierarchical vision Transformer that computes self-attention within local windows, so its computational cost grows linearly with input image size. This makes it well suited to image classification tasks.

Image Classification

Transformers

Open Source License: Apache-2.0 #Hierarchical Vision Transformer #Local Window Attention #Image Classification Backbone

Downloads 98.00k

Release Time: 3/2/2022

Model Overview

This model is a tiny version based on the Swin Transformer architecture, trained on the ImageNet-1k dataset for image classification tasks. It employs a hierarchical design and shifted window mechanism to effectively reduce computational complexity.

Model Features

Hierarchical Design

Constructs hierarchical feature maps by progressively merging image patches, suitable for processing visual features at different scales.

Shifted Window Mechanism

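Computes self-attention only within local windows, making the computational complexity linear with respect to input image size. The window grid is shifted between consecutive layers so that information can flow across window boundaries.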

Efficient Computation

Avoids the quadratic cost of the global self-attention used by earlier vision Transformers while maintaining high performance.
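As a rough illustration of the hierarchical design and window mechanism described above, the sketch below walks through the feature-map sizes at each stage. The image size (224), patch size (4), and window size (7) come from the model name; the base channel width of 96 and the four-stage layout are assumptions taken from the Swin-T configuration in the paper, and `swin_stage_shapes` is a hypothetical helper, not part of any library.

```python
def swin_stage_shapes(image_size=224, patch_size=4, embed_dim=96,
                      window_size=7, num_stages=4):
    """Return (stage, height, width, channels, num_windows) per stage.

    embed_dim=96 and num_stages=4 are assumed Swin-T values.
    """
    shapes = []
    side = image_size // patch_size  # tokens per side after patch embedding
    channels = embed_dim
    for stage in range(1, num_stages + 1):
        # Number of non-overlapping local windows covering the feature map
        windows = (side // window_size) ** 2
        shapes.append((stage, side, side, channels, windows))
        side //= 2      # patch merging halves each spatial dimension
        channels *= 2   # and doubles the channel width
    return shapes


for stage, h, w, c, n in swin_stage_shapes():
    print(f"stage {stage}: {h}x{w} tokens, {c} channels, {n} windows")
```

Under these assumptions the token grid shrinks from 56x56 to 7x7 while the channel width grows from 96 to 768, and the last stage fits in a single 7x7 window.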

Model Capabilities

Image Classification

Visual Feature Extraction

Use Cases

Computer Vision

General Image Classification

Classifies input images into one of the 1000 ImageNet categories.

Achieves good performance on the ImageNet-1k dataset.

Visual Feature Extraction

Serves as a backbone network to extract image features for downstream vision tasks.

🚀 Swin Transformer (tiny-sized model)

A Swin Transformer model trained on ImageNet-1k at a resolution of 224x224, offering efficient image classification capabilities.

🚀 Quick Start

The Swin Transformer model presented here is trained on the ImageNet-1k dataset at a resolution of 224x224. It was introduced in the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Liu et al. and first released in this repository.

Disclaimer: The team releasing Swin Transformer did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Model description

The Swin Transformer is a type of Vision Transformer. It constructs hierarchical feature maps by merging image patches (shown in gray) in deeper layers. Because self-attention is computed only within each local window (shown in red), its computational complexity is linear with respect to the input image size. As a result, it can serve as a general-purpose backbone for both image classification and dense recognition tasks. In contrast, previous vision Transformers generate feature maps of a single low resolution and have quadratic computational complexity with respect to the input image size because they compute self-attention globally.
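The linear versus quadratic contrast above can be made concrete with the per-layer cost estimates given in the Swin Transformer paper: 4hwC² + 2(hw)²C for global self-attention and 4hwC² + 2M²hwC for window attention, where h x w is the token grid, C the channel width, and M the window size. The sketch below simply evaluates those two expressions; the function names are hypothetical and the numbers are operation-count estimates, not measured runtimes.

```python
def global_msa_cost(h, w, c):
    """Estimated cost of global multi-head self-attention on an h x w token grid."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_msa_cost(h, w, c, m=7):
    """Estimated cost of window attention with m x m local windows."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


# Double the side length of the token grid a few times (C=96 is an assumed width)
for side in (56, 112, 224):
    g = global_msa_cost(side, side, 96)
    wn = window_msa_cost(side, side, 96)
    print(f"{side}x{side} tokens: global/window cost ratio = {g / wn:.1f}")
```

Doubling the side length quadruples the number of tokens. The window cost grows by exactly that factor, while the attention term of the global cost grows sixteenfold.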

Figure: model image (see the Swin Transformer paper for the source).

Intended uses & limitations

You can use the raw model for image classification. Check out the model hub to find fine-tuned versions for tasks that interest you.

💻 Usage Examples

Basic Usage

Here's how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests
import torch

# Download a sample image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the classification model from the Hub
processor = AutoImageProcessor.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
model = AutoModelForImageClassification.from_pretrained("microsoft/swin-tiny-patch4-window7-224")

# Preprocess the image, then run inference without tracking gradients
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# The model predicts one of the 1,000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])

For more code examples, refer to the documentation.
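The basic example prints only the single most likely class. If you also want the runner-up classes with probabilities, a small helper along these lines may be useful. It is a minimal sketch in plain Python: `top_k_predictions` is a hypothetical function, and the toy logits and four-entry label map are made up for illustration rather than taken from the model.

```python
import math


def top_k_predictions(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs.

    logits: list of floats, one per class.
    id2label: dict mapping class index (int) to label (str).
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]  # softmax probabilities
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy data for illustration only (not real model output)
toy_logits = [0.1, 2.0, -1.0, 0.5]
toy_labels = {0: "tiger", 1: "teapot", 2: "palace", 3: "tabby cat"}
for label, prob in top_k_predictions(toy_logits, toy_labels, k=3):
    print(f"{label}: {prob:.3f}")
```

With the model from the basic example, the same helper could be called as `top_k_predictions(logits[0].tolist(), model.config.id2label)`.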

📚 Documentation

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2103-14030,
  author    = {Ze Liu and
               Yutong Lin and
               Yue Cao and
               Han Hu and
               Yixuan Wei and
               Zheng Zhang and
               Stephen Lin and
               Baining Guo},
  title     = {Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  journal   = {CoRR},
  volume    = {abs/2103.14030},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.14030},
  eprinttype = {arXiv},
  eprint    = {2103.14030},
  timestamp = {Thu, 08 Apr 2021 07:53:26 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2103-14030.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

📄 License

This model is released under the Apache-2.0 license.

Property	Details
Model Type	Swin Transformer (tiny-sized model)
Training Data	ImageNet-1k
Tags	vision, image-classification
Widget Examples	Tiger, Teapot, Palace