NAT (small variant)
NAT-Small is a vision model trained on ImageNet-1K at a 224x224 resolution. It offers efficient image classification capabilities based on the Neighborhood Attention mechanism.
Quick Start
You can use the raw model for image classification. Check out the model hub to find fine-tuned versions for tasks that interest you.
Example
Here's how to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes using this model:
```python
from transformers import AutoImageProcessor, NatForImageClassification
from PIL import Image
import requests

# Load a sample image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("shi-labs/nat-small-in1k-224")
model = NatForImageClassification.from_pretrained("shi-labs/nat-small-in1k-224")

inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits

# The model predicts one of the 1,000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
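The example above keeps only the single top-1 class. If you want a ranked list instead, the post-processing step can be extended with plain PyTorch ops. The following standalone sketch uses random logits in place of `outputs.logits` (so it runs without downloading the model); with the real model, you would pass `outputs.logits` in directly:

```python
import torch

# Dummy logits standing in for `outputs.logits` from the example above
# (a batch of 1 over the 1,000 ImageNet classes).
logits = torch.randn(1, 1000)

# Convert logits to probabilities and take the five most likely classes
probs = torch.softmax(logits, dim=-1)
top5 = torch.topk(probs, k=5, dim=-1)

for p, idx in zip(top5.values[0], top5.indices[0]):
    print(f"class {idx.item()}: {p.item():.4f}")
```

With the real model, `model.config.id2label[idx.item()]` maps each index back to a human-readable label, as in the top-1 example.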
For more examples, refer to the documentation.
Requirements
Apart from `transformers`, this model requires the NATTEN package.
- If you're using Linux, you can refer to shi-labs.com/natten for instructions on installing with pre-compiled binaries (just select your torch build to get the correct wheel URL).
- You can also run `pip install natten` to compile on your device, which may take up to a few minutes. Mac users only have this option (there are no pre-compiled binaries for macOS).
Refer to [NATTEN's GitHub](https://github.com/SHI-Labs/NATTEN/) for more information.
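Before loading the model, it can be handy to confirm that NATTEN is importable in your environment. A minimal check using only the Python standard library:

```python
import importlib.util

def natten_available() -> bool:
    """Return True if the NATTEN package can be imported."""
    return importlib.util.find_spec("natten") is not None

print(natten_available())
```

If this prints `False`, install NATTEN with one of the options above before running the quick-start example.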
Features
- Hierarchical Vision Transformer: NAT is a hierarchical vision transformer based on Neighborhood Attention (NA).
- Restricted Self-Attention: Neighborhood Attention restricts each token's receptive field to its nearest neighboring pixels, offering a sliding-window attention pattern.
- Flexibility and Translational Equivariance: NA is highly flexible and maintains translational equivariance.
- PyTorch Implementation: NA is implemented in PyTorch through its extension, [NATTEN](https://github.com/SHI-Labs/NATTEN/).
Documentation
Model description
NAT is a hierarchical vision transformer based on Neighborhood Attention (NA). Neighborhood Attention is a restricted self-attention pattern where each token's receptive field is limited to its nearest neighboring pixels. NA is a sliding-window attention pattern, making it highly flexible and maintaining translational equivariance.
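To make the mechanism concrete, here is a toy 1-D sketch of neighborhood attention in NumPy. This is an illustration under simplifying assumptions (a single head, no learned query/key/value projections), not the NATTEN kernel: each token attends only to a fixed-size window of its nearest neighbors, clamped at the sequence edges so that border tokens still see exactly `k` neighbors.

```python
import numpy as np

def neighborhood_attention_1d(x, k=3):
    """Naive 1-D neighborhood attention over a (seq_len, dim) array.

    Each token attends to a window of `k` tokens centered on itself,
    clamped at the sequence edges so every token sees exactly `k` neighbors.
    """
    n, d = x.shape
    half = k // 2
    out = np.empty_like(x)
    for i in range(n):
        # Clamp the window inside [0, n) while keeping its size at k
        start = min(max(i - half, 0), n - k)
        window = x[start:start + k]            # (k, d) keys/values
        scores = window @ x[i] / np.sqrt(d)    # (k,) scaled dot-product logits
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()               # softmax over the window
        out[i] = weights @ window              # weighted sum of neighbors
    return out

x = np.random.default_rng(0).normal(size=(8, 4))
y = neighborhood_attention_1d(x, k=3)
print(y.shape)  # (8, 4): one output vector per input token
```

NAT applies the same idea in 2-D over feature-map pixels, with the edge-clamping keeping the receptive-field size constant at image borders; NATTEN provides the fast fused implementation.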
NA is implemented in PyTorch through its extension, [NATTEN](https://github.com/SHI-Labs/NATTEN/).

Intended uses & limitations
You can use the raw model for image classification. See the model hub to find fine-tuned versions for a task that interests you.
BibTeX entry and citation info
```bibtex
@article{hassani2022neighborhood,
  title = {Neighborhood Attention Transformer},
  author = {Ali Hassani and Steven Walton and Jiachen Li and Shen Li and Humphrey Shi},
  year = 2022,
  url = {https://arxiv.org/abs/2204.07143},
  eprint = {2204.07143},
  archiveprefix = {arXiv},
  primaryclass = {cs.CV}
}
```
License
This project is licensed under the MIT license.
| Property | Details |
| --- | --- |
| Model Type | NAT (small variant) |
| Training Data | ImageNet-1K |