NAT-Mini Open-Source Vision Model - A Lightweight Tool for Free Use in ImageNet Image Classification Tasks

Nat Mini In1k 224

Developed by shi-labs

NAT-Mini is a lightweight vision Transformer model based on the Neighborhood Attention mechanism, designed for ImageNet image classification tasks.

Image Classification

Transformers

Open Source License: MIT

#Neighborhood Attention #Image Classification #Lightweight Transformer

Downloads 109

Release Time: 11/15/2022

Model Overview

NAT is a hierarchical vision Transformer based on Neighborhood Attention, achieving efficient image classification through constrained self-attention patterns

Model Features

Neighborhood Attention Mechanism

Uses a constrained self-attention pattern in which each token's receptive field is limited to its nearest neighboring pixels, preserving translational equivariance

Efficient Architecture

Hierarchical vision Transformer design that reduces computational complexity while maintaining performance

Flexible Implementation

Implemented in PyTorch through the NATTEN extension library, supporting sliding window attention patterns

Model Capabilities

Image Classification

Visual Feature Extraction

Use Cases

Computer Vision

ImageNet Image Classification

Classifies images into 1000 ImageNet categories

Accuracy metrics not provided

🚀 NAT (mini variant)

NAT-Mini is trained on ImageNet-1K at a resolution of 224x224. It offers a novel approach to image classification using Neighborhood Attention.

🚀 Quick Start

You can use the raw model for image classification. Check out the model hub to find fine-tuned versions for tasks that interest you.

Example

Here's how to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes using this model:

from transformers import AutoImageProcessor, NatForImageClassification
from PIL import Image
import requests

# Download a sample image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the pretrained classifier from the model hub
image_processor = AutoImageProcessor.from_pretrained("shi-labs/nat-mini-in1k-224")
model = NatForImageClassification.from_pretrained("shi-labs/nat-mini-in1k-224")

# Preprocess the image into PyTorch tensors and run a forward pass
inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
# The model predicts one of the 1,000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])

For more examples, refer to the documentation.
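
The Quick Start example prints only the single most likely class. To inspect several candidates, the logits can be converted to probabilities with a softmax and ranked. The following is a minimal sketch in plain Python. `top_k_labels` is a hypothetical helper, not part of transformers, and it assumes the logits for one image have already been converted to a list of floats, for example with `logits[0].detach().tolist()`. The logits and labels in the demo are made up for illustration.

```python
import math


def top_k_labels(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs for one image.

    logits   -- list of floats, one score per class
    id2label -- mapping from class index to label, e.g. model.config.id2label
    """
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first, and keep the top k.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


if __name__ == "__main__":
    # Toy values for illustration only; real logits come from the model.
    toy_logits = [0.0, 2.0, 1.0]
    toy_labels = {0: "cat", 1: "remote control", 2: "sofa"}
    for label, prob in top_k_labels(toy_logits, toy_labels, k=2):
        print(f"{label}: {prob:.3f}")
```

With the model from the example above, the call would look like `top_k_labels(logits[0].detach().tolist(), model.config.id2label)`.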

Requirements

Apart from transformers, this model requires the NATTEN package.

If you're on Linux, you can refer to shi-labs.com/natten for instructions on installing with pre-compiled binaries (just select your torch build to get the correct wheel URL).

Alternatively, you can use pip install natten to compile on your device, which may take a few minutes. Mac users only have the latter option (no pre-compiled binaries).

For more information, refer to NATTEN's GitHub.
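
A missing NATTEN install may only surface once the model is loaded, so it can help to check up front that the required packages are importable. Here is a minimal sketch using only the standard library. `missing_packages` is a hypothetical helper name, and the default package list is an assumption based on the requirements described above.

```python
import importlib.util


def missing_packages(names=("torch", "transformers", "natten")):
    """Return the subset of top-level package names that cannot be imported."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that the packages can be found; it does not verify that the installed NATTEN build matches your torch build.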

✨ Features

Neighborhood Attention: NAT is a hierarchical vision transformer based on Neighborhood Attention (NA). NA is a restricted self-attention pattern where each token's receptive field is limited to its nearest neighboring pixels. It is a sliding-window attention pattern, which is highly flexible and maintains translational equivariance.
PyTorch Implementation: NA is implemented in PyTorch through its extension, NATTEN.

📚 Documentation

Model description

NAT is a hierarchical vision transformer based on Neighborhood Attention (NA). Neighborhood Attention is a restricted self-attention pattern in which each token's receptive field is limited to its nearest neighboring pixels. NA is a sliding-window attention pattern, and as a result, it is highly flexible and maintains translational equivariance.

NA is implemented in PyTorch through its extension, NATTEN.
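
To make the restricted attention pattern concrete, here is a minimal single-head, 1D sketch in plain Python. It is an illustration under simplifying assumptions, not NATTEN's implementation: the actual model operates on 2D feature maps with multiple heads and learned projections, all of which are omitted. The function names are hypothetical. The window is shifted inward at the borders so that every token attends to exactly `kernel_size` neighbors; this border handling follows the description in the NAT paper and is an assumption with respect to this card.

```python
import math


def neighborhood_indices(i, length, kernel_size):
    """Indices of the kernel_size tokens nearest to position i in a 1D sequence.

    Assumes an odd kernel_size so the window can be centred on i.
    """
    if kernel_size > length:
        raise ValueError("kernel_size must not exceed the sequence length")
    # Centre the window on i, then clamp it so it stays inside the sequence.
    start = min(max(i - kernel_size // 2, 0), length - kernel_size)
    return list(range(start, start + kernel_size))


def neighborhood_attention_1d(queries, keys, values, kernel_size):
    """Single-head 1D attention where each query sees only its neighborhood."""
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        idx = neighborhood_indices(i, len(keys), kernel_size)
        # Scaled dot-product scores against the neighboring keys only.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim) for j in idx
        ]
        # Softmax over the neighborhood (stabilised by subtracting the max).
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the neighboring value vectors.
        outputs.append([
            sum(w * values[j][d] for w, j in zip(weights, idx))
            for d in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    for position in (0, 3, 6):
        print(position, neighborhood_indices(position, length=7, kernel_size=3))
```

When `kernel_size` equals the sequence length, every token sees every other token and the sketch reduces to ordinary self-attention; smaller kernels keep the cost per token fixed regardless of sequence length.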


BibTeX entry and citation info

@article{hassani2022neighborhood,
	title        = {Neighborhood Attention Transformer},
	author       = {Ali Hassani and Steven Walton and Jiachen Li and Shen Li and Humphrey Shi},
	year         = 2022,
	url          = {https://arxiv.org/abs/2204.07143},
	eprint       = {2204.07143},
	archiveprefix = {arXiv},
	primaryclass = {cs.CV}
}

📄 License

This project is licensed under the MIT license.

📦 Additional Information

Property	Details
Model Type	NAT (mini variant) for image classification
Training Data	ImageNet-1K
Tags	vision, image-classification
