Nat-base-in1k-224 Open-source Image Classification Model - Precise Image Recognition Based on Neighborhood Attention

Nat Base In1k 224

Developed by shi-labs

NAT-Base is a hierarchical Vision Transformer that uses the neighborhood attention mechanism. It was trained on ImageNet-1K for image classification.

Image Classification

Transformers

Open Source License: MIT
Tags: #Neighborhood Attention #Image Classification #Sliding Window Attention


Release Time : 11/18/2022

Model Overview

NAT is a hierarchical Vision Transformer based on Neighborhood Attention (NA), specifically designed for image classification tasks. Neighborhood Attention is a restricted self-attention mechanism where the receptive field of each token is limited to its nearest neighboring pixels, offering high flexibility and maintaining translational equivariance.

Model Features

Neighborhood Attention Mechanism

It adopts a sliding-window attention pattern in which the receptive field of each token is limited to its nearest neighboring pixels, which preserves translational equivariance.
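To make the "nearest neighboring pixels" idea concrete, here is a minimal sketch in plain Python of which positions a token would attend to. It assumes the window is centred on the token and shifted inward at the borders so that every token still sees exactly `kernel_size` positions per axis. The function names and the example kernel size are illustrative and not part of NATTEN's API.

```python
def neighborhood_indices(i: int, length: int, kernel_size: int) -> list[int]:
    """Indices of the kernel_size positions nearest to position i along one axis."""
    if kernel_size > length:
        raise ValueError("kernel_size must not exceed the axis length")
    # Centre the window on i, then clamp it so it stays inside [0, length).
    # Assumption: border tokens keep a full-size window that is shifted inward.
    start = min(max(i - kernel_size // 2, 0), length - kernel_size)
    return list(range(start, start + kernel_size))


def neighborhood_2d(row: int, col: int, height: int, width: int,
                    kernel_size: int) -> list[tuple[int, int]]:
    """(row, col) positions in the square neighborhood of one pixel."""
    rows = neighborhood_indices(row, height, kernel_size)
    cols = neighborhood_indices(col, width, kernel_size)
    return [(r, c) for r in rows for c in cols]


if __name__ == "__main__":
    # An interior token is centred in its window; a border token is not.
    print(neighborhood_indices(4, 8, 3))
    print(neighborhood_indices(0, 8, 3))
    # Hypothetical 7x7 neighborhood on a 56x56 feature map.
    print(len(neighborhood_2d(10, 10, 56, 56, 7)))
```

For interior tokens the window moves in lockstep with the token, which is the property behind the translational equivariance mentioned above.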

Efficient Implementation

The neighborhood attention mechanism is efficiently implemented in PyTorch through the NATTEN library.

Hierarchical Structure

It uses a hierarchical Vision Transformer architecture, suitable for processing visual features at different scales.
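As a rough illustration of what "hierarchical" means here, the sketch below computes the feature-map size at each stage. It assumes a stem that reduces the input to 1/4 resolution and a downsampler between stages that halves the resolution and doubles the channel count. The default `embed_dim=128` and the four stages are assumptions for the base variant; check `model.config` for the actual values.

```python
def stage_shapes(image_size: int = 224, embed_dim: int = 128,
                 num_stages: int = 4) -> list[tuple[int, int, int, int]]:
    """Return (stage, height, width, channels) for each stage of the hierarchy."""
    shapes = []
    side = image_size // 4      # assumption: the stem downsamples by 4x
    channels = embed_dim        # assumption: width of the first stage
    for stage in range(1, num_stages + 1):
        shapes.append((stage, side, side, channels))
        side //= 2              # assumption: 2x downsampling between stages
        channels *= 2           # assumption: channels double between stages
    return shapes


if __name__ == "__main__":
    for stage, h, w, c in stage_shapes():
        print(f"stage {stage}: {h}x{w} tokens, {c} channels")
```

Under these assumptions a 224x224 input is processed at 56x56, 28x28, 14x14 and 7x7 resolution, so early stages capture fine detail and later stages capture coarser structure.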

Model Capabilities

Image Classification

Visual Feature Extraction

Use Cases

Computer Vision

ImageNet Image Classification

Classify an image into one of the 1,000 ImageNet categories.

🚀 NAT (base variant)

NAT (Neighborhood Attention Transformer) is a hierarchical vision transformer for image classification. This checkpoint was trained on ImageNet-1K at 224x224 resolution.

✨ Features

Based on Neighborhood Attention (NA), a restricted self-attention pattern that limits each token's receptive field to its nearest neighboring pixels.
Implemented through the PyTorch extension [NATTEN](https://github.com/SHI-Labs/NATTEN/).
Highly flexible and maintains translational equivariance due to its sliding-window attention pattern.

📦 Installation

Other than transformers, this model requires the [NATTEN](https://shi-labs.com/natten) package.

Linux users: Refer to [shi-labs.com/natten](https://shi-labs.com/natten) for instructions on installing with pre-compiled binaries (select your torch build to get the correct wheel URL).
All users: Alternatively, run pip install natten to compile on your device, which may take up to a few minutes. Mac users only have this option (no pre-compiled binaries). Refer to [NATTEN's GitHub](https://github.com/SHI-Labs/NATTEN/) for more information.
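Before loading the model, it can help to confirm that the required packages are importable. The following is a small sketch using only the standard library. The helper name `missing_packages` is made up for this example, and the package list reflects the imports used in the usage example on this page plus torch and natten.

```python
from importlib.util import find_spec


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Top-level import names; "PIL" is the import name of the Pillow package.
    required = ["torch", "transformers", "natten", "PIL", "requests"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that the modules can be located. It does not verify that the installed NATTEN build matches your torch version.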

💻 Usage Examples

Basic Usage

Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:

```python
from transformers import AutoImageProcessor, NatForImageClassification
from PIL import Image
import requests

# Download a sample image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the preprocessing pipeline and the pretrained classifier
feature_extractor = AutoImageProcessor.from_pretrained("shi-labs/nat-base-in1k-224")
model = NatForImageClassification.from_pretrained("shi-labs/nat-base-in1k-224")

# Preprocess the image and run a forward pass
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
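The example above prints only the single most likely class. As a possible extension, the sketch below turns a list of raw logits into the top-k labels with softmax probabilities. It is written in plain Python so it runs without the model, and the helper name `top_k_predictions` is an assumption of this example, not part of transformers.

```python
import math


def top_k_predictions(logits: list[float], id2label: dict[int, str],
                      k: int = 5) -> list[tuple[str, float]]:
    """Return the k most probable (label, probability) pairs for one image."""
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


if __name__ == "__main__":
    # Toy logits and labels, for illustration only.
    toy_labels = {0: "cat", 1: "dog", 2: "bird"}
    for label, prob in top_k_predictions([0.5, 2.0, 1.0], toy_labels, k=2):
        print(f"{label}: {prob:.3f}")
```

With the model loaded as in the example above, it could be called as `top_k_predictions(outputs.logits[0].detach().tolist(), model.config.id2label)`.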

For more examples, please refer to the documentation.

📚 Documentation

Model description

NAT is a hierarchical vision transformer based on Neighborhood Attention (NA). Neighborhood Attention is a restricted self-attention pattern in which each token's receptive field is limited to its nearest neighboring pixels. NA is a sliding-window attention pattern, and as a result is highly flexible and maintains translational equivariance.
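For reference, the pattern can be written out as follows. This is a sketch in the spirit of the formulation in the paper cited at the end of this page, and the exact form should be checked against it. Here $\rho_j(i)$ denotes the $j$-th nearest neighbor of token $i$, $k$ is the neighborhood size, $d$ is the embedding dimension, $Q$, $K$ and $V$ are the usual query, key and value projections, and $B$ is a relative positional bias.

```latex
\mathbf{A}_i^{k} =
\begin{bmatrix}
  Q_i K_{\rho_1(i)}^{\top} + B_{(i,\rho_1(i))} \\
  \vdots \\
  Q_i K_{\rho_k(i)}^{\top} + B_{(i,\rho_k(i))}
\end{bmatrix},
\qquad
\mathbf{V}_i^{k} =
\begin{bmatrix}
  V_{\rho_1(i)}^{\top} & \cdots & V_{\rho_k(i)}^{\top}
\end{bmatrix}^{\top},
\qquad
\mathrm{NA}_k(i) = \operatorname{softmax}\!\left(\frac{\mathbf{A}_i^{k}}{\sqrt{d}}\right)\mathbf{V}_i^{k}
```

Each token therefore attends to $k$ keys rather than to all tokens, and when the neighborhood grows to cover the whole feature map the expression reduces to ordinary self-attention.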

NA is implemented in PyTorch through its extension, [NATTEN](https://github.com/SHI-Labs/NATTEN/).


Intended uses & limitations

You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.

BibTeX entry and citation info

@article{hassani2022neighborhood,
    title        = {Neighborhood Attention Transformer},
    author       = {Ali Hassani and Steven Walton and Jiachen Li and Shen Li and Humphrey Shi},
    year         = 2022,
    url          = {https://arxiv.org/abs/2204.07143},
    eprint       = {2204.07143},
    archiveprefix = {arXiv},
    primaryclass = {cs.CV}
}

📄 License

This project is licensed under the MIT license.
