
DiNAT-Mini-IN1K-224

Developed by shi-labs
DiNAT-Mini is a hierarchical vision Transformer based on the neighborhood attention mechanism, designed for image classification tasks.
Downloads: 462
Release date: 11/14/2022

Model Overview

This model employs Dilated Neighborhood Attention (DiNA) and is trained on the ImageNet-1K dataset; it is intended for image classification at 224x224 resolution.
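
As a quick way to inspect the architecture summarized above, the checkpoint's configuration can be loaded with the Hugging Face transformers library. This is a minimal sketch: the checkpoint id shi-labs/dinat-mini-in1k-224 is inferred from the card title and developer, and field names such as depths, kernel_size, and dilations follow the transformers DiNAT configuration; treat both as assumptions rather than statements from this card.

```python
from transformers import AutoConfig

# Checkpoint id inferred from the card title and the shi-labs org (assumption).
config = AutoConfig.from_pretrained("shi-labs/dinat-mini-in1k-224")

# Field names follow the DiNAT configuration in transformers (assumed, not stated on this card).
print("embed_dim:  ", config.embed_dim)    # channel width of the first stage
print("depths:     ", config.depths)       # number of Transformer blocks per stage
print("num_heads:  ", config.num_heads)    # attention heads per stage
print("kernel_size:", config.kernel_size)  # neighborhood attention window size
print("dilations:  ", config.dilations)    # per-block dilation factors in each stage
print("num_labels: ", config.num_labels)   # 1000 for ImageNet-1K
```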

Model Features

Neighborhood Attention Mechanism
Uses a constrained self-attention mechanism where each token's receptive field is limited to its nearest neighboring pixels, preserving translation equivariance.
Dilated Neighborhood Attention
Extends the receptive field through dilated variants of neighborhood attention (DiNA), yielding a flexible sliding-window attention pattern; see the sketch after this list.
Hierarchical Structure
Adopts a hierarchical vision Transformer architecture, suitable for processing visual features at different scales.
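
To make the effect of dilation concrete, the following is a small, self-contained sketch (plain Python, not the NATTEN kernels DiNAT actually uses) of which positions a single query attends to along one axis for a fixed 7-token window at different dilation rates: the number of attended positions stays the same while the spanned receptive field grows.

```python
def neighborhood_indices(center: int, kernel_size: int, dilation: int, length: int) -> list[int]:
    """1D positions attended by a query at `center` for a given window size and dilation.

    Conceptual sketch only: the window is centered on the query and shifted
    (clamped) near the borders so it always stays inside [0, length).
    """
    radius = (kernel_size // 2) * dilation
    start = min(max(center - radius, 0), max(length - 1 - 2 * radius, 0))
    return [start + i * dilation for i in range(kernel_size)]

# On a 56-token axis with a 7-token window (DiNAT typically uses 7x7 neighborhoods),
# dilation 1 attends to a dense local patch, while larger dilations cover a wider
# span with the same number of attended positions.
for dilation in (1, 2, 4):
    idx = neighborhood_indices(center=28, kernel_size=7, dilation=dilation, length=56)
    print(f"dilation={dilation}: {idx} -> span of {idx[-1] - idx[0] + 1} positions")
```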

Model Capabilities

Image Classification
Visual Feature Extraction
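
For the feature-extraction capability, below is a minimal sketch using the DiNAT port in transformers (DinatModel). The checkpoint id and the example image path are assumptions, and DiNAT models in transformers additionally require the natten package to be installed.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, DinatModel

# Assumed checkpoint id; the `natten` package must be installed for DiNAT models.
ckpt = "shi-labs/dinat-mini-in1k-224"
processor = AutoImageProcessor.from_pretrained(ckpt)
model = DinatModel.from_pretrained(ckpt)  # classification head is dropped when loading the backbone

image = Image.open("example.jpg")  # placeholder path: any RGB image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Patch features from the last stage, plus a pooled global descriptor.
print(outputs.last_hidden_state.shape)  # (1, num_patches, hidden_dim)
print(outputs.pooler_output.shape)      # (1, hidden_dim)
```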

Use Cases

Computer Vision
ImageNet Image Classification
Classifies an input image into one of the 1,000 ImageNet-1K categories, as shown in the example below.
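
A minimal inference sketch for this use case, assuming the transformers image-classification API (AutoImageProcessor / AutoModelForImageClassification), the checkpoint id shi-labs/dinat-mini-in1k-224, and a sample COCO image URL commonly used in transformers examples.

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

ckpt = "shi-labs/dinat-mini-in1k-224"  # assumed checkpoint id
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

# Sample image (two cats on a couch) often used in transformers examples.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 1000)

predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```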