ConvNeXT-small-224 Open-Source Image Classification Model - Surpasses Traditional Transformers in Performance, Free to Use

Convnext Small 224

Developed by facebook

ConvNeXT is a pure convolutional model inspired by vision transformers and trained on the ImageNet-1k dataset; its authors report that it outperforms comparable vision transformers.

Image Classification

Transformers

Open Source License: Apache-2.0 #Pure Convolutional Architecture #Image Classification #Modern Design

Downloads 586

Release Time : 3/2/2022

Model Overview

ConvNeXT is a modern convolutional neural network designed for image classification tasks, enhancing performance by incorporating design principles from vision transformers.

Model Features

Modern Convolutional Design

Starting from ResNet, it modernizes the convolutional neural network architecture by borrowing design concepts from Swin Transformer.

Outperforms Vision Transformers

While maintaining a pure convolutional structure, it claims to outperform vision transformer models.

Trained on ImageNet-1k

The model is trained on the standard ImageNet-1k dataset, suitable for general image classification tasks.

Model Capabilities

Image Classification

Visual Feature Extraction
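
As a rough illustration of the feature extraction capability listed above, the sketch below loads the checkpoint without its classification head and returns one pooled vector per image. It assumes the `ConvNextModel` class from the transformers library and its `pooler_output` field; the helper names `extract_features` and `load_backbone` are hypothetical and not part of the original model card.

```python
import torch
from transformers import ConvNextModel


def extract_features(backbone, pixel_values):
    """Return one pooled feature vector per image, shape (batch, hidden_size)."""
    backbone.eval()
    with torch.no_grad():
        outputs = backbone(pixel_values=pixel_values)
    # pooler_output is the final feature map averaged over its spatial
    # dimensions and then layer-normalized
    return outputs.pooler_output


def load_backbone(name="facebook/convnext-small-224"):
    """Load the checkpoint without the classification head (downloads weights)."""
    return ConvNextModel.from_pretrained(name)
```

Given preprocessed `pixel_values` from the image processor, `extract_features(load_backbone(), pixel_values)` should yield embeddings that could feed a downstream classifier or a similarity search.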

Use Cases

Computer Vision

General Image Classification

Classify images into one of the 1,000 categories in ImageNet

High-accuracy classification results

Object Recognition

Identify the main objects in an image

🚀 ConvNeXT (small-sized model)

A ConvNeXT model trained on ImageNet-1k at a resolution of 224x224. It offers a new approach to image classification, inspired by Vision Transformers.

🚀 Quick Start

You can use the raw model for image classification. Check out the model hub to find fine-tuned versions for tasks that interest you.

✨ Features

Innovative Design: ConvNeXT is a pure convolutional model (ConvNet) inspired by Vision Transformers, aiming to outperform them.
Modernized Architecture: The authors started from a ResNet and "modernized" its design, taking the Swin Transformer as inspiration.

📦 Installation

The original README lists no installation steps. The usage example below imports the transformers, torch, and datasets Python packages, so these need to be available.

💻 Usage Examples

Basic Usage

Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:

from transformers import ConvNextImageProcessor, ConvNextForImageClassification
import torch
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

processor = ConvNextImageProcessor.from_pretrained("facebook/convnext-small-224")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-small-224")

inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# model predicts one of the 1000 ImageNet classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])

For more code examples, see the documentation.
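
The basic example prints only the single best label. When the ranking of several candidates matters, a small helper along these lines could turn the logits into top-k labels with probabilities. This is a sketch, not part of the original model card: the function name `top_k_labels` is hypothetical, and it assumes logits of shape (1, num_classes) and an integer-keyed `id2label` mapping such as `model.config.id2label`.

```python
import torch


def top_k_labels(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs for one image.

    `logits` is assumed to have shape (1, num_classes).
    """
    probs = torch.softmax(logits, dim=-1)[0]   # logits -> probabilities
    k = min(k, probs.numel())                  # never ask for more than exist
    top_probs, top_ids = probs.topk(k)         # sorted, highest first
    return [(id2label[i], p) for i, p in zip(top_ids.tolist(), top_probs.tolist())]
```

With `logits` and `model` from the basic usage example, `top_k_labels(logits, model.config.id2label)` would return the five highest-scoring ImageNet labels.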

📚 Documentation

Model description

ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them. The authors started from a ResNet and "modernized" its design by taking the Swin Transformer as inspiration.
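
The model card does not spell out what the "modernized" building block looks like. The sketch below follows the block design described in the cited paper as best understood here: a 7x7 depthwise convolution, LayerNorm in place of BatchNorm, an inverted bottleneck with 4x expansion, GELU, and a learnable per-channel scale on the residual branch. It is a simplified illustration in plain PyTorch (stochastic depth is omitted), the class name `ConvNeXtBlockSketch` is hypothetical, and it is not the implementation shipped in the transformers library.

```python
import torch
from torch import nn


class ConvNeXtBlockSketch(nn.Module):
    """Simplified ConvNeXt-style residual block (illustrative only)."""

    def __init__(self, dim, layer_scale_init=1e-6):
        super().__init__()
        # depthwise 7x7 convolution: one filter per channel, spatial mixing only
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim, eps=1e-6)
        # inverted bottleneck: expand channels 4x, then project back
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        # learnable per-channel scale applied to the residual branch
        self.gamma = nn.Parameter(layer_scale_init * torch.ones(dim))

    def forward(self, x):
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)   # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv2(self.act(self.pwconv1(x)))
        x = self.gamma * x
        x = x.permute(0, 3, 1, 2)   # back to (N, C, H, W)
        return residual + x
```

The block preserves the input shape, so blocks of this kind can be stacked within a stage in the same way ResNet stacks its residual blocks.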

Intended uses & limitations

You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2201-03545,
  author    = {Zhuang Liu and
               Hanzi Mao and
               Chao{-}Yuan Wu and
               Christoph Feichtenhofer and
               Trevor Darrell and
               Saining Xie},
  title     = {A ConvNet for the 2020s},
  journal   = {CoRR},
  volume    = {abs/2201.03545},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.03545},
  eprinttype = {arXiv},
  eprint    = {2201.03545},
  timestamp = {Thu, 20 Jan 2022 14:21:35 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2201-03545.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

📄 License

The model is licensed under the Apache-2.0 license.

Additional Information

Property	Details
Model Type	ConvNeXT (small-sized model)
Training Data	ImageNet-1k
Tags	vision, image-classification
Widget Examples	Tiger: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg; Teapot: https://huggingface.co/datasets/mishig/sample_images/resolve/main/teapot.jpg; Palace: https://huggingface.co/datasets/mishig/sample_images/resolve/main/palace.jpg

⚠️ Important Note

The team releasing ConvNeXT did not write a model card for this model, so this model card was written by the Hugging Face team.
