ConvNeXT (base-sized model)
A ConvNeXT model pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k at resolution 384x384. It offers a new approach to image classification by combining convolutional design with modern architecture concepts.
Quick Start
The ConvNeXT model is a powerful tool for image classification. It was introduced in the paper A ConvNet for the 2020s by Liu et al. and first released in this repository.
Disclaimer: The team releasing ConvNeXT did not write a model card for this model, so this model card has been written by the Hugging Face team.
Features
- Innovative Design: ConvNeXT is a pure convolutional model (ConvNet) inspired by Vision Transformers, aiming to outperform them.
- Modernized Architecture: The authors started from a ResNet and "modernized" its design by taking the Swin Transformer as inspiration.

Documentation
Intended uses & limitations
You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.
Usage Examples
Basic Usage
Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:
from transformers import ConvNextImageProcessor, ConvNextForImageClassification
import torch
from datasets import load_dataset

# Load a sample image
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# Load the image processor and the fine-tuned classification model
processor = ConvNextImageProcessor.from_pretrained("facebook/convnext-base-384-22k-1k")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-base-384-22k-1k")

# Preprocess the image into PyTorch tensors
inputs = processor(image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# The model predicts one of the 1,000 ImageNet classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
For more code examples, we refer to the documentation.
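The basic example prints only the single most likely class. As a small extension, the sketch below turns a vector of logits into the top-k labels with softmax probabilities. It is an illustrative helper, not part of the `transformers` API: the function name `top_k_predictions` and the toy logits and labels are assumptions made up for this example.

```python
import math


def top_k_predictions(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs for one image.

    `logits` is a plain list of floats and `id2label` maps class index to name.
    Hypothetical helper for illustration only.
    """
    # Subtract the largest logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(value - max_logit) for value in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


# Toy stand-ins for the model output and label map (assumed values).
toy_logits = [2.0, 0.5, -1.0, 3.0]
toy_id2label = {0: "tabby", 1: "tiger cat", 2: "remote control", 3: "Egyptian cat"}

for label, prob in top_k_predictions(toy_logits, toy_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as in the basic example, the same helper could be called as `top_k_predictions(logits[0].tolist(), model.config.id2label)`, since `logits` has one row per input image.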
Technical Details
ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers. The authors started from a ResNet and "modernized" its design by taking the Swin Transformer as inspiration. They claim that the resulting model outperforms Vision Transformers.
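To make the "modernized ResNet" idea more concrete, here is a minimal PyTorch sketch of a single ConvNeXT-style block as the paper describes it: a 7x7 depthwise convolution, layer normalization, and an inverted bottleneck of two pointwise layers with a GELU in between, wrapped in a residual connection. This is a simplified illustration, not the code used by this checkpoint. The class name `ConvNeXtBlockSketch` is hypothetical, the tensor sizes are arbitrary, and the layer scale and stochastic depth used in the paper are left out.

```python
import torch
from torch import nn


class ConvNeXtBlockSketch(nn.Module):
    """Simplified ConvNeXT-style block (hypothetical name, illustration only)."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        # Depthwise 7x7 convolution: groups=dim gives one filter per channel.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        # LayerNorm over the channel dimension (applied in channels-last layout).
        self.norm = nn.LayerNorm(dim, eps=1e-6)
        # Inverted bottleneck: expand the channels, apply GELU, project back.
        # Linear layers on channels-last tensors act as 1x1 convolutions.
        self.pwconv1 = nn.Linear(dim, expansion * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(expansion * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        # Residual connection, as in a ResNet block.
        return residual + x


# Arbitrary sizes chosen for illustration; the block preserves the input shape.
block = ConvNeXtBlockSketch(dim=32)
features = torch.randn(1, 32, 24, 24)
print(block(features).shape)
```

The large depthwise kernel and the single normalization and activation per block are the parts that echo Transformer design, while the overall structure stays purely convolutional.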
License
This model is licensed under the Apache-2.0 license.
| Property | Details |
|---|---|
| Model Type | ConvNet |
| Training Data | ImageNet-22k, ImageNet-1k |
BibTeX entry and citation info
@article{DBLP:journals/corr/abs-2201-03545,
author = {Zhuang Liu and
Hanzi Mao and
Chao{-}Yuan Wu and
Christoph Feichtenhofer and
Trevor Darrell and
Saining Xie},
title = {A ConvNet for the 2020s},
journal = {CoRR},
volume = {abs/2201.03545},
year = {2022},
url = {https://arxiv.org/abs/2201.03545},
eprinttype = {arXiv},
eprint = {2201.03545},
timestamp = {Thu, 20 Jan 2022 14:21:35 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2201-03545.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}