ConvNeXT (xlarge-sized model)
A ConvNeXT model trained on ImageNet-22k at a resolution of 224x224 for image classification.
Quick Start
You can use the raw ConvNeXT model for image classification, or look on the model hub for fine-tuned versions that suit your task.
Features
- Modernized Design: Inspired by Vision Transformers, ConvNeXT modernizes the ResNet design while remaining a pure convolutional model.
- ImageNet-22k Training: Trained on ImageNet-22k at a resolution of 224x224 for image classification.
Usage Examples
Basic Usage
Here is how to use this model to classify an image from the COCO 2017 dataset into one of the ImageNet-22k classes:
from transformers import ConvNextImageProcessor, ConvNextForImageClassification
import torch
from datasets import load_dataset

# Load a sample image (a cat photo taken from COCO 2017)
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# Load the preprocessor and the pretrained classifier
image_processor = ConvNextImageProcessor.from_pretrained("facebook/convnext-xlarge-224-22k")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-xlarge-224-22k")

# Preprocess the image into a batch of PyTorch tensors
inputs = image_processor(image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index to its label
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
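The example above reports only the single highest-scoring class. As a minimal sketch of how raw logits could be turned into ranked probabilities, the following pure-Python example applies a softmax and returns the top-k labels. The `softmax` and `top_k_labels` helpers, the toy logits, and the three-entry label map are illustrative assumptions, not part of the transformers API; in practice, `logits[0].tolist()` and `model.config.id2label` would supply the inputs.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy values standing in for model output (hypothetical, for illustration only).
toy_logits = [2.0, 0.5, -1.0]
toy_id2label = {0: "tabby cat", 1: "tiger cat", 2: "lynx"}

for label, prob in top_k_labels(toy_logits, toy_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

Ranking several candidates rather than taking a single argmax can be useful with a label set as large as ImageNet-22k, where closely related classes often receive similar scores.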
For more code examples, see the documentation.
Documentation
Model description
ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that its authors claim outperforms them. The authors started from a ResNet and "modernized" its design by taking the Swin Transformer as inspiration.
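
To make the ResNet-style stage layout more concrete, the sketch below traces feature-map shapes through a four-stage backbone for a 224x224 input. The stage widths (256, 512, 1024, 2048), the stride-4 patchify stem, and the stride-2 downsampling between stages follow the XL configuration described in the ConvNeXT paper; they are assumptions here, since this card does not list them, and `stage_shapes` is a hypothetical helper rather than a library function.

```python
def stage_shapes(image_size=224, widths=(256, 512, 1024, 2048),
                 stem_stride=4, downsample_stride=2):
    """Return (channels, height, width) for the output of each stage."""
    shapes = []
    # The stem splits the image into non-overlapping patches (assumed stride 4).
    size = image_size // stem_stride
    for index, channels in enumerate(widths):
        if index > 0:
            # A downsampling layer precedes every stage after the first.
            size //= downsample_stride
        shapes.append((channels, size, size))
    return shapes


for stage, shape in enumerate(stage_shapes(), start=1):
    print(f"stage {stage}: {shape}")
```

Under these assumptions, spatial resolution halves and channel width doubles from one stage to the next, mirroring the hierarchical layout shared by ResNet and the Swin Transformer.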

Intended uses & limitations
You can use the raw model for image classification. See the model hub for fine-tuned versions on a task that interests you.
BibTeX entry and citation info
@article{DBLP:journals/corr/abs-2201-03545,
  author     = {Zhuang Liu and
                Hanzi Mao and
                Chao{-}Yuan Wu and
                Christoph Feichtenhofer and
                Trevor Darrell and
                Saining Xie},
  title      = {A ConvNet for the 2020s},
  journal    = {CoRR},
  volume     = {abs/2201.03545},
  year       = {2022},
  url        = {https://arxiv.org/abs/2201.03545},
  eprinttype = {arXiv},
  eprint     = {2201.03545},
  timestamp  = {Thu, 20 Jan 2022 14:21:35 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2201-03545.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
License
This project is licensed under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | ConvNeXT (xlarge-sized model) |
| Training Data | ImageNet-22k |