ConvNeXT-XLarge-224-22K Open-Source Image Model - A High-Performance Alternative to Vision Transformers


Developed by facebook

ConvNeXT is a pure convolutional model inspired by Vision Transformers that claims to outperform them. This model was trained on the ImageNet-22k dataset at 224x224 resolution.

Image Classification

Transformers

License: Apache-2.0. Tags: pure convolutional architecture, ImageNet-22k pretrained, 224x224 resolution


Release date: 3/2/2022

Model Overview

ConvNeXT is a modern pure convolutional neural network used primarily for image classification. Its design starts from a ResNet and incorporates improvements inspired by the Swin Transformer.
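To make the "modernized ResNet" layout concrete, here is a minimal PyTorch sketch of the macro design only: a 4x4 stride-4 "patchify" stem and 2x2 stride-2 downsampling convolutions between four stages. The stage widths (256, 512, 1024, 2048) and depths (3, 3, 27, 3) are the XL values reported in the ConvNeXt paper and are assumed here; the checkpoint's config is authoritative. The LayerNorms that the real model places in the stem and downsampling layers, and the blocks themselves, are omitted, and `stage_shapes` is a hypothetical helper.

```python
import torch
from torch import nn

# Assumed ConvNeXt-XL stage widths and depths (from the paper; verify in config.json).
HIDDEN_SIZES = [256, 512, 1024, 2048]
DEPTHS = [3, 3, 27, 3]

# "Patchify" stem: non-overlapping 4x4 patches via a stride-4 convolution,
# replacing ResNet's 7x7 stride-2 conv followed by max-pooling.
stem = nn.Conv2d(3, HIDDEN_SIZES[0], kernel_size=4, stride=4)

# Between stages, a 2x2 stride-2 conv halves the resolution and widens channels.
# (The real model also applies LayerNorm here; omitted in this sketch.)
downsamplers = nn.ModuleList(
    nn.Conv2d(c_in, c_out, kernel_size=2, stride=2)
    for c_in, c_out in zip(HIDDEN_SIZES[:-1], HIDDEN_SIZES[1:])
)


def stage_shapes(x: torch.Tensor) -> list:
    """Hypothetical helper: feature-map shape entering each stage (blocks skipped)."""
    shapes = []
    x = stem(x)
    shapes.append(tuple(x.shape))
    for down in downsamplers:
        x = down(x)
        shapes.append(tuple(x.shape))
    return shapes


with torch.no_grad():
    for i, shape in enumerate(stage_shapes(torch.zeros(1, 3, 224, 224))):
        print(f"stage {i} ({DEPTHS[i]} blocks): {shape}")
```

For a 224x224 input the spatial size goes 56, 28, 14, 7, which is the same 4x/8x/16x/32x pyramid a ResNet produces.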

Model Features

Modern convolutional network design

Based on ResNet with modern improvements inspired by Swin Transformer

High performance

The authors claim it outperforms Vision Transformers

Large-scale pretraining

Trained on the ImageNet-22k dataset

Model Capabilities

Image classification

Visual feature extraction
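For feature extraction, the backbone can be used without its classification head. The sketch below builds a deliberately tiny, randomly initialized `ConvNextModel` so that it runs offline. The tiny `hidden_sizes`/`depths` values and the `embed` helper are illustrative assumptions. With the real weights you would load `ConvNextModel.from_pretrained("facebook/convnext-xlarge-224-22k")` instead, and the feature size would be the last stage width.

```python
import torch
from transformers import ConvNextConfig, ConvNextModel

# Tiny random ConvNeXt (illustrative sizes only) so the example needs no download.
config = ConvNextConfig(hidden_sizes=[8, 16, 32, 64], depths=[1, 1, 1, 1])
backbone = ConvNextModel(config).eval()


def embed(pixel_values: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: one pooled feature vector per image, no classifier head."""
    with torch.no_grad():
        outputs = backbone(pixel_values=pixel_values)
    # pooler_output is the last-stage feature map averaged over H and W,
    # followed by a final LayerNorm: shape (batch, hidden_sizes[-1]).
    return outputs.pooler_output


images = torch.randn(2, 3, 224, 224)  # stand-in for preprocessed images
features = embed(images)
similarity = torch.nn.functional.cosine_similarity(features[0:1], features[1:2])
print(features.shape, similarity.item())
```

Pooled vectors like these can be compared with cosine similarity for retrieval or fed to a lightweight downstream classifier.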

Use Cases

Computer vision

General image classification

Classify images into the roughly 22k categories of ImageNet-22k

🚀 ConvNeXT (xlarge-sized model)

A ConvNeXT model trained on ImageNet-22k at a resolution of 224x224, aimed at high-performance image classification.

🚀 Quick Start

The ConvNeXT model is intended for image classification. You can use the raw model directly, or look on the model hub for fine-tuned versions that suit your task.

✨ Features

Innovative Design: Inspired by Vision Transformers, ConvNeXT modernizes the ResNet design and claims to outperform Vision Transformers while remaining a pure ConvNet.
Large-Scale Pretraining: Trained on ImageNet-22k at a resolution of 224x224.
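Because the model was trained at 224x224, inputs must be resized and normalized to match. The following PyTorch sketch approximates ImageNet-style preprocessing: resize the shorter side to 224 / crop_pct (256 with the assumed crop_pct of 224/256), center-crop to 224x224, then normalize with the standard ImageNet mean and standard deviation. The bilinear interpolation and the `preprocess` helper are assumptions. The checkpoint's own image processor, used in the usage example, is the reference to match.

```python
import torch
import torch.nn.functional as F

# Standard ImageNet normalization constants, shaped for (C, H, W) broadcasting.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def preprocess(image: torch.Tensor, size: int = 224, crop_pct: float = 224 / 256) -> torch.Tensor:
    """Hypothetical helper. image: uint8 tensor (3, H, W) -> float tensor (1, 3, size, size)."""
    x = image.float() / 255.0  # rescale to [0, 1]
    # Resize so the shorter side becomes size / crop_pct (256 for size=224).
    short_side = int(size / crop_pct)
    h, w = x.shape[-2:]
    scale = short_side / min(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    x = F.interpolate(x.unsqueeze(0), size=(new_h, new_w), mode="bilinear", align_corners=False)
    # Center-crop to size x size.
    top = (new_h - size) // 2
    left = (new_w - size) // 2
    x = x[..., top:top + size, left:left + size]
    return (x - MEAN) / STD


pixel_values = preprocess(torch.randint(0, 256, (3, 480, 640), dtype=torch.uint8))
print(pixel_values.shape)  # torch.Size([1, 3, 224, 224])
```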

📦 Installation

The usage example requires the transformers, torch, and datasets Python packages, which can be installed with: pip install transformers torch datasets

💻 Usage Examples

Basic Usage

Here is how to use this model to classify an image from the COCO 2017 dataset into one of the ImageNet-22k classes:

from transformers import AutoImageProcessor, ConvNextForImageClassification
import torch
from datasets import load_dataset

# Load a sample image (a photo of two cats from the COCO 2017 dataset)
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# AutoImageProcessor supersedes the deprecated ConvNextFeatureExtractor
processor = AutoImageProcessor.from_pretrained("facebook/convnext-xlarge-224-22k")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-xlarge-224-22k")

# Resize, crop, and normalize the image into model-ready pixel values
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# The model predicts one of the ImageNet-22k classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
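With roughly 22k fine-grained classes, the single argmax label is often less informative than a short ranked list. The sketch below turns logits into the top-k labels with probabilities. `top_k_labels` and the four-class `id2label` mapping are hypothetical stand-ins. With the real model you would pass the `logits` and `model.config.id2label` from the example above.

```python
import torch


def top_k_labels(logits: torch.Tensor, id2label: dict, k: int = 5) -> list:
    """Hypothetical helper: the k most probable (label, probability) pairs for one image."""
    probs = logits.softmax(dim=-1).squeeze(0)  # (1, num_labels) -> (num_labels,)
    values, indices = probs.topk(k)
    return [(id2label[i], p) for i, p in zip(indices.tolist(), values.tolist())]


# Hypothetical 4-class example standing in for the real ImageNet-22k label map.
id2label = {0: "tabby cat", 1: "remote control", 2: "sofa", 3: "dog"}
logits = torch.tensor([[3.0, 1.0, 0.5, -2.0]])
for label, prob in top_k_labels(logits, id2label, k=2):
    print(f"{label}: {prob:.3f}")
```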

For more code examples, see the ConvNeXT documentation in the Hugging Face Transformers library.

📚 Documentation

Model description

ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them. The authors started from a ResNet and "modernized" its design by taking the Swin Transformer as inspiration.
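The Swin-inspired changes are concentrated in the basic block. As described in the ConvNeXt paper, the block applies a 7x7 depthwise convolution, then LayerNorm, then an inverted bottleneck (1x1 expansion to 4x channels, GELU, 1x1 projection), scaled by a learnable per-channel "layer scale" and added to the input. Below is a minimal PyTorch sketch of that block. The class name `ConvNeXtBlock` is hypothetical, stochastic depth is omitted, and this is not the Transformers implementation itself.

```python
import torch
from torch import nn


class ConvNeXtBlock(nn.Module):
    """Sketch of one ConvNeXt block (stochastic depth omitted)."""

    def __init__(self, dim: int, layer_scale_init: float = 1e-6):
        super().__init__()
        # Large-kernel depthwise conv (groups=dim): one 7x7 filter per channel.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        # 1x1 convs written as Linear layers acting on channels-last tensors.
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # inverted bottleneck: expand 4x
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)  # project back to dim
        # Learnable per-channel scale, initialized small so each block starts near identity.
        self.gamma = nn.Parameter(layer_scale_init * torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv2(self.act(self.pwconv1(x)))
        x = self.gamma * x
        x = x.permute(0, 3, 1, 2)  # back to (N, C, H, W)
        return residual + x


block = ConvNeXtBlock(dim=256)
out = block(torch.randn(1, 256, 56, 56))
print(out.shape)  # torch.Size([1, 256, 56, 56])
```

The 7x7 depthwise kernel mirrors Swin's local attention window, and using a single LayerNorm and GELU per block echoes the Transformer MLP block rather than ResNet's BatchNorm/ReLU after every convolution.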


Intended uses & limitations

You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.
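Fine-tuning typically means keeping the pretrained backbone and training a new classification head sized for your labels. The sketch below shows one training step on a tiny, randomly initialized `ConvNextForImageClassification` so that it runs offline. The three-class setup, batch, and learning rate are illustrative assumptions. For the real checkpoint you would instead load it with `from_pretrained("facebook/convnext-xlarge-224-22k", num_labels=..., ignore_mismatched_sizes=True)`, which re-initializes the mismatched head.

```python
import torch
from transformers import ConvNextConfig, ConvNextForImageClassification

# Tiny random model (illustrative sizes) with a hypothetical 3-class head.
config = ConvNextConfig(hidden_sizes=[8, 16, 32, 64], depths=[1, 1, 1, 1], num_labels=3)
model = ConvNextForImageClassification(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

pixel_values = torch.randn(4, 3, 224, 224)  # stand-in for a preprocessed batch
labels = torch.tensor([0, 2, 1, 0])         # hypothetical class ids

model.train()
# With integer labels and num_labels > 1, the model computes a cross-entropy loss.
outputs = model(pixel_values=pixel_values, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(outputs.logits.shape, outputs.loss.item())
```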

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2201-03545,
  author    = {Zhuang Liu and
               Hanzi Mao and
               Chao{-}Yuan Wu and
               Christoph Feichtenhofer and
               Trevor Darrell and
               Saining Xie},
  title     = {A ConvNet for the 2020s},
  journal   = {CoRR},
  volume    = {abs/2201.03545},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.03545},
  eprinttype = {arXiv},
  eprint    = {2201.03545},
  timestamp = {Thu, 20 Jan 2022 14:21:35 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2201-03545.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

📄 License

This project is licensed under the Apache-2.0 license.

Property	Details
Model Type	ConvNeXT (xlarge-sized model)
Training Data	ImageNet-22k
