ConvNeXT-large-224-22k Open Source Model - A High-Performance Image Recognition Tool Surpassing Transformer

Developed by facebook

ConvNeXT is a pure convolutional model inspired by the design of vision Transformers and trained on the ImageNet-22k dataset. Its authors report that it outperforms comparable vision Transformers.

Image Classification

Transformers

Open Source License: Apache-2.0 #Pure Convolutional Architecture #Image Classification #High-Accuracy Classification

Release Time: 3/2/2022

Model Overview

This model is primarily used for image classification tasks, capable of classifying input images into one of the 22k ImageNet categories.

Model Features

Pure Convolutional Architecture

Adopts a pure convolutional network design with no self-attention layers.

Modernization

Based on the ResNet architecture, modernized by incorporating concepts from Swin Transformer.

High Performance

Claims to surpass the performance of vision Transformer models.

Large-Scale Training

Trained on the large-scale ImageNet-22k dataset.

Model Capabilities

Image Classification

Visual Feature Extraction

Use Cases

Computer Vision

General Image Classification

Classifies input images into one of the 22k ImageNet categories.

Object Recognition

Identifies object categories in images (e.g., tiger, teapot, etc.).

🚀 ConvNeXT (large-sized model)

A ConvNeXT model trained on ImageNet-22k at a resolution of 224x224.

🚀 Quick Start

ConvNeXT is a pure convolutional model (ConvNet) inspired by Vision Transformers, which its authors claim it outperforms. It was introduced in the paper "A ConvNet for the 2020s" by Liu et al. and first released in this repository.

Disclaimer: The team releasing ConvNeXT did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Innovative Design: Inspired by Vision Transformers, it modernizes the ResNet design.
High Performance: Claims to outperform Vision Transformers in image classification tasks.

📦 Installation

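The original README gives no installation steps. The usage example in this card imports the `transformers`, `torch`, and `datasets` Python packages, which can typically be installed with `pip install transformers torch datasets`.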

💻 Usage Examples

Basic Usage

Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 22k ImageNet classes:

from transformers import ConvNextImageProcessor, ConvNextForImageClassification
import torch
from datasets import load_dataset

# Load a sample image from the Hugging Face Hub
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# ConvNextImageProcessor replaces the deprecated ConvNextFeatureExtractor
# used in older transformers releases
image_processor = ConvNextImageProcessor.from_pretrained("facebook/convnext-large-224-22k")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-large-224-22k")

# Preprocess the image into a batch of PyTorch tensors
inputs = image_processor(image, return_tensors="pt")

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    logits = model(**inputs).logits

# model predicts one of the 22k ImageNet classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])

For more code examples, see the documentation.
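
The basic example prints only the single most likely class. When several of the 22k categories are plausible, it can help to look at the top few candidates with their probabilities. The following is a minimal sketch in plain Python, not part of the original model card: the helper name `top_k_predictions` and the toy logits and labels are hypothetical stand-ins for `logits[0].tolist()` and `model.config.id2label` from the example above.

```python
import math


def top_k_predictions(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs.

    `logits` is a flat list of floats, one score per class.
    `id2label` maps integer class indices to label strings.
    """
    # Softmax: subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy values for illustration only (hypothetical, not real model output).
toy_logits = [0.5, 2.0, -1.0, 1.0]
toy_id2label = {0: "tiger", 1: "teapot", 2: "tabby cat", 3: "remote control"}

for label, prob in top_k_predictions(toy_logits, toy_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the real model, the call would presumably look like `top_k_predictions(logits[0].tolist(), model.config.id2label)`, since `logits` has one row per image in the batch.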

📚 Documentation

Model description

ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them. The authors started from a ResNet and "modernized" its design by taking the Swin Transformer as inspiration.
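
To make the Swin-style layout more concrete, the sketch below traces the feature-map shape through the four stages of the network for a 224x224 input. It is an illustration under stated assumptions, not the authors' code: the 4x4 stride-4 "patchify" stem, the 2x downsampling between stages, and the channel widths (192, 384, 768, 1536) for the large variant are taken from the ConvNeXT paper rather than from this card, and the function name `convnext_stage_shapes` is hypothetical.

```python
def convnext_stage_shapes(image_size=224, hidden_sizes=(192, 384, 768, 1536)):
    """Return (channels, height, width) of the feature map after each stage."""
    shapes = []
    # Assumed stem: a 4x4 convolution with stride 4, which splits the image
    # into non-overlapping patches in the same way Swin Transformer does.
    side = image_size // 4
    for stage, channels in enumerate(hidden_sizes):
        if stage > 0:
            # Assumed stride-2 downsampling layer between consecutive stages.
            side //= 2
        shapes.append((channels, side, side))
    return shapes


for stage, shape in enumerate(convnext_stage_shapes(), start=1):
    print(f"stage {stage}: channels={shape[0]}, spatial={shape[1]}x{shape[2]}")
```

Under these assumptions the spatial resolution halves at each stage while the channel count doubles, which mirrors the hierarchical design of both ResNet and Swin Transformer.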

Intended uses & limitations

You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2201-03545,
  author    = {Zhuang Liu and
               Hanzi Mao and
               Chao{-}Yuan Wu and
               Christoph Feichtenhofer and
               Trevor Darrell and
               Saining Xie},
  title     = {A ConvNet for the 2020s},
  journal   = {CoRR},
  volume    = {abs/2201.03545},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.03545},
  eprinttype = {arXiv},
  eprint    = {2201.03545},
  timestamp = {Thu, 20 Jan 2022 14:21:35 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2201-03545.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

📄 License

The model is licensed under the Apache-2.0 license.

Property	Details
Model Type	ConvNeXT (large-sized model)
Training Data	ImageNet-22k
Tags	vision, image-classification
