ConvNeXT-large-224-22k-1k Open-Source Image Model - Pre-training and Fine-tuning for Efficient Image Task Processing

Convnext Large 224 22k 1k

Developed by facebook

ConvNeXT is a pure convolutional model inspired by Vision Transformer designs, pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k. Its authors report that it outperforms Vision Transformers.

Image Classification

Transformers

Open Source License: Apache-2.0 #Pure convolutional architecture #ImageNet classification #High-precision vision model

Downloads 13.71k

Release Time : 3/2/2022

Model Overview

ConvNeXT is a modern convolutional neural network designed for image classification tasks, enhancing traditional ConvNet performance by incorporating Transformer design principles.

Model Features

Modern convolutional design

Starting from ResNet, it incorporates design concepts from Swin Transformer to modernize traditional convolutional networks.

High-performance image classification

Pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k, demonstrating excellent image classification performance.

Pure convolutional architecture

Fully based on convolutional operations, achieving performance comparable to Transformers without using attention mechanisms.

Model Capabilities

Image classification

Visual feature extraction

Use Cases

Computer vision

General image classification

Classify images into one of the 1,000 categories in ImageNet

Highly accurate classification results

Object recognition

Identify specific objects in images, such as animals, everyday items, etc.

Can accurately recognize common objects like tigers, teapots, etc.

🚀 ConvNeXT (large-sized model)

A ConvNeXT model pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k at 224x224 resolution, aiming to provide high-performance image classification.
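The 224x224 input resolution implies that arbitrary images are resized and cropped before they reach the network. The sketch below works through the geometry of one common ImageNet evaluation recipe: scale the shorter side to 256 pixels, then take a central 224x224 crop. The recipe, the `crop_pct` value, and the function name `resize_then_center_crop` are assumptions for illustration; the model card does not state them, and the image processor used in the usage example may behave differently.

```python
def resize_then_center_crop(width, height, crop_size=224, crop_pct=224 / 256):
    """Return the resized (width, height) and the centered crop box.

    Assumed recipe: scale the shorter side to crop_size / crop_pct while
    keeping the aspect ratio, then cut a crop_size x crop_size square
    from the middle.
    """
    short_target = int(crop_size / crop_pct)  # 256 with the default values
    if width <= height:
        new_w = short_target
        new_h = int(short_target * height / width)
    else:
        new_h = short_target
        new_w = int(short_target * width / height)

    # Crop box as (left, top, right, bottom) in resized-image coordinates.
    left = (new_w - crop_size) // 2
    top = (new_h - crop_size) // 2
    return (new_w, new_h), (left, top, left + crop_size, top + crop_size)


# Hypothetical 640x480 input image.
resized, box = resize_then_center_crop(640, 480)
print("resized to:", resized)
print("crop box:", box)
```

Whatever the source size, the crop is always 224 pixels on each side, which is what the classifier expects.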

🚀 Quick Start

ConvNeXT is a pure convolutional model (ConvNet) inspired by Vision Transformers, claiming to outperform them. It was introduced in the paper A ConvNet for the 2020s by Liu et al. and first released in this repository.

Disclaimer: The team releasing ConvNeXT did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Innovative Design: Inspired by Vision Transformers, it modernizes the design of traditional ConvNets.
High Performance: Claims to outperform Vision Transformers in image classification tasks.

📦 Installation

The original model card gives no installation steps. The usage example below imports the transformers, torch, and datasets Python packages, so all three need to be installed before running it.

💻 Usage Examples

Basic Usage

Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:

from transformers import ConvNextImageProcessor, ConvNextForImageClassification
import torch
from datasets import load_dataset

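# Load a sample image (a COCO 2017 photo of cats) from the Hugging Face Hub
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# Download the image processor and the fine-tuned classification model
processor = ConvNextImageProcessor.from_pretrained("facebook/convnext-large-224-22k-1k")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-large-224-22k-1k")

# Preprocess the image and return it as a batch of PyTorch tensors
inputs = processor(image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits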

# model predicts one of the 1k ImageNet classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])

For more code examples, see the documentation.
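The basic usage example reports only the single most likely class. A minimal sketch of turning raw logits into top-k probabilities follows. It is pure Python so that it runs without downloading the model; `example_logits` and `example_id2label` are made-up stand-ins for `logits[0].tolist()` and `model.config.id2label`, and the helper names `softmax` and `top_k` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (max subtracted for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up values for illustration only; a real run has 1,000 logits.
example_logits = [1.2, 4.5, 0.3, 3.9]
example_id2label = {0: "teapot", 1: "tabby cat", 2: "tiger", 3: "Egyptian cat"}

for label, prob in top_k(example_logits, example_id2label):
    print(f"{label}: {prob:.3f}")
```

Reporting several candidates with probabilities is often more useful than a single label when the top classes are visually similar.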

📚 Documentation

Model description

ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them. The authors started from a ResNet and "modernized" its design by taking the Swin Transformer as inspiration.

Intended uses & limitations

You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2201-03545,
  author    = {Zhuang Liu and
               Hanzi Mao and
               Chao{-}Yuan Wu and
               Christoph Feichtenhofer and
               Trevor Darrell and
               Saining Xie},
  title     = {A ConvNet for the 2020s},
  journal   = {CoRR},
  volume    = {abs/2201.03545},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.03545},
  eprinttype = {arXiv},
  eprint    = {2201.03545},
  timestamp = {Thu, 20 Jan 2022 14:21:35 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2201-03545.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

🔧 Technical Details

ConvNeXT starts from a ResNet and modernizes its design by taking the Swin Transformer as inspiration. It is a pure convolutional model that claims to outperform Vision Transformers in image classification tasks.
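To make the ResNet-style, four-stage layout more concrete, the sketch below tabulates how feature-map resolution and channel width change through the network for a 224x224 input. The block counts (3, 3, 27, 3) and channel widths (192, 384, 768, 1536) are the ConvNeXt-L values as given in the paper, and the stride-4 stem with stride-2 downsampling between stages follows the same source. None of these numbers appear in this model card, so treat them as assumptions; the names `Stage` and `stage_layout` are hypothetical.

```python
from dataclasses import dataclass

# Assumed ConvNeXt-L configuration (from the paper, not from this model card).
DEPTHS = (3, 3, 27, 3)           # blocks per stage
DIMS = (192, 384, 768, 1536)     # channels per stage
STEM_STRIDE = 4                  # "patchify" stem downsamples by 4
DOWNSAMPLE_STRIDE = 2            # each later stage halves the resolution


@dataclass
class Stage:
    index: int
    blocks: int
    channels: int
    resolution: int  # height and width of the square feature map


def stage_layout(image_size=224):
    """List the four stages with their feature-map resolution."""
    stages = []
    resolution = image_size // STEM_STRIDE
    for i, (blocks, channels) in enumerate(zip(DEPTHS, DIMS)):
        if i > 0:
            resolution //= DOWNSAMPLE_STRIDE
        stages.append(Stage(i + 1, blocks, channels, resolution))
    return stages


for s in stage_layout():
    print(f"stage {s.index}: {s.blocks} blocks, "
          f"{s.channels} channels, {s.resolution}x{s.resolution}")
```

Under these assumptions the last stage produces a 7x7 map, which is then pooled to a single feature vector before the classification head.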

📄 License

The model is released under the Apache-2.0 license.

Property	Details
Model Type	ConvNeXT (large-sized model)
Training Data	ImageNet-22k for pre-training, ImageNet-1k for fine-tuning
License	Apache-2.0
