ConvNeXT-base-384-22k-1k: Open-Source Image Classification Model, Pretrained on ImageNet-22k and Fine-Tuned on ImageNet-1k

Convnext Base 384 22k 1k

Developed by facebook

ConvNeXT is a pure convolutional model inspired by Vision Transformer designs. It is pretrained on ImageNet-22k and fine-tuned on ImageNet-1k, and its authors report that it outperforms Vision Transformers.

Image Classification

Transformers

Open Source License: Apache-2.0 #Pure Convolutional Architecture #High-Precision Image Classification #Modern ConvNet Design

Downloads 797

Release Time : 3/2/2022

Model Overview

ConvNeXT is a modern convolutional neural network designed for image classification tasks, combining the strengths of traditional ConvNets and Transformers.

Model Features

Pure Convolutional Architecture

Adopts a pure convolutional structure, avoiding the computational complexity of Transformers while maintaining high performance.

Modernized Design

Starting from ResNet, it incorporates design concepts from Swin Transformer to achieve architectural modernization.

High Performance

The authors report strong results on benchmarks such as ImageNet and claim superior performance over Transformer models.

Model Capabilities

Image Classification

Visual Feature Extraction

Use Cases

Computer Vision

Object Recognition

Identifies object categories in images, such as animals, everyday items, etc.

Classifies images into the 1,000 categories of ImageNet-1k.

Scene Classification

Classifies complex scenes, such as identifying building types or natural environments.

🚀 ConvNeXT (base-sized model)

A ConvNeXT model pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k at resolution 384x384. It offers a new approach to image classification by combining convolutional design with modern architecture concepts.

🚀 Quick Start

The ConvNeXT model is a powerful tool for image classification. It was introduced in the paper A ConvNet for the 2020s by Liu et al. and first released in this repository.

Disclaimer: The team releasing ConvNeXT did not write a model card for this model so this model card has been written by the Hugging Face team.

✨ Features

Innovative Design: ConvNeXT is a pure convolutional model (ConvNet) inspired by Vision Transformers, aiming to outperform them.
Modernized Architecture: The authors started from a ResNet and "modernized" its design by taking the Swin Transformer as inspiration.


📚 Documentation

Intended uses & limitations

You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.

💻 Usage Examples

Basic Usage

Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:

from transformers import ConvNextImageProcessor, ConvNextForImageClassification
import torch
from datasets import load_dataset

# load a sample image to classify
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# load the image processor and the fine-tuned classification model
processor = ConvNextImageProcessor.from_pretrained("facebook/convnext-base-384-22k-1k")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-base-384-22k-1k")

# preprocess the image into PyTorch tensors
inputs = processor(image, return_tensors="pt")

# run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# model predicts one of the 1000 ImageNet classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])

For more code examples, see the documentation.
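
The basic usage prints only the single most likely class. When the top few candidates are of interest, the logits can be turned into probabilities and ranked. The following is a minimal sketch, not part of the original model card: the helper name top_k_labels and the toy inputs are hypothetical, and it assumes the logits for one image have been converted to a plain Python list (for example with logits[0].tolist()) and that id2label maps integer class indices to names, as model.config.id2label does.

```python
import math

def top_k_labels(logits, id2label, k=5):
    """Return the k most likely (label, probability) pairs for one image."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Toy stand-ins for logits[0].tolist() and model.config.id2label.
toy_logits = [0.5, 2.0, -1.0, 1.0]
toy_id2label = {0: "tabby", 1: "tiger cat", 2: "lynx", 3: "Egyptian cat"}

for label, prob in top_k_labels(toy_logits, toy_id2label, k=3):
    print(f"{label}: {prob:.3f}")
```

With the real model, the call would look like top_k_labels(logits[0].tolist(), model.config.id2label). The helper uses only the standard library so the ranking logic can be checked without downloading the checkpoint.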

🔧 Technical Details

ConvNeXT is a pure convolutional model (ConvNet) inspired by the design of Vision Transformers. The authors started from a ResNet and "modernized" its design by taking the Swin Transformer as inspiration, and they claim that the resulting model outperforms Vision Transformers.
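
To make the "modernized ResNet" idea more concrete, here is a minimal PyTorch sketch of a single ConvNeXT block as described in the paper A ConvNet for the 2020s: a 7x7 depthwise convolution, LayerNorm, a pointwise expansion to four times the channel width, GELU, a pointwise projection back, a learnable per-channel scale, and a residual connection. This sketch is not taken from the model card and is not the code behind this checkpoint. The class name ConvNeXtBlockSketch and the input size are hypothetical, and stochastic depth is omitted for brevity.

```python
import torch
from torch import nn

class ConvNeXtBlockSketch(nn.Module):
    """Illustrative ConvNeXT-style residual block (hypothetical name)."""

    def __init__(self, dim: int, layer_scale_init: float = 1e-6):
        super().__init__()
        # Depthwise 7x7 convolution: one spatial filter per channel.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        # LayerNorm over the channel dimension (applied in channels-last layout).
        self.norm = nn.LayerNorm(dim, eps=1e-6)
        # Inverted bottleneck: expand to 4x width, then project back.
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        # Learnable per-channel scale on the residual branch.
        self.gamma = nn.Parameter(layer_scale_init * torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv2(self.act(self.pwconv1(x)))
        x = self.gamma * x
        x = x.permute(0, 3, 1, 2)  # back to (N, C, H, W)
        return residual + x

# The block preserves the shape of its input feature map.
block = ConvNeXtBlockSketch(dim=128)
out = block(torch.randn(1, 128, 24, 24))
print(out.shape)
```

The large depthwise kernel, the single normalization layer, and the expand-then-project pointwise layers are the elements the paper borrows from Transformer-style blocks while keeping every operation convolutional or pointwise.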

📄 License

This model is licensed under the Apache-2.0 license.

Property	Details
Model Type	ConvNet
Training Data	ImageNet-22k (pretraining), ImageNet-1k (fine-tuning)

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2201-03545,
  author    = {Zhuang Liu and
               Hanzi Mao and
               Chao{-}Yuan Wu and
               Christoph Feichtenhofer and
               Trevor Darrell and
               Saining Xie},
  title     = {A ConvNet for the 2020s},
  journal   = {CoRR},
  volume    = {abs/2201.03545},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.03545},
  eprinttype = {arXiv},
  eprint    = {2201.03545},
  timestamp = {Thu, 20 Jan 2022 14:21:35 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2201-03545.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
