ConvNeXT (large-sized model)
A ConvNeXT model pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k at resolution 384x384, offering high performance in image classification.
🚀 Quick Start
ConvNeXT is a powerful model for image classification. It was pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k at a resolution of 384x384. It was introduced in the paper A ConvNet for the 2020s by Liu et al. and first released in this repository.
Disclaimer: The team releasing ConvNeXT did not write a model card for this model, so this model card has been written by the Hugging Face team.
✨ Features
- Innovative Design: ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, aiming to outperform them.
- High Performance: Pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k, it shows excellent results in image classification tasks.
📚 Documentation
Model description
ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them. The authors started from a ResNet and "modernized" its design by taking the Swin Transformer as inspiration.
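Because the classification head sits on top of a plain convolutional backbone, the bare backbone can also be used to produce image embeddings. The snippet below is a minimal sketch using transformers' ConvNextModel; the local file name cat.jpg is a placeholder, not part of the original card:
```python
from transformers import ConvNextFeatureExtractor, ConvNextModel
import torch
from PIL import Image

# Placeholder path; substitute any RGB image
image = Image.open("cat.jpg")

feature_extractor = ConvNextFeatureExtractor.from_pretrained("facebook/convnext-large-384-22k-1k")
backbone = ConvNextModel.from_pretrained("facebook/convnext-large-384-22k-1k")

inputs = feature_extractor(image, return_tensors="pt")
with torch.no_grad():
    outputs = backbone(**inputs)

# pooler_output is a pooled, layer-normalized embedding of the whole image
embedding = outputs.pooler_output  # shape: (1, hidden_size)
print(embedding.shape)
```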

Intended uses & limitations
You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.
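If you prefer to browse programmatically, the huggingface_hub client can list ConvNeXT checkpoints. A minimal sketch; the "convnext" filter value is an assumption about how checkpoints are tagged on the hub:
```python
from huggingface_hub import list_models

# Print the first ten hub models tagged "convnext" (tag name assumed)
for model_info in list_models(filter="convnext", limit=10):
    print(model_info.id)
```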
How to use
Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:
```python
from transformers import ConvNextFeatureExtractor, ConvNextForImageClassification
import torch
from datasets import load_dataset

# Load a sample image from the hub
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# Note: recent transformers versions expose ConvNextImageProcessor as the successor to ConvNextFeatureExtractor
feature_extractor = ConvNextFeatureExtractor.from_pretrained("facebook/convnext-large-384-22k-1k")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-large-384-22k-1k")

inputs = feature_extractor(image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# The model predicts one of the 1,000 ImageNet-1k classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```
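To inspect prediction confidence rather than only the top class, you can apply a softmax to the logits. This short extension builds on the snippet above and is a sketch, not part of the original card:
```python
# Convert logits to probabilities and print the five most likely ImageNet classes
probs = torch.softmax(logits, dim=-1)
top5 = probs.topk(5)
for score, idx in zip(top5.values[0], top5.indices[0]):
    print(f"{model.config.id2label[idx.item()]}: {score.item():.4f}")
```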
For more code examples, we refer to the documentation.
BibTeX entry and citation info
```bibtex
@article{DBLP:journals/corr/abs-2201-03545,
  author     = {Zhuang Liu and
                Hanzi Mao and
                Chao{-}Yuan Wu and
                Christoph Feichtenhofer and
                Trevor Darrell and
                Saining Xie},
  title      = {A ConvNet for the 2020s},
  journal    = {CoRR},
  volume     = {abs/2201.03545},
  year       = {2022},
  url        = {https://arxiv.org/abs/2201.03545},
  eprinttype = {arXiv},
  eprint     = {2201.03545},
  timestamp  = {Thu, 20 Jan 2022 14:21:35 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2201-03545.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```
📄 License
The model is released under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | ConvNeXT (large-sized model) |
| Training Data | ImageNet-22k, ImageNet-1k |
| Tags | vision, image-classification |
| Widget Examples | Tiger, Teapot, Palace |