ResNet-152 v1.5
A ResNet model pre-trained on ImageNet-1k at a resolution of 224x224. It was introduced in the paper "Deep residual learning for image recognition" by He et al., which used residual learning to make much deeper image recognition networks trainable.
Quick Start
ResNet-152 v1.5 is pre-trained on ImageNet-1k, so you can use it for image classification right away. The example below classifies an image from the COCO 2017 dataset (loaded here from huggingface/cats-image) into one of the 1,000 ImageNet classes:
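import torch
from datasets import load_dataset
from transformers import AutoFeatureExtractor, ResNetForImageClassification

# Load a sample image (a COCO 2017 picture of cats).
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# Load the preprocessing pipeline and the pre-trained classifier.
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-152")
model = ResNetForImageClassification.from_pretrained("microsoft/resnet-152")

# Resize and normalize the image into a batch of PyTorch tensors.
inputs = feature_extractor(image, return_tensors="pt")

# Run inference without tracking gradients.
with torch.no_grad():
    logits = model(**inputs).logits

# The model predicts one of the 1,000 ImageNet classes.
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])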
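The example above prints only the single best class. If you also want the runner-up classes with probabilities, a small helper can rank the logits. The sketch below is pure Python and uses toy values; `top_k_labels`, `toy_logits`, and `toy_labels` are hypothetical names for illustration, standing in for `logits[0].tolist()` and `model.config.id2label` from the example above.

```python
import math

def top_k_labels(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs."""
    # Numerically stable softmax: subtract the largest logit first.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Toy stand-ins (assumptions) for the real logits and label map.
toy_logits = [1.0, 3.0, 0.5, 2.0]
toy_labels = {0: "tabby", 1: "tiger cat", 2: "lynx", 3: "Egyptian cat"}

for label, prob in top_k_labels(toy_logits, toy_labels, k=3):
    print(f"{label}: {prob:.3f}")
```

With the real model you would call something like `top_k_labels(logits[0].tolist(), model.config.id2label)`.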
For more code examples, refer to the documentation.
Features
- Residual Learning: ResNet popularized the concepts of residual learning and skip connections, enabling the training of much deeper models.
- Version Improvement: ResNet v1.5 differs from the original model in the bottleneck blocks that downsample. v1 places the stride of 2 in the first 1x1 convolution, whereas v1.5 places it in the 3x3 convolution. According to NVIDIA, this makes ResNet-50 v1.5 slightly more accurate (~0.5% top-1) than v1, at a small throughput cost (~5% imgs/sec).
Documentation
Model description
ResNet (Residual Network) is a convolutional neural network that popularized the concepts of residual learning and skip connections, which allow for the training of much deeper models.
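One common intuition for why skip connections help with depth can be shown with a toy scalar model. Assume each layer is the linear map f(x) = w * x, so a plain stack multiplies the gradient by w per layer, while a residual layer x + f(x) multiplies it by 1 + w. This is only an illustrative sketch under that assumption, not the analysis from the paper; `chain_gradient` is a hypothetical helper.

```python
def chain_gradient(layer_slopes, residual):
    """Gradient of the stack's output with respect to its input.

    Each layer is the scalar map f(x) = w * x. With residual=True the
    layer computes x + f(x), so its local derivative is 1 + w.
    """
    grad = 1.0
    for slope in layer_slopes:
        grad *= (1.0 + slope) if residual else slope
    return grad

# Fifty layers with small weights (an arbitrary toy choice).
slopes = [0.01] * 50
plain = chain_gradient(slopes, residual=False)
skip = chain_gradient(slopes, residual=True)

print(f"plain stack gradient:    {plain:.3e}")
print(f"residual stack gradient: {skip:.3f}")
```

In this toy setting the plain stack's gradient collapses toward zero, while the residual stack's gradient stays on the order of 1 because the identity path always contributes.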
This is ResNet v1.5. In the bottleneck blocks that downsample, v1 has a stride of 2 in the first 1x1 convolution, while v1.5 has a stride of 2 in the 3x3 convolution. As a result, ResNet-50 v1.5 is slightly more accurate (~0.5% top-1) than v1, at the cost of a small throughput reduction (~5% imgs/sec), as reported by NVIDIA.
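The effect of moving the stride can be sketched with a little arithmetic. The snippet below counts multiply-accumulate operations (MACs) for the three main-path convolutions of one downsampling bottleneck block, ignoring the shortcut projection, biases, and normalization. The shapes (56x56 input, 256 -> 128 -> 512 channels) are assumed here to match the first block of the conv3_x stage in the original paper, and `conv_out` and `bottleneck_macs` are hypothetical helpers written for this illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def bottleneck_macs(size, c_in, c_mid, c_out, variant):
    """Return (output size, MACs) for a downsampling bottleneck block.

    variant "v1":   stride 2 in the first 1x1 convolution.
    variant "v1.5": stride 2 in the 3x3 convolution.
    """
    s1, s2 = (2, 1) if variant == "v1" else (1, 2)
    h1 = conv_out(size, kernel=1, stride=s1, padding=0)  # 1x1 reduce
    h2 = conv_out(h1, kernel=3, stride=s2, padding=1)    # 3x3
    h3 = conv_out(h2, kernel=1, stride=1, padding=0)     # 1x1 expand
    macs = (
        h1 * h1 * c_in * c_mid
        + h2 * h2 * 9 * c_mid * c_mid
        + h3 * h3 * c_mid * c_out
    )
    return h3, macs

out_v1, macs_v1 = bottleneck_macs(56, 256, 128, 512, "v1")
out_v15, macs_v15 = bottleneck_macs(56, 256, 128, 512, "v1.5")

print(f"v1:   output {out_v1}x{out_v1}, {macs_v1:,} MACs")
print(f"v1.5: output {out_v15}x{out_v15}, {macs_v15:,} MACs")
```

Both variants produce the same output resolution. In v1.5 the first 1x1 convolution runs at full resolution, so it sees every input position instead of one in four, which costs extra compute in that layer. That is consistent with the small throughput reduction mentioned above, although the per-block MAC count is not a direct prediction of end-to-end imgs/sec.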

Intended uses & limitations
You can use the raw model for image classification. Check the model hub for versions fine-tuned on a task that interests you.
BibTeX entry and citation info
@inproceedings{he2016deep,
  title={Deep residual learning for image recognition},
  author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={770--778},
  year={2016}
}
License
This model is released under the Apache 2.0 license.
| Property | Details |
|---|---|
| Model Type | Convolutional Neural Network (ResNet v1.5) |
| Training Data | ImageNet-1k |
Important Note
The team releasing ResNet did not write a model card for this model, so this model card has been written by the Hugging Face team.