# Scene Understanding
Multilabel GeoSceneNet
Apache-2.0
A multi-label image classification model fine-tuned from the SigLIP architecture that identifies 7 types of geographic scene elements.
Image Classification
Transformers Supports Multiple Languages

prithivMLmods
26
3
Depthpro ONNX
DepthPro is a vision model for depth estimation, capable of predicting scene depth information from a single image.
3D Vision
Transformers

onnx-community
146
10
Depth Anything V2 Base
Depth-Anything-V2-Base is an ONNX-format depth estimation model adapted for Transformers.js, designed for image depth estimation on the web.
3D Vision
Transformers

onnx-community
56
0
Nebula
MIT
An image-to-text model that generates captions for images.
Image Generation
Transformers

SRDdev
17
0
Segformer B0 Person Segmentation
OpenRAIL
A semantic segmentation model based on the SegFormer architecture that assigns a semantic category label to each pixel in an image.
Image Segmentation
Transformers English

s3nh
3,187
2
Upernet Swin Large
MIT
UperNet is a semantic segmentation framework that uses a Swin Transformer backbone to achieve pixel-level scene understanding.
Image Segmentation
Transformers English

openmmlab
3,251
0
Upernet Swin Base
MIT
UperNet is a framework for semantic segmentation that uses Swin Transformer as the backbone network, enabling efficient pixel-level semantic annotation.
Image Segmentation
Transformers English

openmmlab
700
2
Upernet Swin Tiny
MIT
UperNet is a semantic segmentation framework that uses Swin Transformer as the backbone network, enabling pixel-level semantic label prediction.
Image Segmentation
Transformers English

openmmlab
4,682
3
Upernet Convnext Xlarge
MIT
UperNet is a semantic segmentation framework that uses ConvNeXt as the backbone network to predict a semantic label for each pixel.
Image Segmentation
Transformers English

openmmlab
659
2
Upernet Convnext Base
MIT
UperNet is a framework for semantic segmentation that uses ConvNeXt as the backbone network and can predict semantic labels for each pixel.
Image Segmentation
Transformers English

openmmlab
178
1
Upernet Convnext Tiny
MIT
UperNet is a semantic segmentation framework that uses ConvNeXt as the backbone network and predicts a semantic label for each pixel.
Image Segmentation
Transformers English

openmmlab
3,866
3