# Scene Understanding

Multilabel GeoSceneNet

A multi-label image classification model built on the SigLIP architecture and fine-tuned to identify 7 types of geographic scene elements in an image.

Image Classification

Transformers · Supports Multiple Languages
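"Multi-label" means each of the 7 scene elements gets its own yes/no decision, so one image can carry several labels at once. A common way to decode such a head is a per-label sigmoid followed by a threshold, rather than a softmax over all labels. The sketch below assumes the classifier emits one logit per label. The label names, the 0.5 threshold and the helper names are hypothetical placeholders, not the model's documented configuration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical placeholder names: the catalog entry does not list the 7 categories.
LABELS = [f"scene_label_{i}" for i in range(7)]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(
    logits: Sequence[float],
    labels: Sequence[str] = LABELS,
    threshold: float = 0.5,  # assumed cut-off; in practice tune it per label
) -> List[Tuple[str, float]]:
    """Return (label, probability) for every label whose sigmoid score meets the threshold.

    Each label is scored independently (no softmax), so an image can carry
    zero, one, or several labels at the same time.
    """
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    probs = [sigmoid(z) for z in logits]
    return [(name, p) for name, p in zip(labels, probs) if p >= threshold]


if __name__ == "__main__":
    # Made-up logits for a single image, one per label.
    example_logits = [2.0, -1.5, 0.3, -3.0, 1.1, -0.2, -4.0]
    for name, prob in decode_multilabel(example_logits):
        print(f"{name}: {prob:.2f}")
```

Because the labels are decoded independently, the threshold can be raised for labels that produce too many false positives without affecting the others.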

DepthPro is a vision model for monocular depth estimation: it predicts scene depth from a single image.

Depth Anything V2 Base

Depth-Anything-V2-Base is the Depth Anything V2 Base depth estimation model converted to ONNX format for Transformers.js, so that image depth can be estimated in web applications.
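Both depth entries produce a per-pixel depth map rather than an image that can be viewed directly. A common way to inspect one is min-max normalization to 8-bit gray levels with near pixels drawn bright. The sketch below assumes the map is a plain nested list of floats. Whether a given checkpoint outputs metric depth or relative inverse depth should be checked on its model card; the `larger_is_nearer` flag and the `depth_to_gray` helper are hypothetical names covering both cases.

```python
from typing import List

DepthMap = List[List[float]]


def depth_to_gray(depth: DepthMap, larger_is_nearer: bool = False) -> List[List[int]]:
    """Min-max normalize a depth map to 8-bit gray levels, near pixels bright.

    larger_is_nearer=False suits metric depth (small value = close);
    set it to True for relative inverse-depth / disparity-style outputs.
    """
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    gray = []
    for row in depth:
        out_row = []
        for v in row:
            t = 0.0 if span == 0 else (v - lo) / span  # 0 at the min value, 1 at the max
            if not larger_is_nearer:
                t = 1.0 - t  # metric depth: the minimum value is the nearest point
            out_row.append(round(t * 255))
        gray.append(out_row)
    return gray


if __name__ == "__main__":
    # Made-up metric depth values (e.g. meters) for a 2x2 image.
    print(depth_to_gray([[1.0, 2.0], [3.0, 5.0]]))
```

Min-max scaling discards absolute scale, so it is only suitable for visualization; downstream geometry should use the raw depth values.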

An image-to-text model that generates captions for images.

Image Generation

Segformer B0 Person Segmentation

A semantic segmentation model based on the Segformer architecture that assigns a semantic category label to each pixel in an image.

Image Segmentation

Transformers · English
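A semantic segmentation head typically emits one score map per class, and "assigning a label to each pixel" amounts to an argmax over the class axis. The plain-Python sketch below shows only that step, assuming the score maps already match the image resolution; many segmentation models (SegFormer included) predict at reduced resolution and are upsampled first, which this omits. The two-class background/person setup and the `argmax_label_map` name are hypothetical, not the checkpoint's documented label set or API.

```python
from typing import List

ScoreMap = List[List[float]]  # one [H][W] grid of scores for a single class


def argmax_label_map(logits: List[ScoreMap]) -> List[List[int]]:
    """Collapse per-class score maps of shape [C][H][W] into an [H][W] label map.

    Each pixel receives the index of its highest-scoring class; ties go to the
    lower class index.
    """
    if not logits:
        raise ValueError("need at least one class score map")
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    label_map = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = [logits[c][y][x] for c in range(num_classes)]
            row.append(max(range(num_classes), key=lambda c: scores[c]))
        label_map.append(row)
    return label_map


if __name__ == "__main__":
    # Hypothetical 2-class output for a 2x3 image: class 0 = background, class 1 = person.
    logits = [
        [[2.0, 0.5, -1.0], [1.5, 0.0, -0.5]],  # background scores
        [[-1.0, 1.0, 2.0], [0.0, 0.2, 1.5]],   # person scores
    ]
    labels = argmax_label_map(logits)
    person_pixels = sum(row.count(1) for row in labels)
    print(labels, person_pixels)
```

Counting pixels per class in the resulting map, as in the usage lines, is a simple way to estimate how much of the frame a class covers.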

Upernet Swin Large

UperNet is a semantic segmentation framework; this checkpoint pairs it with a Swin Transformer Large backbone for pixel-level scene understanding.

Image Segmentation

Transformers · English

Upernet Swin Base

UperNet is a semantic segmentation framework; this checkpoint pairs it with a Swin Transformer Base backbone for efficient pixel-level semantic annotation.

Image Segmentation

Transformers · English

Upernet Swin Tiny

UperNet is a semantic segmentation framework; this checkpoint pairs it with a Swin Transformer Tiny backbone to predict a semantic label for each pixel.

Image Segmentation

Transformers · English

Upernet Convnext Xlarge

UperNet is a semantic segmentation framework; this checkpoint pairs it with a ConvNeXt XLarge backbone to predict a semantic label for each pixel.

Image Segmentation

Transformers · English

Upernet Convnext Base

UperNet is a semantic segmentation framework; this checkpoint pairs it with a ConvNeXt Base backbone to predict a semantic label for each pixel.

Image Segmentation

Transformers · English

Upernet Convnext Tiny

UperNet is a semantic segmentation framework; this checkpoint pairs it with a ConvNeXt Tiny backbone to predict a semantic label for each pixel.

Image Segmentation

Transformers · English
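All six UperNet entries share the same head and differ only in backbone (Swin Transformer or ConvNeXt, in several sizes). UperNet combines a pyramid pooling module on the coarsest backbone stage with an FPN-style top-down pass that upsamples coarse features and adds them into finer ones; the fused levels are then brought to the finest resolution and concatenated before the per-pixel classifier. The sketch below illustrates only that fusion pattern on single-channel maps. It is a simplification under stated assumptions: the pyramid pooling module is treated as identity, the 1x1 lateral and 3x3 smoothing convolutions are omitted, and nearest-neighbor upsampling stands in for bilinear interpolation. All function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # single-channel feature map [H][W]


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbor 2x upsampling (a stand-in for bilinear interpolation)."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(features: List[Grid]) -> List[Grid]:
    """FPN-style top-down pass over backbone stages.

    features[0] is the finest stage; each later stage has half the height and
    width of the one before. Starting from the coarsest map, upsample it and add
    it into the next finer stage, repeatedly. Returns fused maps, finest first.
    """
    fused = [features[-1]]  # coarsest stage (pyramid pooling treated as identity)
    for lateral in reversed(features[:-1]):
        fused.append(add_maps(lateral, upsample2x(fused[-1])))
    return list(reversed(fused))


def stack_at_finest(fused: List[Grid]) -> List[Grid]:
    """Upsample every fused level to the finest resolution so the levels can be
    concatenated channel-wise before the per-pixel classifier."""
    stacked = []
    for level, grid in enumerate(fused):
        for _ in range(level):  # level i is 2**i times coarser than level 0
            grid = upsample2x(grid)
        stacked.append(grid)
    return stacked


if __name__ == "__main__":
    # Toy 3-stage pyramid: 4x4, 2x2 and 1x1 maps.
    stages = [[[1.0] * 4 for _ in range(4)], [[10.0] * 2 for _ in range(2)], [[100.0]]]
    for grid in stack_at_finest(top_down_fuse(stages)):
        print(grid[0])
```

Swapping the backbone only changes the feature maps fed into this head, which is why the same framework appears here with several Swin and ConvNeXt variants.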

© 2025 AIbase