Segformer B0 Scene Parse 150
Developed by univers1123
Lightweight image segmentation model based on MIT-B0 architecture, optimized for scene parsing tasks
Downloads 20
Release Time: 8/7/2023
Model Overview
This is a scene parsing model fine-tuned on the Scene Parse 150 dataset from NVIDIA's MIT-B0 backbone, capable of pixel-level segmentation and recognition of 150 object categories in complex scenes.
Model Features
Lightweight Architecture
Uses the lightweight MIT-B0 backbone to balance performance and computational efficiency
Multi-category Recognition
Supports precise segmentation and recognition of 150 scene object categories
End-to-End Training
A complete, optimized training pipeline that can be applied directly to real-world scene parsing tasks
Model Capabilities
Image Semantic Segmentation
Scene Understanding
Pixel-level Classification
Multi-object Recognition
Use Cases
Smart Cities
Street View Analysis
Automatically identifies various elements in street scenes (buildings, roads, vehicles, etc.)
Autonomous Driving
Environmental Perception
Real-time parsing of road scenes to assist vehicle decision systems
segformer-b0-scene-parse-150
This model is a fine-tuned version of [nvidia/mit-b0](https://huggingface.co/nvidia/mit-b0) on the scene_parse_150 dataset. It is designed for vision tasks, specifically semantic image segmentation; its results on the evaluation set are reported in the Quick Start section below.
Quick Start
This model is a fine-tuned version of [nvidia/mit-b0](https://huggingface.co/nvidia/mit-b0) on the scene_parse_150 dataset. It achieves the following results on the evaluation set:
- Loss: 2.3431
- Mean Iou: 0.0959
- Mean Accuracy: 0.1537
- Overall Accuracy: 0.5496
- Per Category Iou: [0.44824978876617866, 0.7548671615728508, 0.7119201505944329, 0.5304481563680256, 0.5684691275095736, 0.33051502835188457, 0.6982393617021276, 0.0, 0.3703529914609331, 0.6659141206351092, 0.028823893043720683, 0.17181416221210322, 0.052153820762502065, 0.0, 0.0, 0.0005543923800536699, 0.40565901784724534, 0.05230759173712194, 0.0, 0.07225859019823891, 0.29980315155352005, nan, 0.003601361102652032, 0.0, 0.0, nan, 0.0, 0.0, 0.38898705304076847, 0.05940808241958817, 0.0, nan, 0.0, nan, 0.0, nan, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, nan, 0.0, 0.0, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, 0.0, nan, nan, nan, 0.0, nan, 0.0, 0.0, nan, 0.0, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, nan, nan, nan, 0.0, 0.0, nan, nan, 0.0, 0.0, nan, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, 0.0, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, nan, nan, 0.0, 0.0, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, nan, nan, 0.0, nan, nan, 0.0, 0.0, nan, nan, nan, nan, 0.0, nan, 0.0]
- Per Category Accuracy: [0.8427949438202247, 0.9402615186644498, 0.7846678763016725, 0.7286579984703183, 0.8303175022736334, 0.469325820621132, 0.9020126572710594, nan, 0.5974398752913491, 0.9683369330453564, 0.05725843345934362, 0.24220857754209693, 0.12377594986290638, 0.0, 0.0, 0.0005611873291065182, 0.9580213623749935, 0.08566177782535773, 0.0, 0.16335928996064641, 0.43531591571750716, nan, 0.0036190907034607555, 0.0, 0.0, nan, nan, 0.0, 0.45750991876062724, 0.24276243093922653, 0.0, nan, 0.0, nan, 0.0, nan, 0.0, nan, nan, 0.0, 0.0, 0.0, nan, 0.0, nan, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, 0.0, nan, nan, nan, 0.0, nan, 0.0, 0.0, nan, 0.0, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, nan, nan, nan, 0.0, 0.0, nan, nan, 0.0, 0.0, nan, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, 0.0, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, nan, nan, 0.0, 0.0, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, nan, nan, 0.0, nan, nan, 0.0, 0.0, nan, nan, nan, nan, 0.0, nan, 0.0]
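The card itself does not include usage code. Below is a minimal inference sketch, assuming the standard SegFormer classes in `transformers`; the checkpoint ID `univers1123/segformer-b0-scene-parse-150` is inferred from the page header (not confirmed by the card), and the image URL is only a placeholder.

```python
import requests
import torch
from PIL import Image
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

# Assumed repo ID, inferred from the page header; replace with the actual checkpoint path.
# If the fine-tuned repo does not ship a preprocessor config, load the processor from
# "nvidia/mit-b0" instead.
checkpoint = "univers1123/segformer-b0-scene-parse-150"

processor = SegformerImageProcessor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)
model.eval()

# Placeholder test image; any RGB image works.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, 150, h/4, w/4) relative to the processed input

# Upsample to the original resolution and take the per-pixel argmax over the 150 classes.
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
segmentation = upsampled.argmax(dim=1)[0]
print(segmentation.shape, segmentation.unique())
```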
Documentation
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
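For context only: the scene_parse_150 dataset named on this card is publicly available on the Hugging Face Hub. A minimal loading sketch, assuming the canonical `scene_parse_150` dataset ID and its default scene-parsing config (the card does not document the actual data preparation):

```python
from datasets import load_dataset

# Canonical Hub ID assumed; recent `datasets` releases may require
# trust_remote_code=True for script-based datasets such as this one.
ds = load_dataset("scene_parse_150", split="train[:20]")

sample = ds[0]
# Each example carries an RGB image and a per-pixel annotation mask whose
# values index the 150 scene categories.
print(sample["image"].size, sample["annotation"].size)
```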
Technical Details
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 6e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 50
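These values map directly onto `transformers.TrainingArguments`. The sketch below is a hedged reconstruction, not the author's actual training script: the output directory name is arbitrary, and because the image preprocessing and label handling are not documented on this card, only the model and argument setup is shown.

```python
from transformers import SegformerForSemanticSegmentation, TrainingArguments

# Base checkpoint named on the card; the decode head is newly initialized for 150 classes.
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",
    num_labels=150,
)

# Hyperparameters copied from the list above; output_dir is an assumption.
# Trainer's default AdamW optimizer consumes the beta/epsilon settings.
args = TrainingArguments(
    output_dir="segformer-b0-scene-parse-150",
    learning_rate=6e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=50,
)

# A Trainer would then be built with preprocessed train/eval splits, e.g.:
# Trainer(model=model, args=args, train_dataset=..., eval_dataset=...).train()
```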
Training results
Training Loss | Epoch | Step | Validation Loss | Mean Iou | Mean Accuracy | Overall Accuracy | Per Category Iou | Per Category Accuracy |
---|---|---|---|---|---|---|---|---|
4.9918 | 0.5 | 20 | 4.8969 | 0.0108 | 0.0487 | 0.1875 | [0.18900717264720193, 0.17829851112253592, 0.40144144917749963, 0.1885612981412077, 0.11895876927062042, 0.09866217819019046, 0.0057814729592400894, 0.0, 0.0, 0.0, 0.009622579129617706, 0.022129523898301137, 0.0037298450062015365, 0.0, 0.0, 0.0, 0.06277911646586345, 0.0, 0.0, 0.003906402593851322, 0.012887091043266734, nan, 0.0019786836291242806, 0.0, 0.0, 0.0, 0.0, 0.0, 0.015807537456273512, 0.016491354532320934, 0.0, 0.0, 0.0, nan, 0.001438298321545445, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.025794247180438844, nan, 0.0, 0.0, nan, nan, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, nan, 0.0, nan, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, nan, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, nan, 0.0, 0.0, nan, 0.0, 0.0, 0.0, nan, 0.0, 0.0, 0.0, nan, 0.0, 0.0, 0.0, 0.0, 0.0, 0.012904182735093445, 0.0, nan, 0.0, 0.0, 0.0, 0.0, 0.0, nan, 0.0, 0.0, 0.0, 0.0, nan, 0.0, 0.0, 0.0, nan, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, nan, 0.0, 0.0, nan, nan, 0.0, 0.0, 0.0, nan, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, nan, nan, 0.0, 0.0, 0.0, nan, 0.0] | [0.2067212858926342, 0.2463388525747603, 0.8113718838750394, 0.515406462102938, 0.1316758582686337, 0.11907251217424253, 0.007544887960475232, nan, 0.0, 0.0, 0.013315354795213214, 0.22085775420969392, 0.054576315445880666, 0.0, 0.0, 0.0, 0.07673176606105031, 0.0, 0.0, 0.004186552792430713, 0.013544374703761687, nan, 0.0021687933259709673, 0.0, 0.0, nan, nan, 0.0, 0.01809937653504629, 0.30082872928176796, 0.0, nan, 0.0, nan, 0.0019430975470621792, nan, 0.0, nan, nan, 0.0, 0.0, 0.0, nan, 0.3248302818350134, nan, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, 0.0, nan, nan, nan, 0.0, nan, 0.0, 0.0, nan, 0.0, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, nan, nan, nan, 0.0, 0.0, nan, nan, 0.0, 0.0, nan, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, 0.0, nan, nan, nan, 0.05007914807886027, nan, nan, nan, 0.0, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, nan, nan, 0.0, 0.0, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, nan, nan, 0.0, nan, nan, 0.0, 0.0, nan, nan, nan, nan, 0.0, nan, 0.0] |
4.551 | 1.0 | 40 | 4.5955 | 0.0202 | 0.0640 | 0.3442 | [0.3519414971273263, 0.3937735618735424, 0.42161939446421154, 0.21975617697678057, 0.3809140886893701, 0.09030492572322127, 0.005777833411293457, 0.0, 0.0, 0.0, 0.0, 0.02885598249784122, 0.0, 0.0, 0.0, 0.0, 0.0573680633208358, 0.0, 0.0011308737583006134, 0.006298751950078003, 0.057306667023884476, nan, 0.0014234124996705063, 0.0, 0.0, 0.0, nan, 0.0, 0.017088433502956954, 0.023128390596745027, 0.0, 0.0, 0.0, nan, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0052120890103356235, 0.0, 0.0, nan, nan, nan, 0.0, nan, 0.0, 0.0, 0.0, 0.0, nan, nan, 0.0, 0.0, 0.0, 0.0, nan, 0.0, nan, 0.0, nan, 0.0, 0.0, nan, 0.0, 0.0, 0.0, nan, nan, nan, 0.0, 0.0, nan, nan, nan, nan, 0.0, 0.0, 0.0, nan, nan, 0.0, 0.0, nan, 0.0, 0.0, 0.0, nan, 0.0, nan, 0.0, nan, 0.0, 0.0, 0.0, nan, 0.0, 0.0, nan, nan, 0.0, 0.0, nan, 0.0, nan, nan, nan, 0.0, 0.0, nan, nan, 0.0, 0.0, 0.0, nan, 0.0, 0.0, nan, 0.0, nan, 0.0, 0.0, 0.0, nan, 0.0, 0.0, nan, nan, 0.0, 0.0, 0.0, nan, 0.0, 0.0, nan, nan, 0.0, 0.0, nan, nan, 0.0, nan, 0.0, nan, 0.0] | [0.5003401997503121, 0.785197975265903, 0.952571789207952, 0.4632600048849975, 0.5973921536125012, 0.09887449654069073, 0.006026290292656446, nan, 0.0, 0.0, 0.0, 0.4399013861754582, 0.0, 0.0, 0.0, 0.0, 0.06849835069898948, 0.0, 0.0098046905639658, 0.0067612827597756005, 0.06144521207072375, nan, 0.0014369918969623589, 0.0, 0.0, nan, nan, 0.0, 0.018779520120914415, 0.07066298342541437, 0.0, nan, 0.0, nan, 0.0, nan, 0.0, nan, nan, 0.0, 0.0, 0.0, nan, 0.01695124459987657, nan, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, 0.0, nan, nan, nan, 0.0, nan, 0.0, 0.0, nan, 0.0, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, nan, nan, nan, 0.0, 0.0, nan, nan, 0.0, 0.0, nan, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, 0.0, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, nan, nan, 0.0, 0.0, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, nan, nan, 0.0, nan, nan, 0.0, 0.0, nan, nan, nan, nan, 0.0, nan, 0.0] |
4.3293 | 1.5 | 60 | 4.2491 | 0.0352 | 0.0776 | 0.4184 | [0.3871309742960387, 0.45006666311952553, 0.6112315905191344, 0.3032571607536305, 0.44533070206501846, 0.063346836376098, 0.022528980712202534, 0.0, 0.0, 0.0, 0.0, 0.029790687595074403, 0.0019532612486920127, 0.0, 0.0, 0.0, 0.24116048081196448, 0.0, 0.0, 0.00036401133344849045, 0.00036401133344849045, ...] | [0.5003401997503121, 0.785197975265903, 0.952571789207952, 0.4632600048849975, 0.5973921536125012, 0.09887449654069073, 0.006026290292656446, nan, 0.0, 0.0, 0.0, 0.4399013861754582, 0.0, 0.0, 0.0, 0.0, 0.06849835069898948, 0.0, 0.0098046905639658, 0.0067612827597756005, 0.06144521207072375, nan, 0.0014369918969623589, 0.0, 0.0, nan, nan, 0.0, 0.018779520120914415, 0.07066298342541437, 0.0, nan, 0.0, nan, 0.0, nan, 0.0, nan, nan, 0.0, 0.0, 0.0, nan, 0.01695124459987657, nan, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, 0.0, nan, nan, nan, 0.0, nan, 0.0, 0.0, nan, 0.0, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, nan, nan, nan, 0.0, 0.0, nan, nan, 0.0, 0.0, nan, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, 0.0, nan, nan, nan, 0.05007914807886027, nan, nan, nan, 0.0, nan, nan, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, nan, 0.0, nan, nan, nan, nan, 0.0, 0.0, 0.0, nan, nan, nan, nan, nan, nan, nan, 0.0, nan, nan, 0.0, nan, nan, 0.0, 0.0, nan, nan, nan, nan, 0.0, nan, 0.0] |
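The Mean IoU, per-category IoU, and per-category accuracy columns match the output of the `mean_iou` metric in the `evaluate` library, where categories that never appear in either predictions or references come out as `nan` and are excluded from the mean. A toy sketch with fake label maps, assuming that metric is used (the exact `ignore_index`/`reduce_labels` settings for this card are not documented):

```python
import numpy as np
import evaluate

metric = evaluate.load("mean_iou")

# Fake (batch, H, W) label maps standing in for model predictions and ground truth.
predictions = np.zeros((2, 64, 64), dtype=np.int64)
references = np.zeros((2, 64, 64), dtype=np.int64)
references[:, :32, :] = 1  # half the pixels belong to class 1

results = metric.compute(
    predictions=predictions,
    references=references,
    num_labels=150,
    ignore_index=255,      # assumed void label; not documented on the card
    reduce_labels=False,   # assumed; ADE20K-style setups sometimes use True
)
print(results["mean_iou"], results["mean_accuracy"], results["overall_accuracy"])
print(results["per_category_iou"][:5])  # classes never observed remain nan
```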
License
This model is released under the "other" license.
Information
Property | Details |
---|---|
Model Type | Fine-tuned version of [nvidia/mit-b0](https://huggingface.co/nvidia/mit-b0) |
Training Data | scene_parse_150 |
Tags | vision, image-segmentation, generated_from_trainer |
Featured Recommended AI Models

Clipseg Rd64 Refined | CIDAS | Apache-2.0 | Image Segmentation | Transformers | 10.0M downloads | 122 likes
CLIPSeg is an image segmentation model based on text and image prompts, supporting zero-shot and one-shot image segmentation tasks.

RMBG 1.4 | briaai | Other | Image Segmentation | Transformers | 874.12k downloads | 1,771 likes
BRIA RMBG v1.4 is an advanced background removal model designed for efficiently separating foreground and background in various types of images, suitable for non-commercial use.

RMBG 2.0 | briaai | Other | Image Segmentation | Transformers | 703.33k downloads | 741 likes
The latest background removal model developed by BRIA AI, capable of effectively separating foreground and background in various images, suitable for large-scale commercial content creation scenarios.

Segformer B2 Clothes | mattmdjaga | MIT | Image Segmentation | Transformers | 666.39k downloads | 410 likes
SegFormer model fine-tuned on the ATR dataset for clothing and human segmentation.

Sam Vit Base | facebook | Apache-2.0 | Image Segmentation | Transformers, Other | 635.09k downloads | 137 likes
SAM is a vision model capable of generating high-quality object masks from input prompts (such as points or boxes), supporting zero-shot segmentation tasks.

Birefnet | ZhengPeng7 | MIT | Image Segmentation | Transformers | 626.54k downloads | 365 likes
BiRefNet is a deep learning model for high-resolution binary image segmentation, which achieves accurate segmentation through a bilateral reference network.

Segformer B1 Finetuned Ade 512 512 | nvidia | Other | Image Segmentation | Transformers | 560.79k downloads | 6 likes
SegFormer is a Transformer-based semantic segmentation model fine-tuned on the ADE20K dataset, suitable for image segmentation tasks.

Sam Vit Large | facebook | Apache-2.0 | Image Segmentation | Transformers, Other | 455.43k downloads | 28 likes
SAM is a vision model capable of generating high-quality object masks from input points or bounding boxes, with zero-shot transfer capability.

Face Parsing | jonathandinu | Image Segmentation | Transformers, English | 398.59k downloads | 157 likes
Semantic segmentation model fine-tuned from nvidia/mit-b5 for face parsing tasks.

Sam Vit Huge | facebook | Apache-2.0 | Image Segmentation | Transformers, Other | 324.78k downloads | 163 likes
SAM is a vision model capable of generating high-quality object masks based on input prompts, supporting zero-shot transfer to new tasks.