Model Card for Restor's SegFormer-based TCD models
This is a semantic segmentation model designed to delineate tree cover in high-resolution (10 cm/px) aerial images, where accurate tree cover maps are crucial for ecological and environmental studies.
Quick Start
You can see a brief example of inference in this Colab notebook.
For end-to-end usage, we direct users to our prediction and training pipeline, which also supports tiled prediction over arbitrarily large images, output reporting, and more.
Features
- This semantic segmentation model can accurately delineate tree cover in high-resolution aerial images.
- It provides a per-pixel classification of tree/no-tree.
- The model is trained on global aerial imagery, enabling it to work on similar images.
Installation
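The examples below rely on the Hugging Face `transformers` library. As a minimal sketch (the exact package set is an assumption, since the original card lists no installation steps):

```bash
pip install transformers torch pillow numpy
```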
Usage Examples
Basic Usage
To load the preprocessor for this model, you can use the following code:
```python
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained('restor/tcd-segformer-mit-b2')
```
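Building on this, a complete single-tile inference pass might look like the sketch below. It assumes the standard `transformers` semantic segmentation workflow; the file name is a placeholder and the class-index convention (1 = tree) is an assumption rather than something stated in the card:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation

# Load the preprocessor and model weights from the Hub.
processor = AutoImageProcessor.from_pretrained('restor/tcd-segformer-mit-b2')
model = AutoModelForSemanticSegmentation.from_pretrained('restor/tcd-segformer-mit-b2')
model.eval()

# 'tile.tif' is a placeholder path to a single 10 cm/px image tile.
image = Image.open('tile.tif').convert('RGB')
inputs = processor(images=image, return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_classes, H/4, W/4)

# Upsample the low-resolution logits to the input size, then take the
# per-pixel argmax to obtain a binary tree/no-tree mask.
logits = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode='bilinear', align_corners=False
)
mask = logits.argmax(dim=1).squeeze(0).numpy()  # assumed: 1 = tree, 0 = background
```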
Advanced Usage
A typical training command using the pipeline for this model:
```bash
tcd-train semantic segformer-mit-b2 data.output=... data.root=/mnt/data/tcd/dataset/holdout data.tile_size=1024
```
Documentation
Model Details
Model Description
This semantic segmentation model was trained on global aerial imagery and can accurately delineate tree cover in similar images. It does not detect individual trees, but provides a per-pixel classification of tree/no-tree.
- Developed by: Restor / ETH Zurich
- Funded by: This project was made possible via a [Google.org impact grant](https://blog.google/outreach-initiatives/sustainability/restor-helps-anyone-be-part-ecological-restoration/)
- Model type: Semantic segmentation (binary class)
- License: Model training code is provided under an Apache 2.0 license. NVIDIA has released SegFormer under their own research license; users should check the terms of this license before deploying. This model was trained on CC BY-NC imagery.
- Finetuned from model: SegFormer family
SegFormer is a variant of the Pyramid Vision Transformer v2 model, with many identical structural features and a semantic segmentation decode head. Functionally, the architecture is quite similar to a Feature Pyramid Network (FPN) as the output predictions are based on combining features from different stages of the network at different spatial resolutions.
Model Sources
- Repository: https://github.com/restor-foundation/tcd
- Paper: We will release a preprint shortly.
Uses
Direct Use
This model is suitable for inference on a single image tile. Predicting over large orthomosaics requires a higher-level framework to manage tiling of source imagery and stitching of predictions. The linked repository provides a comprehensive reference implementation of such a pipeline and has been tested on extremely large (country-scale) images.
The model gives predictions for an entire image. In most cases, users will want to predict cover for a specific region of the image, for example a study plot or some other geographic boundary. The linked pipeline repository supports shapefile-based region analysis.
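To make the tiling requirement concrete, the following is a minimal sliding-window sketch using plain `transformers` and NumPy. It is an illustration only (window size, overlap, and the simple logit averaging are assumptions); the linked pipeline repository is the recommended, fully featured implementation:

```python
import numpy as np
import torch
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation

processor = AutoImageProcessor.from_pretrained('restor/tcd-segformer-mit-b2')
model = AutoModelForSemanticSegmentation.from_pretrained('restor/tcd-segformer-mit-b2').eval()

def predict_large(image: np.ndarray, tile: int = 1024, stride: int = 768) -> np.ndarray:
    """Sliding-window prediction over an (H, W, 3) array; averages logits in overlaps."""
    h, w, _ = image.shape
    # Window origins, including a final window flush with each edge.
    ys = sorted({*range(0, max(h - tile, 1), stride), max(h - tile, 0)})
    xs = sorted({*range(0, max(w - tile, 1), stride), max(w - tile, 0)})
    logit_sum = np.zeros((2, h, w), dtype=np.float32)  # two classes: no-tree / tree
    counts = np.zeros((h, w), dtype=np.float32)
    for y in ys:
        for x in xs:
            window = image[y:y + tile, x:x + tile]
            inputs = processor(images=window, return_tensors='pt')
            with torch.no_grad():
                logits = model(**inputs).logits
            # Upsample logits to the window size before accumulating.
            logits = torch.nn.functional.interpolate(
                logits, size=window.shape[:2], mode='bilinear', align_corners=False
            )[0].numpy()
            logit_sum[:, y:y + tile, x:x + tile] += logits
            counts[y:y + tile, x:x + tile] += 1
    return (logit_sum / counts).argmax(axis=0)  # per-pixel class mask
```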
Out-of-Scope Use
- While the model was trained on globally diverse imagery, some ecological biomes are under-represented in the training dataset and performance may vary. Users are encouraged to experiment with their own imagery before relying on the model for mission-critical use.
- The model was trained on imagery at a resolution of 10 cm/px. Predictions at other resolutions may not be reliable; if routine prediction at a different resolution is needed, the model should be fine-tuned on a resampled version of the training dataset (see the resampling sketch after this list).
- The model only predicts the likelihood that a pixel is covered by tree canopy; it does not predict biomass, canopy height or other derived information.
- As-is, the model is not suitable for carbon credit estimation.
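As an illustration of the resampling step mentioned above, the snippet below rescales an image from a known ground-sample distance to the model's native 10 cm/px using Pillow. The GSD values and file names are placeholders:

```python
from PIL import Image

# Placeholder values: source imagery at 5 cm/px, model expects 10 cm/px.
source_gsd_cm = 5.0
target_gsd_cm = 10.0

image = Image.open('source_tile.tif')
scale = source_gsd_cm / target_gsd_cm  # < 1 means downsampling
new_size = (round(image.width * scale), round(image.height * scale))
resampled = image.resize(new_size, resample=Image.Resampling.LANCZOS)
resampled.save('resampled_tile.tif')
```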
Bias, Risks, and Limitations
- The main limitation of this model is false positives over objects that look like, or could be confused with, trees, such as large bushes, shrubs or ground cover that resembles tree canopy.
- The dataset used to train this model was annotated by non-experts. The dataset likely contains incorrect labels, which may lead to incorrect predictions or other biases in model output. The development team is working to re-evaluate all training data to remove spurious labels.
The provided cross-validation results and results on independent imagery allow users to make their own assessments. No guarantees on accuracy are provided, and users should perform their own independent testing for mission-critical or production use.
Training Details
Training Data
The training dataset can be found here, where more details about the collection and annotation procedure are available. The image labels are largely released under a CC BY 4.0 license, with smaller subsets of CC BY-NC and CC BY-SA imagery.
Training Procedure
A 5-fold cross-validation process was used to adjust hyperparameters during training. After that, the model was trained on the "full" training set and evaluated on a holdout set of images. The model in the main branch of the repository is the release version.
PyTorch Lightning was used as the training framework with the following hyperparameters (a sketch of the optimizer and schedule wiring follows this list):
- Image size: 1024 px square
- Learning rate: initially 1e-4 to 1e-5
- Learning rate schedule: reduce on plateau
- Optimizer: AdamW
- Augmentation: random crop to 1024x1024, arbitrary rotation, flips, colour adjustments
- Number of epochs: 75 during cross-validation to ensure convergence; 50 for final models
- Normalisation: ImageNet statistics
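The settings above correspond to standard PyTorch components. As a minimal sketch, assuming defaults for anything the list does not specify (scheduler factor and patience in particular are assumptions; the authoritative configuration lives in the pipeline repository):

```python
import torch
from transformers import SegformerForSemanticSegmentation

# Fine-tune a SegFormer mit-b2 backbone for binary tree/no-tree segmentation.
model = SegformerForSemanticSegmentation.from_pretrained('nvidia/mit-b2', num_labels=2)

# AdamW with an initial LR of 1e-4, reduced on validation-loss plateaus.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=5  # factor/patience assumed
)

# Training loop outline (data loading, augmentation and loss omitted):
# for epoch in range(75):          # 75 epochs for CV folds, 50 for final models
#     train_loss = ...             # one pass over the training set
#     val_loss = ...               # evaluate on the validation fold
#     scheduler.step(val_loss)     # "reduce on plateau" schedule
```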
A pre-processor configuration in the repository can be used with the model when using the `transformers` library.
Speeds, Sizes, Times
The model can be evaluated on a CPU (even up to mit-b5), but a large amount of RAM is needed for large tile sizes. In general, 1024 px inputs are recommended.
All models were trained on a single GPU with 24 GB VRAM (NVIDIA RTX3090) attached to a 32-core machine with 64 GB RAM. Smaller models can be trained in less than half a day, while larger models take just over a day.
Evaluation
Testing Data
The training dataset can be found here. The model on the main branch was trained on all train images and tested on the test (holdout) images.
Metrics
F1, Accuracy and IoU are reported on the holdout dataset, as well as results from a 5-fold cross-validation split. Cross-validation is visualised as min/max error bars on the plots.
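For reference, binary segmentation metrics of this kind can be computed directly from predicted and ground-truth masks. A minimal NumPy sketch (not the pipeline's evaluation code; it assumes both masks contain 0/1 values and that both classes are present):

```python
import numpy as np

def binary_segmentation_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Accuracy, F1 and IoU for binary (tree/no-tree) masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    return {
        'accuracy': (tp + tn) / (tp + tn + fp + fn),
        'f1': 2 * tp / (2 * tp + fp + fn),   # equivalent to Dice for binary masks
        'iou': tp / (tp + fp + fn),
    }
```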
Results
Environmental Impact
This estimate is the maximum (in terms of training time) for the SegFormer family of models presented here. Smaller models, such as mit-b0, train in less than half a day.
- Hardware Type: NVIDIA RTX3090
- Hours used: < 36
- Carbon Emitted: 5.44 kg CO2 equivalent per model
Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). This estimate does not account for experimentation time, failed training runs, etc.
Efficient inference on CPU is possible for field work, at the expense of inference latency. A typical single-battery drone flight can be processed in minutes.
Citation
We will provide a preprint version of our paper shortly. In the meantime, please cite as:
BibTeX:
```bibtex
@unpublished{restortcd,
  author = "Veitch-Michaelis, Josh and Cottam, Andrew and Schweizer, Daniella and Broadbent, Eben N. and Dao, David and Zhang, Ce and Almeyda Zambrano, Angelica and Max, Simeon",
  title  = "OAM-TCD: A globally diverse dataset of high-resolution tree cover maps",
  note   = "In prep.",
  month  = "06",
  year   = "2024"
}
```
Model Card Authors
Josh Veitch-Michaelis, 2024; on behalf of the dataset authors.
Model Card Contact
Please contact josh [at] restor.eco for questions or further information.
License
Model training code is provided under an Apache 2.0 license. NVIDIA has released SegFormer under their own research license. This model was trained on CC BY-NC imagery.