Model Card for Restor's SegFormer-based TCD models
This is a semantic segmentation model that can delineate tree cover in high resolution (10 cm/px) aerial images.
Quick Start
You can see a brief example of inference in this Colab notebook. For end-to-end usage, we direct users to our prediction and training pipeline which also supports tiled prediction over arbitrarily large images, reporting outputs, etc.
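The snippet below is a minimal sketch of single-tile inference with the transformers library, using the restor/tcd-segformer-mit-b5 checkpoint referenced later in this card; the input filename and the class-index convention (1 = tree) are illustrative assumptions.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, SegformerForSemanticSegmentation

checkpoint = "restor/tcd-segformer-mit-b5"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)
model.eval()

image = Image.open("tile_10cm.tif").convert("RGB")  # hypothetical 10 cm/px tile
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_labels, H/4, W/4)

# Upsample logits to the input resolution and take the per-pixel argmax.
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
canopy_mask = upsampled.argmax(dim=1).squeeze(0)  # assumed: 0 = background, 1 = tree
```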
Features
- This is a semantic segmentation model capable of delineating tree cover in high-resolution (10 cm/px) aerial images.
- It provides per-pixel classification of tree/no-tree, trained on global aerial imagery.
- The model can be used to assess canopy cover from aerial images.
Usage Examples
Basic Usage
The primary use-case for this model is assessing canopy cover from aerial images. You can use the provided pipeline for end-to-end usage. For a brief inference example, refer to this Colab notebook.
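Continuing from the quick-start sketch above (where `canopy_mask` holds per-pixel class indices), canopy cover over a single tile reduces to the fraction of pixels predicted as tree:

```python
# Simple canopy-cover estimate for one tile; assumes `canopy_mask` from the
# quick-start sketch and the class convention 1 = tree.
cover = (canopy_mask == 1).float().mean().item()
print(f"Estimated canopy cover: {cover:.1%}")
```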
Advanced Usage
For performing predictions on large orthomosaics, a higher-level framework is required to manage tiling source imagery and stitching predictions. Our repository provides a comprehensive reference implementation of such a pipeline and has been tested on extremely large images (country-scale).
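The pipeline should be preferred for production use, but the core idea looks roughly like the sliding-window sketch below (hypothetical helper names; it reuses `model` and `processor` from the quick-start example and averages logits where windows overlap, while the real pipeline adds georeferencing, batching and memory-efficient stitching):

```python
import numpy as np
import torch

def window_starts(length: int, tile: int, stride: int):
    """Window offsets covering [0, length), snapping the last window to the
    edge. Assumes the image is at least one tile in each dimension."""
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts

def predict_tiled(image: np.ndarray, tile: int = 1024, overlap: int = 256) -> np.ndarray:
    """Sliding-window inference over a large RGB array (H, W, 3), averaging
    logits where windows overlap."""
    h, w, _ = image.shape
    logit_sum = np.zeros((2, h, w), dtype=np.float32)  # 2 classes: no-tree / tree
    counts = np.zeros((h, w), dtype=np.float32)
    stride = tile - overlap
    for y in window_starts(h, tile, stride):
        for x in window_starts(w, tile, stride):
            window = image[y:y + tile, x:x + tile]
            inputs = processor(images=window, return_tensors="pt")
            with torch.no_grad():
                logits = model(**inputs).logits
            logits = torch.nn.functional.interpolate(
                logits, size=(tile, tile), mode="bilinear", align_corners=False
            )[0].numpy()
            logit_sum[:, y:y + tile, x:x + tile] += logits
            counts[y:y + tile, x:x + tile] += 1.0
    return (logit_sum / counts).argmax(axis=0)  # (H, W) per-pixel class map
```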
Documentation
Model Details
Model Description
This semantic segmentation model was trained on global aerial imagery and can accurately delineate tree cover in similar images. It provides per-pixel classification of tree/no-tree.
- Developed by: Restor / ETH Zurich
- Funded by: This project was made possible via a [Google.org impact grant](https://blog.google/outreach-initiatives/sustainability/restor-helps-anyone-be-part-ecological-restoration/)
- Model type: Semantic segmentation (binary class)
- License: Model training code is provided under an Apache-2.0 license. NVIDIA has released SegFormer under their own research license; users should check the terms of this license before deploying. This model was trained on CC BY-NC imagery.
- Finetuned from model: SegFormer family
SegFormer is a variant of the Pyramid Vision Transformer v2 model, with many identical structural features and a semantic segmentation decode head. Functionally, the architecture is quite similar to a Feature Pyramid Network (FPN) as the output predictions are based on combining features from different stages of the network at different spatial resolutions.
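For reference, the encoder's four-stage pyramid is visible in the checkpoint configuration; the attribute names below come from the transformers SegformerConfig, and the example values reflect the MiT-b5 variant.

```python
from transformers import SegformerConfig

cfg = SegformerConfig.from_pretrained("restor/tcd-segformer-mit-b5")

# Four encoder stages, each downsampling and widening the feature maps;
# the decode head fuses features from all four, much like an FPN.
print(cfg.depths)        # transformer blocks per stage, e.g. [3, 6, 40, 3]
print(cfg.hidden_sizes)  # channel width per stage, e.g. [64, 128, 320, 512]
print(cfg.strides)       # patch-embedding stride per stage, e.g. [4, 2, 2, 2]
```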
Model Sources
- Repository: https://github.com/restor-foundation/tcd
- Paper: OAM-TCD: A globally diverse dataset of high-resolution tree cover maps, accepted at NeurIPS 2024 (Datasets and Benchmarks track); see the citation below.
Uses
Direct Use
This model is suitable for inference on a single image tile. For large orthomosaics, a higher-level framework is needed. The model gives predictions for an entire image, and users may want to perform region-of-interest analysis on the results. Our linked pipeline repository supports shapefile-based region analysis.
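As an illustration of region-of-interest analysis (not our pipeline's implementation), the sketch below clips a hypothetical georeferenced prediction raster to a shapefile with rasterio and geopandas and reports canopy cover inside the region:

```python
import geopandas as gpd
import rasterio
from rasterio.mask import mask

roi = gpd.read_file("roi.shp")  # hypothetical region of interest
with rasterio.open("canopy_mask.tif") as src:  # hypothetical prediction raster
    roi = roi.to_crs(src.crs)  # reproject the ROI to match the raster
    clipped, _ = mask(src, roi.geometry, crop=True, nodata=255)

valid = clipped[0] != 255                # ignore pixels outside the ROI
cover = (clipped[0][valid] == 1).mean()  # class 1 = tree, by assumption
print(f"Canopy cover in ROI: {cover:.1%}")
```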
Out-of-Scope Use
- Some ecological biomes are under-represented in the training dataset, so performance may vary.
- The model was trained at a resolution of 10 cm/px. Results at other resolutions may not be reliable.
- It does not predict biomass, canopy height or other derived information; it only predicts the per-pixel likelihood of tree canopy coverage.
- As-is, it is not suitable for carbon credit estimation.
Bias, Risks, and Limitations
- The main limitation is false positives over objects that look like trees, such as large bushes or shrubs.
- The training dataset was annotated by non-experts, so there may be incorrect labels, leading to incorrect predictions or biases.
- We provide cross-validation results and results on independent imagery, but no guarantees on accuracy; users should perform their own testing.
Training Details
Training Data
The training dataset may be found here, where more details about the collection and annotation procedure are available. Our image labels are largely released under a CC BY 4.0 license, with smaller subsets of CC BY-NC and CC BY-SA imagery.
Training Procedure
We used a 5-fold cross-validation process to adjust hyperparameters during training, then trained on the "full" training set and evaluated on a holdout set of images. The model in the main branch of this repository is the release version.
We used PyTorch Lightning as our training framework with the following hyperparameters:
- Image size: 1024 px square
- Learning rate: initially 1e-4 to 1e-5
- Learning rate schedule: reduce on plateau
- Optimizer: AdamW
- Augmentation: random crop to 1024x1024, arbitrary rotation, flips, colour adjustments (see the sketch below)
- Number of epochs: 75 during cross-validation to ensure convergence; 50 for final models
- Normalisation: ImageNet statistics
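The augmentations listed above correspond roughly to the following albumentations stack; this is a sketch under the stated hyperparameters, not our exact training configuration.

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transform = A.Compose([
    A.RandomCrop(height=1024, width=1024),
    A.Rotate(limit=180, p=0.5),   # arbitrary rotation
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.ColorJitter(p=0.5),         # colour adjustments
    A.Normalize(),                # ImageNet mean/std by default
    ToTensorV2(),
])
```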
A typical training command using our pipeline for this model:
```bash
tcd-train semantic segformer-mit-b5 data.output= ... data.root=/mnt/data/tcd/dataset/holdout data.tile_size=1024
```
Preprocessing
This repository contains a pre-processor configuration for use with the transformers library. You can load it as follows:

```python
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained('restor/tcd-segformer-mit-b5')
```
Note that we do not resize input images and assume normalisation is performed in this processing step.
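You can verify this behaviour by inspecting the loaded processor; the attribute names below are from the transformers image-processor API, and the expected values reflect the description above.

```python
print(processor.do_resize)     # expected: False, tiles are fed at native resolution
print(processor.do_normalize)  # expected: True
print(processor.image_mean, processor.image_std)  # ImageNet statistics
```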
Speeds, Sizes, Times
The model can be evaluated on a CPU, but large tile sizes require a lot of RAM; it is better to perform inference in batched mode at 1024x1024 px. All models were trained on a single GPU with 24 GB VRAM (NVIDIA RTX3090) attached to a 32-core machine with 64 GB RAM. Smaller models train in less than half a day, while the largest take just over a day.
Evaluation
Testing Data
The training dataset is here. The main branch model was trained on all train images and tested on the test (holdout) images.
Metrics
We report F1, Accuracy and IoU on the holdout dataset, as well as results on a 5-fold cross-validation split. Cross-validation is visualised as min/max error bars on the plots.
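For clarity, the reported metrics follow the standard pixel-level definitions for binary segmentation; a plain NumPy sketch:

```python
import numpy as np

def binary_seg_metrics(pred: np.ndarray, target: np.ndarray) -> dict:
    """Pixel-level metrics for binary (tree / no-tree) masks of equal shape."""
    tp = np.sum((pred == 1) & (target == 1))
    fp = np.sum((pred == 1) & (target == 0))
    fn = np.sum((pred == 0) & (target == 1))
    tn = np.sum((pred == 0) & (target == 0))
    return {
        "iou": tp / (tp + fp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```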
Results
(Results plots: holdout metrics with 5-fold cross-validation min/max error bars.)
Environmental Impact
- Hardware Type: NVIDIA RTX3090
- Hours used: < 36
- Carbon Emitted: 5.44 kg CO2 equivalent per model
Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). This estimate does not account for experimentation time or failed training runs. Efficient CPU inference is possible for field work, with a trade-off in inference latency.
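For intuition, the figure is consistent with the calculator's methodology: a roughly 350 W GPU (the RTX3090's TDP) running for 36 hours draws about 12.6 kWh, which at a grid carbon intensity of around 0.43 kg CO2eq/kWh works out to approximately 5.4 kg CO2eq.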
Citation and contact
BibTeX:
The paper was accepted at NeurIPS 2024 under the Datasets and Benchmarks track. The citation will be updated once the final version is confirmed and the proceedings are online.
```bibtex
@inproceedings{restortcd,
  author = {Veitch-Michaelis, Josh and Cottam, Andrew and Schweizer, Daniella and Broadbent, Eben N. and Dao, David and Zhang, Ce and Almeyda Zambrano, Angelica and Max, Simeon},
  title = {OAM-TCD: A globally diverse dataset of high-resolution tree cover maps},
  booktitle = {Advances in Neural Information Processing Systems},
  pages = {1--12},
  publisher = {Curran Associates, Inc.},
  volume = {37},
  year = {2024}
}
```
Please contact josh [at] restor.eco for questions or further information.
License
Model training code is provided under an Apache-2.0 license. NVIDIA has released SegFormer under their own research license. This model was trained on CC BY-NC imagery.
Model Card Authors
Josh Veitch-Michaelis, 2024; on behalf of the dataset authors.