# 🌍 Europe Reanalysis Super Resolution
This project aims to create a Machine Learning (ML) model that downscales ERA5 global reanalysis data to generate high-resolution regional reanalysis data, similar to that produced by CERRA. It uses advanced Deep Learning (DL) techniques and implements a validation framework to ensure model effectiveness.
## 🚀 Quick Start
The project's goal is to develop an ML model that downscales ERA5 global reanalysis data to generate high-resolution regional reanalysis data, using state-of-the-art DL techniques. After model design and training, a detailed validation framework is used to evaluate the model's performance. The denoise model is released under the Apache 2.0 license.
## ✨ Features
- Advanced DL techniques: utilizes U-Net, conditional GAN, and diffusion models for downscaling.
- Comprehensive validation: combines classical deterministic error metrics with in-depth validations disaggregated by month, season, and geographical region (see the sketch after this list).
- Interpretability tools: allow understanding of the DL models' inner workings and decision-making processes.
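As an illustration of this kind of disaggregated validation, the sketch below computes RMSE by month, by season, and per grid point with xarray. The function and dimension names (`pred`, `ref`, `time`, `latitude`, `longitude`) are hypothetical, not the project's actual API.

```python
import numpy as np
import xarray as xr

def disaggregated_rmse(pred: xr.DataArray, ref: xr.DataArray) -> dict:
    """RMSE disaggregated by month, by season, and per grid point.

    `pred` and `ref` are assumed to share the dimensions
    ("time", "latitude", "longitude"); the names are illustrative.
    """
    sq_err = (pred - ref) ** 2
    return {
        # RMSE maps aggregated over all samples in each month / season.
        "monthly": np.sqrt(sq_err.groupby("time.month").mean("time")),
        "seasonal": np.sqrt(sq_err.groupby("time.season").mean("time")),
        # RMSE map over the full period, for region-by-region inspection.
        "spatial": np.sqrt(sq_err.mean("time")),
    }
```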
## 📖 Documentation
### Model Details

#### Model Description
This model is a denoising neural network trained with instance normalization over bicubic-interpolated inputs. It uses a `diffusers.UNet2DModel` as the denoiser of a Denoising Diffusion Probabilistic Model (DDPM), with interchangeable schedulers such as `DDPMScheduler`, `DDIMScheduler`, and `LMSDiscreteScheduler`.

- Developed by: a team at Predictia Intelligent Data Solutions S.L.
- Model type: Vision model
- Language(s) (NLP): en, es
- License: Apache-2.0
- Resources for more information: More information needed
### Denoise Network
The denoise network uses `diffusers.UNet2DModel` at several model sizes. It takes two input channels (the noisy image at timestep t and the bicubic-upsampled ERA5 field), plus the timestep t, which is projected to an embedding and injected into the network.
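As a minimal sketch (not the repository's exact configuration, which lives in `config.json`), the two-channel setup can be instantiated with diffusers as follows; `sample_size` and the block widths are illustrative:

```python
import torch
from diffusers import UNet2DModel

# Two input channels: the noisy high-resolution field at timestep t and the
# bicubic-upsampled ERA5 field; one output channel: the predicted noise.
model = UNet2DModel(
    sample_size=256,                          # illustrative grid size
    in_channels=2,
    out_channels=1,
    layers_per_block=2,
    block_out_channels=(64, 128, 256),        # illustrative widths
    down_block_types=("DownBlock2D", "DownBlock2D", "AttnDownBlock2D"),
    up_block_types=("AttnUpBlock2D", "UpBlock2D", "UpBlock2D"),
)

noisy = torch.randn(1, 1, 256, 256)           # x_t
condition = torch.randn(1, 1, 256, 256)       # bicubic-upsampled ERA5
t = torch.tensor([500])                       # diffusion timestep

# The condition is concatenated along the channel axis; the timestep is
# embedded and injected internally by UNet2DModel.
noise_pred = model(torch.cat([noisy, condition], dim=1), t).sample
```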
### Noise Scheduler
Different schedulers are considered: `DDPMScheduler`, `DDIMScheduler`, and `LMSDiscreteScheduler`.
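A minimal sketch of how these schedulers can be built interchangeably from a shared configuration; the parameter values are illustrative, and the repository's actual settings are stored in `scheduler_config.json`:

```python
from diffusers import DDIMScheduler, DDPMScheduler, LMSDiscreteScheduler

# Illustrative beta-schedule settings.
ddpm = DDPMScheduler(num_train_timesteps=1000, beta_schedule="linear")

# DDIM and LMSDiscrete can reuse the same configuration, so the trained
# denoise network does not need to change when the scheduler is swapped.
ddim = DDIMScheduler.from_config(ddpm.config)
lms = LMSDiscreteScheduler.from_config(ddpm.config)
```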
### Training Data
The dataset is a composition of the ERA5 and CERRA reanalyses. The input grids (ERA5) have a spatial coverage defined as:

- longitude: [-8.35, 6.6]
- latitude: [46.45, 35.50]

The target high-resolution grid (CERRA) has a spatial coverage of:

- longitude: [-6.85, 5.1]
- latitude: [44.95, 37]
The data samples for training span 1981-2013; samples from 2014-2017 are used for per-epoch validation.
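A hedged sketch of this spatial cropping and temporal split with xarray; the file and coordinate names are hypothetical, and the project's actual data pipeline may differ:

```python
import xarray as xr

# Hypothetical file and coordinate names.
era5 = xr.open_dataset("era5.nc")
cerra = xr.open_dataset("cerra.nc")

# Crop to the coverage stated above; latitudes are listed from north to
# south, hence the descending slices.
era5 = era5.sel(longitude=slice(-8.35, 6.6), latitude=slice(46.45, 35.50))
cerra = cerra.sel(longitude=slice(-6.85, 5.1), latitude=slice(44.95, 37.0))

# 1981-2013 for training, 2014-2017 for per-epoch validation.
era5_train = era5.sel(time=slice("1981", "2013"))
cerra_train = cerra.sel(time=slice("1981", "2013"))
era5_val = era5.sel(time=slice("2014", "2017"))
cerra_val = cerra.sel(time=slice("2014", "2017"))
```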
### Normalization techniques
- With monthly climatologies:
  - Pixel-wise: compute a climatology for each pixel and standardize each pixel with its own statistics.
  - Domain-wise: compute climatology statistics over the whole domain; the input and target can be normalized with independent statistics or with shared (dependent) statistics.
- Without past information: normalize each sample independently by the mean and standard deviation of the ERA5 field, using the statistics of either the raw input ERA5 or the bicubic-interpolated ERA5 (sketched below).
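The sketches below illustrate the three schemes in numpy; the helper names are hypothetical and the climatology statistics are assumed to be precomputed:

```python
import numpy as np

def pixel_wise(x, clim_mean, clim_std, eps=1e-8):
    """Standardize each pixel by its own monthly climatology;
    clim_mean and clim_std have shape (lat, lon) for the sample's month."""
    return (x - clim_mean) / (clim_std + eps)

def domain_wise(x, mean, std, eps=1e-8):
    """Standardize with scalar statistics computed over the whole domain.
    The same (mean, std) pair can be shared by input and target (dependent)
    or computed separately for each field (independent)."""
    return (x - mean) / (std + eps)

def instance_norm(x, eps=1e-8):
    """Normalize a sample by its own mean and standard deviation, e.g.
    those of the input ERA5 or the bicubic-interpolated ERA5 field."""
    return (x - x.mean()) / (x.std() + eps)
```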
### Results
The model's results are not yet considered acceptable, as they do not reach the quality of plain bicubic interpolation. The best-performing diffusion model in the repository is trained with the scheduler specified in `scheduler_config.json`, with the parameters shown in `config.json`, and with instance normalization over the bicubic-interpolated ERA5 inputs.

#### Normalization
- Pixel-wise normalization erases the spatial pattern, and the DDPM cannot learn.
- Domain-wise scaling can reproduce high-resolution details but fails to match the corresponding high-resolution field.
- Using the same domain statistics for input and output represents mean values better but fails to reproduce the variance.
- Instance normalization on the denoise network inputs reproduces the spatial pattern slightly better, and the error metrics are more spatially homogeneous.
#### Schedulers
There is no significant difference in training time or in sampling quality when the full number of inference steps is used. DDIM or LMSDiscrete may produce higher-quality samples with fewer inference steps, resulting in lower computational cost at inference time.
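A minimal sketch of few-step sampling with DDIM, reusing the `model` and `ddim` objects sketched above; the step count and tensor shapes are illustrative:

```python
import torch

# Far fewer inference steps than the 1000 training timesteps.
ddim.set_timesteps(num_inference_steps=50)

sample = torch.randn(1, 1, 256, 256)          # start from pure noise
condition = torch.randn(1, 1, 256, 256)       # bicubic-upsampled ERA5 field

with torch.no_grad():
    for t in ddim.timesteps:
        # Predict the noise, conditioning via channel concatenation.
        noise_pred = model(torch.cat([sample, condition], dim=1), t).sample
        # One reverse-diffusion step.
        sample = ddim.step(noise_pred, t, sample).prev_sample
```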
#### Model sizes
Model size is strongly related to training time. With limited computational resources, there is an improvement when going from tens of output channels to a few hundred, but reaching the default model size is not possible due to training failures.
### Next Steps
- Train a VAE for a latent diffusion model to reduce sample size and computational cost.
- Train a larger denoise network, which may require a larger VM and more samples.
- Explore other diffusion model variants, such as score-based diffusion models.
- Try new architectures available in diffusers.
### Compute Infrastructure
The use of GPUs in this deep learning project significantly accelerates model training and inference. The project benefits from the support of the following partners:
- AI4EOSC: Focuses on integrating and applying AI technologies in the context of open science within the European Open Science Cloud.
- European Weather Cloud: A cloud-based collaboration platform for meteorological application development and operations in Europe.
## 📜 License
This work is released under the Apache 2.0 license. It is funded by the Code for Earth 2023 initiative.