PatchTST model pre-trained on ETTh1 dataset
PatchTST is a transformer-based model for time series modeling tasks, including forecasting, regression, and classification. This model, pre-trained on the ETTh1 dataset, offers high-quality performance for time series forecasting.
Quick Start
To start training and evaluating a PatchTST model, you can refer to this demo notebook.
Features
- Transformer-based: Ideal for time series tasks like forecasting, regression, and classification.
- Pre-trained on ETTh1: Covers all seven channels of the ETTh1 dataset.
- High performance: Achieves a Mean Squared Error (MSE) of 0.3881 on the test split of the ETTh1 dataset when forecasting 96 hours into the future with a 512-hour historical data window.
Documentation
Model Details
Model Description
The PatchTST model was proposed in A Time Series is Worth 64 Words: Long-term Forecasting with Transformers by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam.
At a high level, the model vectorizes a time series into patches of a given size and encodes the resulting sequence of vectors with a Transformer. It then outputs a forecast of the prediction length via an appropriate head.
The model is based on two key components:
(i) Segmentation of time series into subseries-level patches, which serve as input tokens to the Transformer.
(ii) Channel-independence, where each channel contains a single univariate time series that shares the same embedding and Transformer weights across all the series.
The patching design has three-fold benefits: local semantic information is retained in the embedding; the computation and memory usage of the attention maps are quadratically reduced for the same look-back window; and the model can attend to a longer history. The channel-independent patch time series Transformer (PatchTST) can significantly improve long-term forecasting accuracy compared with SOTA Transformer-based models.
In addition, PatchTST has a modular design that seamlessly supports masked time series pre-training as well as direct time series forecasting, classification, and regression.
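To make the two components above concrete, here is a minimal sketch of how such a model can be configured with the Hugging Face `PatchTSTConfig`. The patch length and stride are illustrative assumptions, not the exact settings of this checkpoint.

```python
from transformers import PatchTSTConfig, PatchTSTForPrediction

# Minimal sketch: 7 ETTh1 channels, a 512-step look-back window, a 96-step horizon.
# patch_length / patch_stride are assumed values for illustration only.
config = PatchTSTConfig(
    num_input_channels=7,   # channel-independence: one univariate series per channel
    context_length=512,     # look-back window (hours)
    prediction_length=96,   # forecast horizon (hours)
    patch_length=16,        # subseries-level patch size (assumption)
    patch_stride=16,        # non-overlapping patches (assumption)
)
model = PatchTSTForPrediction(config)  # randomly initialized; see Usage Examples for loading weights
```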

Model Sources
Uses
This pre-trained model can be used for fine-tuning or evaluation on any Electrical Transformer dataset with the same channels as the ETTh1 dataset, specifically: HUFL, HULL, MUFL, MULL, LUFL, LULL, OT. The model predicts the next 96 hours based on the preceding 512 hours of input values. It is crucial to normalize the data; for more details on data pre-processing, please refer to the paper or the demo.
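The card does not spell out the normalization step; the snippet below is a minimal sketch of per-channel standard scaling fitted on the training split only. The file name and the split boundary are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

channels = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL", "OT"]

# "ETTh1.csv" and the 12-month (8640-hour) training boundary are assumptions.
df = pd.read_csv("ETTh1.csv")
train_end = 12 * 30 * 24  # 8640 hourly observations

# Fit the scaler on the training portion only, then apply it to the whole series.
scaler = StandardScaler().fit(df.loc[: train_end - 1, channels])
df[channels] = scaler.transform(df[channels])
```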
Installation
No specific installation steps are provided in the original document, so this section is skipped.
Usage Examples
The original document does not provide code examples.
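As an illustration, here is a minimal forecasting sketch using the Transformers `PatchTSTForPrediction` class. The repository id is a placeholder, and random data stands in for a normalized 512-hour history of the seven ETTh1 channels.

```python
import torch
from transformers import PatchTSTForPrediction

# Placeholder repository id; substitute the actual model id for this checkpoint.
model = PatchTSTForPrediction.from_pretrained("namespace/patchtst-etth1-pretrained")
model.eval()

# past_values: (batch, context_length=512, num_channels=7), normalized as described in Uses.
past_values = torch.randn(1, 512, 7)

with torch.no_grad():
    outputs = model(past_values=past_values)

forecast = outputs.prediction_outputs  # shape (1, 96, 7): next 96 hours per channel
print(forecast.shape)
```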
Technical Details
Training Details
Training Data
ETTh1 train split. Train/validation/test splits are shown in the demo.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
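These hyperparameters correspond to a standard Hugging Face Trainer run; the sketch below shows how they would map onto `TrainingArguments`, with the output directory and dataset objects left as placeholders.

```python
from transformers import Trainer, TrainingArguments

# Sketch only: output_dir is a placeholder, and train_dataset / eval_dataset
# must be built from the normalized ETTh1 windows (see the demo notebook).
training_args = TrainingArguments(
    output_dir="patchtst-etth1-forecast",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
)

# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```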
Training Results
| Training Loss | Epoch | Step  | Validation Loss |
|---------------|-------|-------|-----------------|
| 0.4306        | 1.0   | 1005  | 0.7268          |
| 0.3641        | 2.0   | 2010  | 0.7456          |
| 0.348         | 3.0   | 3015  | 0.7161          |
| 0.3379        | 4.0   | 4020  | 0.7428          |
| 0.3284        | 5.0   | 5025  | 0.7681          |
| 0.321         | 6.0   | 6030  | 0.7842          |
| 0.314         | 7.0   | 7035  | 0.7991          |
| 0.3088        | 8.0   | 8040  | 0.8021          |
| 0.3053        | 9.0   | 9045  | 0.8199          |
| 0.3019        | 10.0  | 10050 | 0.8173          |
Evaluation
Testing Data
ETTh1 test split. Train/validation/test splits are shown in the demo.
Metrics
Mean Squared Error (MSE).
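For reference, a minimal sketch of how this metric is computed on normalized forecasts and targets:

```python
import numpy as np

def mean_squared_error(forecasts: np.ndarray, targets: np.ndarray) -> float:
    """Average squared difference over all windows, horizon steps, and channels."""
    return float(np.mean((forecasts - targets) ** 2))
```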
Results
The model achieves an MSE of 0.3881 on the evaluation dataset.
Hardware
1 NVIDIA A100 GPU
Framework versions
- Transformers 4.36.0.dev0
- PyTorch 2.0.1
- Datasets 2.14.4
- Tokenizers 0.14.1
License
This model is licensed under the Apache-2.0 license.
Citation
BibTeX:
@misc{nie2023time,
title={A Time Series is Worth 64 Words: Long-term Forecasting with Transformers},
author={Yuqi Nie and Nam H. Nguyen and Phanwadee Sinthong and Jayant Kalagnanam},
year={2023},
eprint={2211.14730},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
APA:
Nie, Y., Nguyen, N., Sinthong, P., & Kalagnanam, J. (2023). A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. arXiv preprint arXiv:2211.14730.