PatchTST model pre-trained on ETTh1 dataset
PatchTST is a transformer-based model for time series modeling tasks, including forecasting, regression, and classification. This model, pre-trained on the ETTh1 dataset, offers high-quality performance for time series forecasting.
Quick Start
To start training and evaluating a PatchTST model, you can refer to this demo notebook.
Features
- Transformer-based: Ideal for time series tasks like forecasting, regression, and classification.
- Pre-trained on ETTh1: Covers all seven channels of the ETTh1 dataset.
- High performance: Achieves a Mean Squared Error (MSE) of 0.3881 on the test split of the ETTh1 dataset when forecasting 96 hours into the future with a 512-hour historical data window.
Documentation
Model Details
Model Description
The PatchTST model was proposed in A Time Series is Worth 64 Words: Long-term Forecasting with Transformers by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam.
At a high level, the model vectorizes a time series into patches of a given size and encodes the resulting sequence of vectors with a Transformer. It then outputs a forecast of the prediction length via an appropriate head.
The model is based on two key components:
(i) Segmentation of time series into subseries-level patches, which serve as input tokens to the Transformer.
(ii) Channel-independence, where each channel contains a single univariate time series that shares the same embedding and Transformer weights across all the series.
The patching design has three-fold benefits: local semantic information is retained in the embedding; the computation and memory usage of the attention maps are quadratically reduced for the same look-back window; and the model can attend to a longer history. The channel-independent patch time series Transformer (PatchTST) can significantly improve long-term forecasting accuracy compared with SOTA Transformer-based models.
In addition, PatchTST has a modular design that seamlessly supports masked time series pre-training as well as direct time series forecasting, classification, and regression.
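To make the two components above concrete, here is a minimal sketch of how such a model can be configured with the Hugging Face `PatchTSTConfig`. The patch length and stride are illustrative assumptions, not the exact settings of this checkpoint.

```python
from transformers import PatchTSTConfig, PatchTSTForPrediction

# Minimal sketch: 7 ETTh1 channels, a 512-step look-back window, a 96-step horizon.
# patch_length / patch_stride are assumed values for illustration only.
config = PatchTSTConfig(
    num_input_channels=7,   # channel-independence: one univariate series per channel
    context_length=512,     # look-back window (hours)
    prediction_length=96,   # forecast horizon (hours)
    patch_length=16,        # subseries-level patch size (assumption)
    patch_stride=16,        # non-overlapping patches (assumption)
)
model = PatchTSTForPrediction(config)  # randomly initialized; see Usage Examples for loading weights
```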

Model Sources
Uses
This pre-trained model can be used for fine-tuning or evaluation on any Electrical Transformer dataset with the same channels as the ETTh1 dataset, specifically: HUFL, HULL, MUFL, MULL, LUFL, LULL, OT. The model predicts the next 96 hours based on the preceding 512 hours of input values. It is crucial to normalize the data; for more details on data pre-processing, please refer to the paper or the demo.
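The card does not spell out the normalization step; the snippet below is a minimal sketch of per-channel standard scaling fitted on the training split only. The file name and the split boundary are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

channels = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL", "OT"]

# "ETTh1.csv" and the 12-month (8640-hour) training boundary are assumptions.
df = pd.read_csv("ETTh1.csv")
train_end = 12 * 30 * 24  # 8640 hourly observations

# Fit the scaler on the training portion only, then apply it to the whole series.
scaler = StandardScaler().fit(df.loc[: train_end - 1, channels])
df[channels] = scaler.transform(df[channels])
```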
Installation
No specific installation steps are provided in the original document, so this section is skipped.
Usage Examples
The original document does not provide code examples.
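As an illustration, here is a minimal forecasting sketch using the Transformers `PatchTSTForPrediction` class. The repository id is a placeholder, and random data stands in for a normalized 512-hour history of the seven ETTh1 channels.

```python
import torch
from transformers import PatchTSTForPrediction

# Placeholder repository id; substitute the actual model id for this checkpoint.
model = PatchTSTForPrediction.from_pretrained("namespace/patchtst-etth1-pretrained")
model.eval()

# past_values: (batch, context_length=512, num_channels=7), normalized as described in Uses.
past_values = torch.randn(1, 512, 7)

with torch.no_grad():
    outputs = model(past_values=past_values)

forecast = outputs.prediction_outputs  # shape (1, 96, 7): next 96 hours per channel
print(forecast.shape)
```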
Technical Details
Training Details
Training Data
ETTh1 train split. Train/validation/test splits are shown in the demo.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
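These hyperparameters correspond to a standard Hugging Face Trainer run; the sketch below shows how they would map onto `TrainingArguments`, with the output directory and dataset objects left as placeholders.

```python
from transformers import Trainer, TrainingArguments

# Sketch only: output_dir is a placeholder, and train_dataset / eval_dataset
# must be built from the normalized ETTh1 windows (see the demo notebook).
training_args = TrainingArguments(
    output_dir="patchtst-etth1-forecast",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
)

# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```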
Training Results
| Training Loss | Epoch | Step  | Validation Loss |
|---------------|-------|-------|-----------------|
| 0.4306        | 1.0   | 1005  | 0.7268          |
| 0.3641        | 2.0   | 2010  | 0.7456          |
| 0.348         | 3.0   | 3015  | 0.7161          |
| 0.3379        | 4.0   | 4020  | 0.7428          |
| 0.3284        | 5.0   | 5025  | 0.7681          |
| 0.321         | 6.0   | 6030  | 0.7842          |
| 0.314         | 7.0   | 7035  | 0.7991          |
| 0.3088        | 8.0   | 8040  | 0.8021          |
| 0.3053        | 9.0   | 9045  | 0.8199          |
| 0.3019        | 10.0  | 10050 | 0.8173          |
Evaluation
Testing Data
ETTh1 test split. Train/validation/test splits are shown in the demo.
Metrics
Mean Squared Error (MSE).
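For reference, a minimal sketch of how this metric is computed on normalized forecasts and targets:

```python
import numpy as np

def mean_squared_error(forecasts: np.ndarray, targets: np.ndarray) -> float:
    """Average squared difference over all windows, horizon steps, and channels."""
    return float(np.mean((forecasts - targets) ** 2))
```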
Results
The model achieves an MSE of 0.3881 on the evaluation dataset.
Hardware
1 NVIDIA A100 GPU
Framework versions
- Transformers 4.36.0.dev0
- PyTorch 2.0.1
- Datasets 2.14.4
- Tokenizers 0.14.1
License
This model is licensed under the Apache-2.0 license.
Citation
BibTeX:
@misc{nie2023time,
title={A Time Series is Worth 64 Words: Long-term Forecasting with Transformers},
author={Yuqi Nie and Nam H. Nguyen and Phanwadee Sinthong and Jayant Kalagnanam},
year={2023},
eprint={2211.14730},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
APA:
Nie, Y., Nguyen, N., Sinthong, P., & Kalagnanam, J. (2023). A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. arXiv preprint arXiv:2211.14730.