🚀 Tiny Time Mixer (TTM) Research-Use Model Card
TinyTimeMixers (TTMs) are compact pre-trained models for Multivariate Time-Series Forecasting, open-sourced by IBM Research. With model sizes starting from 1M parameters, TTM (accepted at NeurIPS 2024) introduces the concept of the first-ever “tiny” pre-trained models for Time-Series Forecasting. This model offers high-performance forecasting with minimal computational resources.
This model card contains the model weights for research use only, enabling full reproducibility of the results published in our paper. If you are looking for TTM model weights for commercial and enterprise use, please refer to our Granite releases [here](https://huggingface.co/ibm-granite/granite-timeseries-ttm-r2).
TTM outperforms several popular benchmarks demanding billions of parameters in zero-shot and few-shot forecasting. It is a lightweight forecaster, pre-trained on publicly available time-series data with various augmentations. TTM provides state-of-the-art zero-shot forecasts and remains competitive when fine-tuned for multivariate forecasts with just 5% of the training data. Refer to our paper for more details.
The current open-source version supports point forecasting use cases ranging from minutely to hourly resolutions (e.g., 10 min, 15 min, 1 hour). Note that zero-shot, fine-tuning, and inference tasks with TTM can easily be executed on a single-GPU machine or even on a laptop!
🚀 Quick Start
To get started with TTM, you can follow the usage examples below. First, make sure you understand the model's capabilities and limitations.
✨ Features
- Focused Pre-trained Models: Each pre-trained TTM is tailored for a particular forecasting setting (governed by the context length and forecast length), resulting in more accurate results.
- Lightweight and Fast: With extremely small model sizes and high speed, TTM can be easily deployed without demanding a large amount of resources.
- State-of-the-Art Performance: Outperforms popular benchmarks in zero/few-shot forecasting while significantly reducing computational requirements.
- Multivariate Forecasting: Supports both channel-independence and channel-mixing approaches, as well as exogenous and categorical data infusion.
📦 Installation
TTM is provided through the tsfm_public toolkit from the [granite-tsfm](https://github.com/ibm-granite/granite-tsfm) repository; install that package before running the examples below.
💻 Usage Examples
Basic Usage
from transformers import Trainer
from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

# Load the model from the HF Model Hub, selecting the release branch via the revision field
model = TinyTimeMixerForPrediction.from_pretrained(
    "ibm/TTM", revision="main"
)

# Zero-shot evaluation: apply the pre-trained model directly to the test set.
# zeroshot_forecast_args is a transformers TrainingArguments instance and
# dset_test is a torch-style forecasting dataset prepared by the user.
zeroshot_trainer = Trainer(
    model=model,
    args=zeroshot_forecast_args,
)
zeroshot_output = zeroshot_trainer.evaluate(dset_test)
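The zeroshot_forecast_args and dset_test objects above are prepared by the user. As a rough sketch (hypothetical values, not the exact benchmark settings), the evaluation arguments could be built with the standard transformers TrainingArguments:

from transformers import TrainingArguments

# Hypothetical evaluation settings; adjust output_dir and batch size for your setup.
zeroshot_forecast_args = TrainingArguments(
    output_dir="./ttm_zeroshot",
    per_device_eval_batch_size=64,
    report_to="none",
)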
Advanced Usage
# Freeze the backbone and enable few-shot or full fine-tuning
for param in model.backbone.parameters():
    param.requires_grad = False

# Fine-tune the prediction head on a subset of the target data
finetune_forecast_trainer = Trainer(
    model=model,
    args=finetune_forecast_args,
    train_dataset=dset_train,
    eval_dataset=dset_val,
    callbacks=[early_stopping_callback, tracking_callback],
    optimizers=(optimizer, scheduler),
)
finetune_forecast_trainer.train()
fewshot_output = finetune_forecast_trainer.evaluate(dset_test)
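The training arguments, callbacks, optimizer, and scheduler referenced above are user-defined; tracking_callback is a toolkit-specific logging callback and is omitted here. One possible setup, as an illustrative sketch rather than the exact settings used in our benchmarks, is:

import torch
from transformers import TrainingArguments, EarlyStoppingCallback

# Illustrative settings only; tune these for your dataset.
finetune_forecast_args = TrainingArguments(
    output_dir="./ttm_finetune",
    num_train_epochs=50,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    eval_strategy="epoch",      # use evaluation_strategy on older transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    report_to="none",
)
early_stopping_callback = EarlyStoppingCallback(early_stopping_patience=10)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=50, steps_per_epoch=len(dset_train) // 64 + 1
)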
📚 Documentation
Model Description
TTM falls under the category of “focused pre-trained models”. Instead of building one massive model supporting all forecasting settings, we construct smaller pre-trained models, each focusing on a specific forecasting setting. This approach ensures that our models remain extremely small and fast, facilitating easy deployment.
In this model card, we plan to release several pre-trained TTMs for common forecasting settings. We have also released our source code and pretraining scripts so that users can pretrain models on their own data. Pretraining TTMs is very easy and fast, taking less than a day compared to several days or weeks with traditional approaches.
Model Releases
- 512-96-ft-r2: Given the last 512 time-points (context length), this model can forecast up to the next 96 time-points (forecast length) in the future. (branch name: main)
- 1024-96-ft-r2: Given the last 1024 time-points (context length), this model can forecast up to the next 96 time-points (forecast length) in the future. (branch name: 1024-96-ft-r2) (see the Benchmarks section below)
- 1536-96-ft-r2: Given the last 1536 time-points (context length), this model can forecast up to the next 96 time-points (forecast length) in the future. (branch name: 1536-96-ft-r2)
- There are also models released for forecast lengths up to 720 time-points. The branch names for these are: 512-192-ft-r2, 1024-192-ft-r2, 1536-192-ft-r2, 512-336-r2, 512-336-ft-r2, 1024-336-ft-r2, 1536-336-ft-r2, 512-720-ft-r2, 1024-720-ft-r2, and 1536-720-ft-r2.
- Use the [get_model](https://github.com/ibm-granite/granite-tsfm/blob/main/tsfm_public/toolkit/get_model.py) utility to automatically select the required model based on your input context length and forecast length requirement, as sketched after this list.
- Currently, 3 context lengths (512, 1024, and 1536) and 4 forecast lengths (96, 192, 336, 720) are supported. Users must provide one of the 3 allowed context lengths as input, but can request any forecast length up to 720 in get_model() to obtain the required model.
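As a rough illustration, assuming get_model accepts the model path along with context_length and prediction_length keyword arguments (check the linked source for the exact signature):

from tsfm_public.toolkit.get_model import get_model

# Assumed keyword names; verify against the get_model source linked above.
model = get_model(
    "ibm/TTM",              # research-use TTM repository on the HF Hub
    context_length=512,     # one of 512, 1024, or 1536
    prediction_length=48,   # any value up to 720; an appropriate released model is selected
)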
Benchmarks
TTM outperforms popular benchmarks such as TimesFM, Moirai, Chronos, Lag-Llama, Moment, GPT4TS, TimeLLM, and LLMTime in zero/few-shot forecasting while significantly reducing computational requirements. Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider adoption in resource-constrained environments.
- TTM-B referred to in the paper maps to the 512-context models.
- TTM-E referred to in the paper maps to the 1024-context models.
- TTM-A referred to in the paper maps to the 1536-context models.
Note that the Granite TTM models are pre-trained exclusively on datasets with clear commercial-use licenses approved by our legal team. As a result, the pre-training dataset used in this release differs slightly from the one used in the research paper, which may lead to minor variations in model performance compared to the published results.
Benchmarking Scripts: [here](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/tinytimemixer/full_benchmarking/research-use-r2.sh)
Recommended Use
- Data Scaling: Users must externally standard-scale their data independently for every channel before feeding it to the model; see the sketch after this list. Refer to TSP (the TimeSeriesPreprocessor utility in tsfm_public) for data scaling.
- Resolution Support: The current open-source version supports only minutely and hourly resolutions (e.g., 10 min, 15 min, 1 hour). Lower resolutions (e.g., weekly or monthly) are currently not supported, as the model needs a minimum context length of 512 or 1024.
- Context Length: Upsampling or prepending zeros to virtually increase the context length for shorter datasets is not recommended, as it will impact model performance.
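A minimal sketch of per-channel standard scaling with scikit-learn (df and train_end are hypothetical names; fit the scaler on the training split only):

import pandas as pd
from sklearn.preprocessing import StandardScaler

# df holds one column per channel; train_end marks the end of the training split.
scaler = StandardScaler()
scaler.fit(df.iloc[:train_end])  # per-column mean/std computed from training data only
df_scaled = pd.DataFrame(
    scaler.transform(df), columns=df.columns, index=df.index
)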
Model Details
For more details on TTM architecture and benchmarks, refer to our paper.
TTM-1 currently supports 2 modes:
- Zeroshot forecasting: Directly apply the pre-trained model to your target data to get an initial forecast (with no training).
- Finetuned forecasting: Finetune the pre-trained model with a subset of your target data to further improve the forecast.
Since TTM models are extremely small and fast, it is very easy to finetune the model with your available target data in a few minutes to get more accurate forecasts.
The current release supports multivariate forecasting via both channel-independence and channel-mixing approaches. Decoder channel mixing can be enabled during fine-tuning to capture strong channel-correlation patterns across time-series variates, a critical capability lacking in existing counterparts.
In addition, TTM also supports exogenous infusion and categorical data infusion; a sketch of how these options might be enabled follows below.
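As an illustrative sketch only: the decoder_mode, prediction_channel_indices, and exogenous_channel_indices arguments below follow the fine-tuning examples in the granite-tsfm repository, but treat the exact names as assumptions and check the TinyTimeMixer configuration before use.

from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

# Assumed configuration overrides; verify against the TinyTimeMixer config in tsfm_public.
model = TinyTimeMixerForPrediction.from_pretrained(
    "ibm/TTM",
    revision="main",
    num_input_channels=4,               # total channels in the dataset (targets + exogenous)
    prediction_channel_indices=[0, 1],  # target channels to forecast
    exogenous_channel_indices=[2, 3],   # exogenous channels whose future values are known
    decoder_mode="mix_channel",         # enable decoder channel mixing during fine-tuning
)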
Model Sources
- Repository: https://github.com/ibm-granite/granite-tsfm/tree/main/tsfm_public/models/tinytimemixer
- Paper: https://arxiv.org/pdf/2401.03955.pdf
Blogs and articles on TTM
Refer to our [wiki](https://github.com/ibm-granite/granite-tsfm/wiki).
🔧 Technical Details
The technical details are mainly covered in the paper. TTM's architecture and pre-training methods are designed to achieve high performance with small model sizes.
📄 License
The model is released under the cc-by-nc-sa-4.0 license.
📖 Citation
Kindly cite the following paper if you intend to use our model or its associated architectures/approaches in your work:
@inproceedings{ekambaram2024tinytimemixersttms,
  title={Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series},
  author={Vijay Ekambaram and Arindam Jati and Pankaj Dayama and Sumanta Mukherjee and Nam H. Nguyen and Wesley M. Gifford and Chandra Reddy and Jayant Kalagnanam},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS 2024)},
  year={2024},
}
Model Card Authors
Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Wesley M. Gifford, Sumanta Mukherjee, Chandra Reddy and Jayant Kalagnanam
IBM Public Repository Disclosure
All content in this repository, including code, has been provided by IBM under the associated open source software license, and IBM is under no obligation to provide enhancements, updates, or support. IBM developers produced this code as an open source project (not as an IBM product), and IBM makes no assertions as to the level of quality or security and will not be maintaining this code going forward.


