A few notes on time-series forecasting with AutoGluon

I’ve been experimenting with AutoGluon (AG) for time-series forecasting. A previous post gives a brief overview of AutoGluon and a detailed look at its forecasting accuracy. This post collects a few notes on basic things I’ve learned, in case they’re useful to anyone else. The notes are based on my experience so far with AG version 1.0.0.

So far I have only experimented with what AG calls “local” time-series models. These are univariate forecasting models that forecast a time-series only as a function of its own historic values (you can forecast multiple time-series simultaneously, but each is treated independently of the others). As of version 1.0.0, the available local models include automatically selected ETS and ARIMA models, as well as naive (constant) forecasts and simple averages. A full list of models is available in the model zoo.

In contrast, AG’s “global” models combine information from all time-series in your dataset and forecast them together, potentially reflecting relationships among them. In AG, the “deep learning” models such as DeepAR and PatchTST are global models. At this stage AG does not offer multivariate time-series regression models (e.g. vector autoregression models).

1. Beware the cache

By default, AG caches trained models and their predictions. This can be useful for models that take a long time to train. However, if you change model options between training runs (e.g. the maximum number of lags in an ARIMA model) without changing the training data, the cache does not appear to be invalidated, and the predictions will not change. To avoid this, you can either set cache_predictions=False when creating a TimeSeriesPredictor, or set use_cache=False when calling TimeSeriesPredictor.predict(). Or just delete the cache directory before fitting models.
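As a rough sketch of the two options, assuming AG 1.0.0 and a train_data object that is already a TimeSeriesDataFrame: the function name, the prediction_length default and the choice of max_p as the AutoARIMA option being varied are placeholders of mine, not recommendations.

```python
def forecast_without_cache(train_data, max_p=2, prediction_length=12):
    """Fit AutoARIMA and predict with prediction caching switched off.

    `train_data` is assumed to be an autogluon.timeseries.TimeSeriesDataFrame.
    The function name and default values are illustrative only.
    """
    # Imported here so the sketch can be loaded without AutoGluon installed.
    from autogluon.timeseries import TimeSeriesPredictor

    predictor = TimeSeriesPredictor(
        prediction_length=prediction_length,
        cache_predictions=False,  # option 1: never cache predictions
    )
    # Assumption: "max_p" is the AutoARIMA option being changed between runs.
    predictor.fit(train_data, hyperparameters={"AutoARIMA": {"max_p": max_p}})
    # Option 2: bypass the cache for this call only.
    # It is redundant alongside option 1 and is shown only for completeness.
    return predictor.predict(train_data, use_cache=False)
```

Calling this twice with different max_p values should then give forecasts that reflect each setting, rather than a stale cached result.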

2. Choice of evaluation metric can affect the forecasts

When you create a TimeSeriesPredictor, the eval_metric option specifies which measure of forecast accuracy is used to evaluate models. There are a bunch of options, including metrics of point forecast accuracy and metrics of probabilistic (quantile) forecast accuracy, or you can define your own metric. Importantly, the type of metric that you use can affect what AG returns as the “mean” forecast from TimeSeriesPredictor.predict(). For example, an AverageModel will use the mean of the values in the training data if eval_metric=RMSE, but will use the median of the training values if eval_metric=MASE. This is probably not important, but it confused me for a while.
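A likely explanation is statistical rather than anything specific to AG: the mean is the constant that minimises squared error, while the median is the constant that minimises absolute error, and MASE is a scaled absolute error. The sketch below checks this on made-up numbers in plain Python. It is not AG’s code, and the helper names are my own.

```python
from statistics import mean, median

# Made-up history with one large outlier, so the mean and median differ.
history = [10.0, 12.0, 11.0, 13.0, 40.0]


def rmse(values, forecast):
    """Root mean squared error of a constant forecast."""
    return (sum((v - forecast) ** 2 for v in values) / len(values)) ** 0.5


def mae(values, forecast):
    """Mean absolute error of a constant forecast (MASE is a scaled version)."""
    return sum(abs(v - forecast) for v in values) / len(values)


mean_forecast = mean(history)
median_forecast = median(history)

for name, forecast in [("mean", mean_forecast), ("median", median_forecast)]:
    print(f"{name:>6}: RMSE={rmse(history, forecast):.2f}  MAE={mae(history, forecast):.2f}")
```

On this data the mean gives the lower RMSE and the median gives the lower MAE, which is consistent with AG switching between them depending on the metric.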

3. “Local” models use all available data to produce forecasts

The usual process for developing time-series models and forecasts goes something like this:

  1. Separate the available actual data into training and test sets, i.e. hold back some of the most recent actual data points, and use the rest for training models.
  2. Evaluate models against the test data, and choose the best one(s), optimise model parameters, etc.
  3. Re-fit the models using all available actual data, i.e. the training and test sets combined.
  4. Generate out-of-sample forecasts using the models fitted with all available data.
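
The four steps can be sketched in plain Python with two toy models (a naive forecast and a simple average). The data, the function names and the use of mean absolute error are assumptions for illustration only; this is not how AG implements the process.

```python
from statistics import mean


def naive_forecast(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def average_forecast(history, horizon):
    """Repeat the mean of all observed values."""
    return [mean(history)] * horizon


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


series = [100, 102, 101, 105, 107, 110, 112, 115, 117, 120]  # made-up data
horizon = 3

# Step 1: hold back the most recent points as the test set.
train, test = series[:-horizon], series[-horizon:]

# Step 2: evaluate each candidate against the test set and pick the best.
candidates = {"naive": naive_forecast, "average": average_forecast}
scores = {name: mae(test, model(train, horizon)) for name, model in candidates.items()}
best_name = min(scores, key=scores.get)

# Steps 3 and 4: these toy models store no parameters, so "re-fitting" just
# means passing the full series when generating the out-of-sample forecast.
forecast = candidates[best_name](series, horizon)
print(best_name, forecast)
```

Skipping step 3 here would mean forecasting from the last training value rather than the last observed value, which is exactly the problem the re-fit is meant to avoid.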

The re-fit step is important because, in a time-series, the most recent data points are the most informative about what the future path will be, and we want the models that we use to generate forecasts to reflect those points. AG’s documentation is a bit unclear about this point. It says:

“When we fit the predictor with predictor.fit(train_data=train_data), under the hood AutoGluon further splits the original dataset train_data into train and validation parts.”

So AG does the train/validation split when training models, but does it then re-fit with all available data when generating out-of-sample predictions? There is a method, TimeSeriesPredictor.refit_full(), which claims to do exactly this, but as of version 1.0.0 it comes with a warning:

“This is experimental functionality, many time series models do not yet support refit_full and will simply be copied.”

I haven’t yet been able to figure out which models do and do not support this re-fitting. However, as I understand it from this discussion, any “local” models (e.g. simple statistical models like ETS and ARIMA) are trained on the full historic data in the call to TimeSeriesPredictor.predict(), immediately before predictions are generated. From some experiments I’ve tried, this seems to be correct. So I guess re-fitting is only potentially an issue with the “global” type models.
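
To make the idea concrete, here is a toy illustration of a model that does all of its estimation at prediction time. It is a hypothetical stand-in written in plain Python, not AG’s implementation; the class name and the simple drift forecast are my own assumptions.

```python
class ToyLocalModel:
    """Hypothetical "local" model: fit() stores nothing, predict() does the work."""

    def __init__(self, horizon):
        self.horizon = horizon

    def fit(self, train_data):
        # Nothing is learned here; all estimation is deferred to predict().
        return self

    def predict(self, data):
        # Estimate a simple drift from whatever history is passed in,
        # then extrapolate from the last observed value.
        drift = (data[-1] - data[0]) / (len(data) - 1)
        return [data[-1] + drift * step for step in range(1, self.horizon + 1)]


series = [1.0, 2.0, 3.0, 4.0, 10.0]  # made-up data; the last point is held back
train = series[:-1]

model = ToyLocalModel(horizon=2).fit(train)
forecast_from_train = model.predict(train)
forecast_from_full = model.predict(series)
print(forecast_from_train, forecast_from_full)
```

Because fit() stores nothing, the forecast depends only on the data passed to predict(), so passing the full history gives a forecast that reflects the most recent point without any separate re-fit step.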

4. ARIMA models can’t use exogenous predictors

In an ARIMA model, it is common to include exogenous explanatory variables as predictors (AG calls these “known covariates”). Unfortunately, in version 1.0.0 the AutoARIMA model does not support this, and is limited to pure time-series models that forecast a series based only on its own past values. Similarly, it doesn’t seem to be possible to create a simple linear time-series regression model that forecasts a dependent variable based on one or more explanatory variables.