Ensembling Models

The following is a brief demonstration of the ensemble models in Darts. Starting from the examples provided in the Quickstart Notebook, this notebook details some advanced features and subtleties.

The following topics are covered in this notebook:

  • Basics & references
  • Naive Ensembling
      • Deterministic
      • Covariates & multivariate series
      • Probabilistic
  • Learned Ensembling
      • Deterministic
      • Probabilistic
      • Bootstrapping
  • Pre-trained Ensembling

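# fix python path if working locally
from utils import fix_pythonpath_if_working_locally

fix_pythonpath_if_working_locally()

%load_ext autoreload
%autoreload 2
%matplotlib inline
import matplotlib.pyplot as plt

from darts.models import (
    NaiveEnsembleModel,
    NaiveSeasonal,
    NaiveDrift,
    LinearRegressionModel,
    ExponentialSmoothing,
    KalmanForecaster,
    RandomForest,
    RegressionEnsembleModel,
    TCNModel,
)
from darts.metrics import mape
from darts.datasets import AirPassengersDataset
from darts.utils.timeseries_generation import (
    datetime_attribute_timeseries as dt_attr,
)
from darts.dataprocessing.transformers import Scaler

import warnings

warnings.filterwarnings("ignore")

import logging

logging.disable(logging.CRITICAL)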

Basics & references

Ensembling combines the forecasts of several “weak” models to obtain a more robust and accurate model.

All of Darts’ ensembling models rely on the stacking technique (reference). They provide the same functionalities as the other forecasting models. Depending on the ensembled models, they can:

  • leverage covariates

  • be trained on multiple series

  • predict multivariate targets

  • generate probabilistic forecasts

  • and more…

# using the AirPassenger dataset, directly available in darts
ts_air = AirPassengersDataset().load()
ts_air.plot()

Naive Ensembling

Naive ensembling simply takes the average (mean) of the forecasts generated by the ensembled forecasting models. Darts' NaiveEnsembleModel accepts both local and global forecasting models (as well as combinations of the two, with some additional limitations).
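Conceptually, the averaging step is a per-time-step mean across models. A minimal sketch in plain Python, assuming each forecast is a list of floats covering the same time steps (`naive_ensemble` is a hypothetical helper, not part of Darts):

```python
from typing import List, Sequence


def naive_ensemble(forecasts: Sequence[Sequence[float]]) -> List[float]:
    """Average several point forecasts, time step by time step.

    `forecasts` holds one list of values per model, all covering the same
    time steps.
    """
    if not forecasts:
        raise ValueError("at least one forecast is required")
    horizon = len(forecasts[0])
    if any(len(f) != horizon for f in forecasts):
        raise ValueError("all forecasts must cover the same number of time steps")
    # zip(*forecasts) groups the values of all models for each time step
    return [sum(values) / len(values) for values in zip(*forecasts)]


# two hypothetical 3-step forecasts
seasonal = [400.0, 420.0, 460.0]
drift = [410.0, 412.0, 414.0]
print(naive_ensemble([seasonal, drift]))  # [405.0, 416.0, 437.0]
```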

naive_ensemble = NaiveEnsembleModel(
    forecasting_models=[NaiveSeasonal(K=12), NaiveDrift()]
)

backtest = naive_ensemble.historical_forecasts(ts_air, start=0.6, forecast_horizon=3)

print("NaiveEnsemble (naive) MAPE:", round(mape(backtest, ts_air), 5))
NaiveEnsemble (naive) MAPE: 11.87818

Note: after looking at each model's MAPE, one would notice that NaiveSeasonal actually performs better on its own than when ensembled with NaiveDrift. Checking the performance of the individual models is generally good practice before defining an ensemble.
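For reference, MAPE averages the absolute errors relative to the true values and expresses the result as a percentage, so lower is better. A plain-Python sketch of the formula (not Darts' implementation, which additionally aligns the two TimeSeries on their common time index):

```python
from typing import Sequence


def mape_sketch(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes `actual` and `pred` are already aligned on the same time steps
    and that no actual value is zero.
    """
    if len(actual) != len(pred):
        raise ValueError("series must be aligned")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, pred)]
    return 100.0 * sum(errors) / len(errors)


# errors of 5% and 10% average out to 7.5%
print(round(mape_sketch([400.0, 500.0], [420.0, 450.0]), 5))  # 7.5
```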

Before creating a new NaiveEnsembleModel, we will screen several models to identify which ones would work well together. The candidates are:

  • LinearRegressionModel: a classic, simple model

  • ExponentialSmoothing: a model based on exponentially weighted averages

  • KalmanForecaster: a filter-based model

  • RandomForest: a decision-tree-based model

candidates_models = {
    "LinearRegression": (LinearRegressionModel, {"lags": 12}),
    "ExponentialSmoothing": (ExponentialSmoothing, {}),
    "KalmanForecaster": (KalmanForecaster, {"dim_x": 12}),
    "RandomForest": (RandomForest, {"lags": 12, "random_state": 0}),
}

backtest_models = []

for model_name, (model_cls, model_kwargs) in candidates_models.items():
    model = model_cls(**model_kwargs)
    backtest_models.append(
        model.historical_forecasts(ts_air, start=0.6, forecast_horizon=3)
    )
    print(f"{model_name} MAPE: {round(mape(backtest_models[-1], ts_air), 5)}")
LinearRegression MAPE: 4.64008
ExponentialSmoothing MAPE: 4.44874
KalmanForecaster MAPE: 4.5539
RandomForest MAPE: 8.02264
fig, axes = plt.subplots(2, 2, figsize=(9, 6))
for ax, backtest, model_name in zip(
    axes.flatten(), backtest_models, candidates_models.keys()
):
    ts_air[-len(backtest) :].plot(ax=ax, label="ground truth")
    backtest.plot(ax=ax, label=model_name)

    ax.set_ylim([250, 650])

The historical forecasts obtained with the LinearRegressionModel and KalmanForecaster look quite similar, whereas ExponentialSmoothing tends to underestimate the true values and RandomForest fails to capture the peaks. To benefit from the ensemble, we will favor diversity and continue with the LinearRegressionModel and ExponentialSmoothing.
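One way to make "diversity" concrete is to correlate the models' residuals over the backtest period: the less correlated the errors, the more they can cancel out when averaged. A plain-Python sketch with made-up numbers (the helpers `pearson` and `residuals` are hypothetical, written here only for illustration):

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def residuals(actual: Sequence[float], pred: Sequence[float]) -> List[float]:
    return [a - p for a, p in zip(actual, pred)]


actual = [100.0, 110.0, 120.0, 130.0]
model_a = [98.0, 112.0, 118.0, 133.0]   # residuals: +2, -2, +2, -3
model_b = [102.0, 108.0, 122.0, 127.0]  # residuals mirror those of model_a

r = pearson(residuals(actual, model_a), residuals(actual, model_b))
print(round(r, 3))  # -1.0: perfectly anti-correlated errors

# averaging the two forecasts cancels the errors entirely in this toy case
print([(a + b) / 2 for a, b in zip(model_a, model_b)])  # [100.0, 110.0, 120.0, 130.0]
```

Real models rarely have perfectly anti-correlated errors; the point is only that weakly correlated errors leave more room for the ensemble to improve on its members.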

ensemble = NaiveEnsembleModel(
    forecasting_models=[LinearRegressionModel(lags=12), ExponentialSmoothing()]
)

backtest = ensemble.historical_forecasts(ts_air, start=0.6, forecast_horizon=3)

ts_air[-len(backtest) :].plot(label="series")
backtest.plot(label="prediction")
plt.ylim([250, 650])
print("NaiveEnsemble (v2) MAPE:", round(mape(backtest, ts_air), 5))
NaiveEnsemble (v2) MAPE: 4.04297

Compared to the individual MAPE scores (4.64008 for the LinearRegressionModel and 4.44874 for ExponentialSmoothing), ensembling improved the accuracy to 4.04297.

Using covariates & predicting multivariate series

Depending on the forecasting models used, ensemble models can also leverage covariates or forecast multivariate series. The covariates are passed only to the forecasting models that support them.

In the example below, the ExponentialSmoothing model does not support any covariates, whereas the LinearRegressionModel supports future_covariates.

ensemble = NaiveEnsembleModel(
    [LinearRegressionModel(lags=12, lags_future_covariates=[0]), ExponentialSmoothing()]
)

# encoding the months as integers, normalised
future_cov = dt_attr(ts_air.time_index, "month", add_length=12) / 12
backtest = ensemble.historical_forecasts(
    ts_air, future_covariates=future_cov, start=0.6, forecast_horizon=3
)

ts_air[-len(backtest) :].plot(label="series")
backtest.plot(label="prediction")
plt.ylim([250, 650])
print("NaiveEnsemble (w/ future covariates) MAPE:", round(mape(backtest, ts_air), 5))
NaiveEnsemble (w/ future covariates) MAPE: 4.07502
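The routing rule can be pictured with a toy sketch: the ensemble forwards the covariates to each member only if that member declares support for them. `ToyModel` and `ensemble_predict` are hypothetical illustrations, not Darts classes; Darts models do expose a `supports_future_covariates` property, but the ensemble's actual internals may differ from this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyModel:
    """Hypothetical stand-in for a forecasting model."""

    name: str
    supports_future_covariates: bool

    def predict(self, n: int, future_covariates: Optional[list] = None) -> str:
        used = "with" if future_covariates is not None else "without"
        return f"{self.name}: {n} steps {used} future covariates"


def ensemble_predict(
    models: List[ToyModel], n: int, future_covariates: Optional[list] = None
) -> List[str]:
    # forward the covariates only to the models that can use them
    return [
        m.predict(n, future_covariates if m.supports_future_covariates else None)
        for m in models
    ]


models = [ToyModel("LinearRegression", True), ToyModel("ExponentialSmoothing", False)]
for line in ensemble_predict(models, 3, future_covariates=[0.1, 0.2, 0.3]):
    print(line)
```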

Probabilistic naive ensembling

Combining models that support probabilistic forecasts results in a probabilistic NaiveEnsembleModel. We can easily tweak the models used above to make them probabilistic and obtain confidence intervals in our forecasts:

ensemble_probabilistic = NaiveEnsembleModel(
    forecasting_models=[
        LinearRegressionModel(
            lags=12,
            likelihood="quantile",
            quantiles=[0.05, 0.5, 0.95],
        ),
        ExponentialSmoothing(),
    ]
)

# num_samples > 1 must be passed to obtain probabilistic forecasts
backtest = ensemble_probabilistic.historical_forecasts(
    ts_air, start=0.6, forecast_horizon=3, num_samples=100
)

ts_air[-len(backtest) :].plot(label="ground truth")
backtest.plot(label="prediction")
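How the samples of the members are combined is an implementation detail. One simple scheme, assumed here for illustration only, averages the i-th sample of each model and then reads the confidence bounds off the empirical quantiles of the combined samples. The helpers below are hypothetical and work on a single time step:

```python
import random
from typing import List


def average_samples(samples_per_model: List[List[float]]) -> List[float]:
    """Average the i-th sample of every model (single time step)."""
    return [sum(s) / len(s) for s in zip(*samples_per_model)]


def empirical_quantile(values: List[float], q: float) -> float:
    """Quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


rng = random.Random(0)
# 100 hypothetical samples from two models for the same time step
model_a = [rng.gauss(450, 10) for _ in range(100)]
model_b = [rng.gauss(470, 20) for _ in range(100)]

combined = average_samples([model_a, model_b])
low, median, high = (empirical_quantile(combined, q) for q in (0.05, 0.5, 0.95))
print(f"90% interval: [{low:.1f}, {high:.1f}], median: {median:.1f}")
```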

Learned Ensembling

Ensembling can also be considered as a supervised regression problem: given a set of forecasts (features), find a model that combines them in order to minimise errors on the target. This is what the RegressionEnsembleModel does. The three main parameters are:

  • forecasting_models is a list of forecasting models whose predictions we want to ensemble.

  • regression_train_n_points is the number of time steps to use for fitting the “ensemble regression” model (i.e., the inner model that combines the forecasts).

  • regression_model is, optionally, a sklearn-compatible regression model or a Darts RegressionModel to be used for the ensemble regression. If not specified, Darts' LinearRegressionModel is used. A sklearn model works out of the box, whereas one of Darts' regression models makes it possible to take arbitrary lags of the individual forecasts as inputs of the regression model.
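Stripped of the Darts machinery, the "ensemble regression" regresses the target on the members' forecasts over the last regression_train_n_points steps. A minimal sketch with two models and an ordinary least-squares fit without intercept, assuming the forecasts are plain lists (`fit_two_weights` is hypothetical; Darts delegates this fit to the chosen regression_model):

```python
from typing import Sequence, Tuple


def fit_two_weights(
    f1: Sequence[float], f2: Sequence[float], y: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares weights (no intercept) so that w1*f1 + w2*f2 approximates y.

    Solves the 2x2 normal equations (X^T X) w = X^T y with Cramer's rule.
    """
    s11 = sum(a * a for a in f1)
    s22 = sum(b * b for b in f2)
    s12 = sum(a * b for a, b in zip(f1, f2))
    s1y = sum(a * t for a, t in zip(f1, y))
    s2y = sum(b * t for b, t in zip(f2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("forecasts are collinear; weights are not identifiable")
    w1 = (s22 * s1y - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    return w1, w2


# forecasts of two hypothetical models over the regression training window
f1 = [100.0, 120.0, 130.0, 150.0]
f2 = [110.0, 115.0, 140.0, 145.0]
# target built as an exact mix of the two, so the fit should recover 0.3 / 0.7
y = [0.3 * a + 0.7 * b for a, b in zip(f1, f2)]

w1, w2 = fit_two_weights(f1, f2, y)
print(round(w1, 6), round(w2, 6))  # 0.3 0.7

# new forecasts from the two models are then combined with the learned weights
print(w1 * 160.0 + w2 * 150.0)
```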

Once these elements are in place, a RegressionEnsembleModel can be used like a regular forecasting model:

ensemble_model = RegressionEnsembleModel(
    forecasting_models=[NaiveSeasonal(K=12), NaiveDrift()],
    regression_train_n_points=12,
)

backtest = ensemble_model.historical_forecasts(ts_air, start=0.6, forecast_horizon=3)

ts_air.plot()
backtest.plot()
print("RegressionEnsemble (naive) MAPE:", round(mape(backtest, ts_air), 5))
RegressionEnsemble (naive) MAPE: 4.85142

Compared to the MAPE of 11.87818 obtained at the beginning of the naive ensembling section, adding a LinearRegressionModel on top of the two naive models does improve the quality of the forecast.

Now, let's see whether we observe a similar gain when the forecasting models of the RegressionEnsembleModel are not naive:

ensemble = RegressionEnsembleModel(
    forecasting_models=[LinearRegressionModel(lags=12), ExponentialSmoothing()],
    regression_train_n_points=12,
)

backtest = ensemble.historical_forecasts(ts_air, start=0.6, forecast_horizon=3)

ts_air.plot()
backtest.plot()
print("RegressionEnsemble (v2) MAPE:", round(mape(backtest, ts_air), 5))
RegressionEnsemble (v2) MAPE: 4.63334

Interestingly, even though the MAPE improved compared to the RegressionEnsemble relying on naive models (MAPE: 4.85142), it does not outperform the NaiveEnsemble using similar forecasting models (MAPE: 4.04297).

This performance gap is partially caused by the points set aside to train the ensembling LinearRegression: the two forecasting models (LinearRegression and ExponentialSmoothing) cannot access the latest values of the series, which contain a marked upward trend.
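The split behind this gap can be sketched as follows, assuming that during the regression fit the forecasting models only see the series minus its last regression_train_n_points values, which are reserved as targets for the ensemble regression. The helper is hypothetical and uses a plain list in place of a TimeSeries:

```python
from typing import List, Tuple


def split_for_regression_ensemble(
    series: List[float], n_points: int
) -> Tuple[List[float], List[float]]:
    """Split a series into the part seen by the forecasting models and the
    last `n_points` values reserved for fitting the ensemble regression."""
    if not 0 < n_points < len(series):
        raise ValueError("n_points must be between 1 and len(series) - 1")
    return series[:-n_points], series[-n_points:]


series = [float(v) for v in range(1, 145)]  # 144 monthly values, like AirPassengers
models_part, regression_part = split_for_regression_ensemble(series, 12)
print(len(models_part), len(regression_part))  # 132 12
# the most recent values, where the upward trend is strongest, land in regression_part
print(models_part[-1], regression_part[0])  # 132.0 133.0
```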

Out of curiosity, we can use the Ridge regression model from the sklearn library to ensemble the forecasts:

from sklearn.linear_model import Ridge

ensemble = RegressionEnsembleModel(
    forecasting_models=[LinearRegressionModel(lags=12), ExponentialSmoothing()],
    regression_train_n_points=12,
    regression_model=Ridge(),
)

backtest = ensemble.historical_forecasts(ts_air, start=0.6, forecast_horizon=3)

print("RegressionEnsemble (Ridge) MAPE:", round(mape(backtest, ts_air), 5))
RegressionEnsemble (Ridge) MAPE: 6.46803

In this particular scenario, using a regression model with a regularization term deteriorated the forecasts, but there may be other cases where regularization improves them.
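The mechanism by which regularization can hurt an ensemble is weight shrinkage: the penalty pulls the combination weights toward zero, so the combined forecast can end up biased low. A minimal sketch of ridge regression on two forecasts, without intercept (note that sklearn's Ridge fits an unpenalized intercept by default, so it differs from this simplified version); whether shrinkage explains the exact scores above is not established here:

```python
from typing import Sequence, Tuple


def ridge_two_weights(
    f1: Sequence[float], f2: Sequence[float], y: Sequence[float], alpha: float
) -> Tuple[float, float]:
    """Ridge weights (no intercept) minimising
    ||y - w1*f1 - w2*f2||^2 + alpha * (w1^2 + w2^2).

    Solves (X^T X + alpha * I) w = X^T y; alpha = 0 gives ordinary least squares.
    """
    s11 = sum(a * a for a in f1) + alpha
    s22 = sum(b * b for b in f2) + alpha
    s12 = sum(a * b for a, b in zip(f1, f2))
    s1y = sum(a * t for a, t in zip(f1, y))
    s2y = sum(b * t for b, t in zip(f2, y))
    det = s11 * s22 - s12 * s12
    w1 = (s22 * s1y - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    return w1, w2


f1 = [100.0, 120.0, 130.0, 150.0]
f2 = [110.0, 115.0, 140.0, 145.0]
y = [0.3 * a + 0.7 * b for a, b in zip(f1, f2)]

for alpha in (0.0, 1e5):
    w1, w2 = ridge_two_weights(f1, f2, y, alpha)
    # with a strong penalty the weights no longer sum to ~1
    print(f"alpha={alpha:g}: w1={w1:.3f}, w2={w2:.3f}, sum={w1 + w2:.3f}")
```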

Training using historical forecasts

When predicting more values than their output_chunk_length, GlobalForecastingModels rely on auto-regression (using their own outputs as inputs) to forecast values far in the future. However, the quality of the forecasts can decrease considerably as the predicted timestamps get further from the end of the observations. By default, each forecasting model produces the regression_train_n_points values used to train the RegressionEnsembleModel's regression model with a single predict() call, which is auto-regressive whenever regression_train_n_points exceeds its output_chunk_length. Since these forecasts target timestamps where the ground truth is actually known and available, historical_forecasts can be used instead of predict().

This can be activated with train_using_historical_forecasts=True.

Under the hood, the ensemble model will trigger historical forecasting for each model with forecast_horizon=model.output_chunk_length, stride=model.output_chunk_length, last_points_only=False, and overlap_end=False to predict the last regression_train_n_points points of the target series.
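The chunks produced by these settings can be sketched by computing their index ranges: with forecast_horizon = stride = output_chunk_length, the forecasts tile the last regression_train_n_points values without overlap, and no step is predicted further ahead than output_chunk_length. The helper is hypothetical and assumes the first chunk starts exactly regression_train_n_points steps before the end; how Darts handles a regression_train_n_points that is not a multiple of output_chunk_length may differ.

```python
from typing import List, Tuple


def hist_forecast_windows(series_len: int, n_points: int, ocl: int) -> List[Tuple[int, int]]:
    """Index ranges [start, end) of the non-overlapping forecast chunks that
    cover the last `n_points` values, using forecast_horizon = stride = ocl.

    Chunks that would run past the end of the series are dropped
    (the analogue of overlap_end=False).
    """
    windows = []
    start = series_len - n_points
    while start + ocl <= series_len:
        windows.append((start, start + ocl))
        start += ocl  # stride equal to the horizon: chunks do not overlap
    return windows


print(hist_forecast_windows(series_len=144, n_points=12, ocl=4))
# [(132, 136), (136, 140), (140, 144)]
```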

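# replacing the ExponentialSmoothing (local) with RandomForest (global)
ensemble = RegressionEnsembleModel(
    forecasting_models=[
        LinearRegressionModel(lags=12),
        RandomForest(lags=12, random_state=0),
    ],
    regression_train_n_points=20,
    train_using_historical_forecasts=False,
)
backtest = ensemble.historical_forecasts(ts_air, start=0.6, forecast_horizon=3)

ensemble_hist_fct = RegressionEnsembleModel(
    forecasting_models=[
        LinearRegressionModel(lags=12),
        RandomForest(lags=12, random_state=0),
    ],
    regression_train_n_points=20,
    train_using_historical_forecasts=True,
)
backtest_hist_fct = ensemble_hist_fct.historical_forecasts(ts_air, start=0.6, forecast_horizon=3)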
print("RegressionEnsemble (no hist_fct) MAPE:", round(mape(backtest, ts_air), 5))
print("RegressionEnsemble (hist_fct) MAPE:", round(mape(backtest_hist_fct, ts_air), 5))
RegressionEnsemble (no hist_fct) MAPE: 5.7016
RegressionEnsemble (hist_fct) MAPE: 5.12017

As expected, using the forecasting models' historical forecasts to train the regression model produces better forecasts.

Probabilistic regression ensemble

In order to be probabilistic, the RegressionEnsembleModel must have a probabilistic ensembling regression model (see the table in the README):

ensemble = RegressionEnsembleModel(
    forecasting_models=[LinearRegressionModel(lags=12), ExponentialSmoothing()],
    regression_train_n_points=12,
    regression_model=LinearRegressionModel(
        lags_future_covariates=[0], likelihood="quantile", quantiles=[0.05, 0.5, 0.95]
    ),
)

backtest = ensemble.historical_forecasts(
    ts_air, start=0.6, forecast_horizon=3, num_samples=100
)

ts_air[-len(backtest) :].plot(label="ground truth")
backtest.plot(label="prediction")

print("RegressionEnsemble (probabilistic) MAPE:", round(mape(backtest, ts_air), 5))
RegressionEnsemble (probabilistic) MAPE: 5.15071
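With likelihood="quantile", each quantile is estimated by minimising the quantile (pinball) loss, which penalises under- and over-prediction asymmetrically. A plain-Python sketch of that objective (an illustration of the loss, not Darts' code):

```python
from typing import Sequence


def pinball_loss(actual: Sequence[float], pred: Sequence[float], q: float) -> float:
    """Average quantile (pinball) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for a, p in zip(actual, pred):
        diff = a - p
        # under-prediction (diff > 0) is weighted by q, over-prediction by (1 - q)
        total += max(q * diff, (q - 1) * diff)
    return total / len(actual)


actual = [100.0, 100.0]
# for the 0.95 quantile, predicting too low costs far more than predicting too high
print(round(pinball_loss(actual, [90.0, 90.0], q=0.95), 6))    # 9.5
print(round(pinball_loss(actual, [110.0, 110.0], q=0.95), 6))  # 0.5
```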

Bootstrapping regression ensemble

When the forecasting models of a RegressionEnsembleModel are probabilistic, the samples dimension of their forecasts is reduced, and the result is used as covariates for the ensembling regression. Since the ensembling regression model is deterministic, the generated forecasts are deterministic as well.

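ensemble = RegressionEnsembleModel(
    forecasting_models=[
        LinearRegressionModel(
            lags=12, likelihood="quantile", quantiles=[0.05, 0.5, 0.95]
        ),
        ExponentialSmoothing(),
    ],
    regression_train_n_points=12,
)

backtest = ensemble.historical_forecasts(ts_air, start=0.6, forecast_horizon=3)

ts_air[-len(backtest) :].plot(label="ground truth")
backtest.plot(label="prediction")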

print("RegressionEnsemble (bootstrap) MAPE:", round(mape(backtest, ts_air), 5))
RegressionEnsemble (bootstrap) MAPE: 5.10138
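Reducing the samples dimension collapses the samples of each time step into a single value before it reaches the regression model. The sketch below assumes the median as the reduction (with the mean as an alternative); which statistic Darts applies is not stated here, and `reduce_samples` is a hypothetical helper:

```python
from statistics import median
from typing import List


def reduce_samples(samples_per_step: List[List[float]], how: str = "median") -> List[float]:
    """Collapse the samples dimension: one value per time step."""
    if how == "median":
        return [median(s) for s in samples_per_step]
    if how == "mean":
        return [sum(s) / len(s) for s in samples_per_step]
    raise ValueError(f"unknown reduction: {how}")


# 3 time steps x 5 hypothetical samples from a probabilistic forecasting model
samples = [
    [440.0, 452.0, 447.0, 460.0, 449.0],
    [470.0, 465.0, 480.0, 472.0, 468.0],
    [500.0, 495.0, 510.0, 505.0, 498.0],
]
print(reduce_samples(samples))  # [449.0, 470.0, 500.0]
```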

Pre-trained Ensembling

As both NaiveEnsembleModel and RegressionEnsembleModel accept GlobalForecastingModel instances as forecasting models, they can be used to ensemble pre-trained deep learning and regression models. Note that this functionality is only supported if all the ensembled forecasting models are instances of the GlobalForecastingModel class and are already trained when creating the ensemble.

Disclaimer: be careful not to pre-train the models with data used during validation, as this would introduce considerable bias.
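A cheap safeguard is to compare the last timestamp used for pre-training with the first validation timestamp before building the ensemble. A hypothetical helper using plain `datetime.date` values; with Darts series, these would come from `train.end_time()` and `val.start_time()`:

```python
from datetime import date


def check_no_leakage(pretrain_end: date, validation_start: date) -> None:
    """Raise if the pre-training data reaches into the validation period."""
    if pretrain_end >= validation_start:
        raise ValueError(
            f"pre-training data ends on {pretrain_end}, "
            f"which is not before the validation start {validation_start}"
        )


# hypothetical monthly split: training ends one month before validation starts
check_no_leakage(date(1957, 12, 1), date(1958, 1, 1))  # passes silently
```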

Note: the parameters of the TCNModel are heavily inspired by the TCNModel example notebook.

# holding out values for validation
train, val = ts_air.split_after(0.8)

# scaling the target
scaler = Scaler()
train = scaler.fit_transform(train)
val = scaler.transform(val)

# use the month as a covariate
month_series = dt_attr(ts_air.time_index, attribute="month", one_hot=True)
scaler_month = Scaler()
month_series = scaler_month.fit_transform(month_series)

# training a regular linear regression, without any covariates
linreg_model = LinearRegressionModel(lags=24)
linreg_model.fit(train)

# instantiating a TCN model with parameters optimized for the AirPassenger dataset
tcn_model = TCNModel(
    input_chunk_length=24, output_chunk_length=12, kernel_size=5, num_filters=3,
    dilation_base=2, weight_norm=True, dropout=0.2, n_epochs=500, random_state=0,
)
tcn_model.fit(train, past_covariates=month_series)

As a sanity check, we will look at the forecasts of each model taken individually:

# individual model forecasts
pred_linreg = linreg_model.predict(24)
pred_tcn = tcn_model.predict(24, verbose=False)

# scaling them back
pred_linreg_rescaled = scaler.inverse_transform(pred_linreg)
pred_tcn_rescaled = scaler.inverse_transform(pred_tcn)

# plotting
ts_air[-24:].plot(label="ground truth")
pred_linreg_rescaled.plot(label="LinearRegression")
pred_tcn_rescaled.plot(label="TCNModel")

Now that we have a good idea of the individual performance of each of these models, we can ensemble them. We must make sure to set train_forecasting_models=False, otherwise the ensemble will need to be fitted before predict() can be called.

Advice: use the save() method to export your model and keep a copy of its weights.

naive_ensemble = NaiveEnsembleModel(
    forecasting_models=[tcn_model, linreg_model], train_forecasting_models=False
)
# NaiveEnsemble initialized with pre-trained models can call predict() directly,
# the `series` argument must however be provided
pred_naive = naive_ensemble.predict(len(val), train)

pretrain_ensemble = RegressionEnsembleModel(
    forecasting_models=[tcn_model, linreg_model],
    regression_train_n_points=24,
    train_forecasting_models=False,
)
# RegressionEnsemble must train the ensemble model, even if the forecasting models are already trained
pretrain_ensemble.fit(train)
pred_ens = pretrain_ensemble.predict(len(val))

# scaling back the predictions
pred_naive_rescaled = scaler.inverse_transform(pred_naive)
pred_ens_rescaled = scaler.inverse_transform(pred_ens)

# plotting
plt.figure(figsize=(8, 5))
scaler.inverse_transform(val).plot(label="ground truth")
pred_naive_rescaled.plot(label="pre-trained NaiveEnsemble")
pred_ens_rescaled.plot(label="pre-trained RegressionEnsemble")
plt.ylim([250, 650])

print("LinearRegression MAPE:", round(mape(pred_linreg_rescaled, ts_air), 5))
print("TCNModel MAPE:", round(mape(pred_tcn_rescaled, ts_air), 5))
print("NaiveEnsemble (pre-trained) MAPE:", round(mape(pred_naive_rescaled, ts_air), 5))
print(
    "RegressionEnsemble (pre-trained) MAPE:", round(mape(pred_ens_rescaled, ts_air), 5)
)
LinearRegression MAPE: 3.91311
TCNModel MAPE: 4.70491
NaiveEnsemble (pre-trained) MAPE: 3.82837
RegressionEnsemble (pre-trained) MAPE: 3.61749


Ensembling the pre-trained LinearRegression and TCNModel models allowed us to outperform the single models, and training a linear regression on top of these two models' forecasts further improved the MAPE score.

Although the gains remain limited on this small dataset, ensembling is a powerful technique that can yield impressive results; it was notably used by the winners of the 4th edition of the Makridakis Competition (website, github repository).