Mixed-data sampling (MIDAS) Transformer

class darts.dataprocessing.transformers.midas.MIDAS(low_freq, strip=True, drop_static_covariates=False, name='MIDAS', n_jobs=1, verbose=False)[source]

Bases: FittableDataTransformer, InvertibleDataTransformer

Mixed-data sampling transformer.

A transformer that converts higher frequency time series to lower frequency using mixed-data sampling; see [1] for further details. This allows higher frequency covariates to be used whilst forecasting a lower frequency target series. For example, using monthly inputs to forecast a quarterly target.

Notes

The ratio between the high input frequency and the low target frequency must be constant. For example, there are always three months in a quarter. The number of days in a month, however, varies from month to month; in that case a MIDAS transformation does not work and the transformer will raise an error.
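The constant-ratio requirement can be checked by hand. The sketch below is not part of Darts; it only uses the standard library, with an arbitrarily chosen year, to count months per quarter and days per month.

```python
import calendar
from collections import Counter

year = 1949  # arbitrary illustrative year (not a leap year)

# Number of months falling into each quarter (quarters numbered 1 to 4).
months_per_quarter = Counter((month - 1) // 3 + 1 for month in range(1, 13))

# Number of days in each month of the chosen year.
days_per_month = {month: calendar.monthrange(year, month)[1] for month in range(1, 13)}

# A MIDAS-style reshape needs a single, constant ratio between the two frequencies.
constant_monthly_to_quarterly = len(set(months_per_quarter.values())) == 1
constant_daily_to_monthly = len(set(days_per_month.values())) == 1
```

Here the monthly-to-quarterly ratio is constant while the daily-to-monthly ratio is not, which matches the two cases described above.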

For anchored low frequency, the transformed series must contain at least 2 samples in order to be able to retrieve the original time index.

Parameters
  • low_freq (str) – The pd.DateOffset string alias corresponding to the target low frequency [2]. Passed on to the rule parameter of pandas.DataFrame.resample().

  • strip (bool) – Whether to remove the NaNs from the start and the end of the transformed series.

  • drop_static_covariates (bool) – If set to True, the static covariates of the input series won’t be transferred to the output. This might be useful for multivariate series with component-specific static covariates.

  • name (str) – A specific name for the transformer

  • n_jobs (int) – The number of jobs to run in parallel. Parallel jobs are created only when a Sequence[TimeSeries] is passed as input to a method, parallelising operations regarding different TimeSeries. Defaults to 1 (sequential). Setting the parameter to -1 means using all the available processors. Note: for a small amount of data, the parallelisation overhead could end up increasing the total required amount of time.

  • verbose (bool) – Optionally, whether to print operations progress
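To illustrate what the low_freq alias means on the pandas side, here is a minimal sketch that calls pandas resampling directly with the "QS" rule. It does not reproduce the transformer's internals; the toy values and variable names are assumptions made for the illustration.

```python
import pandas as pd

# Six monthly observations on a month-start ("MS") index.
index = pd.date_range("1949-01-01", periods=6, freq="MS")
monthly = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0, 135.0], index=index)

# "QS" (quarter start) groups the monthly observations into quarterly bins.
observations_per_quarter = monthly.resample("QS").count()
```

Each quarterly bin holds three monthly observations, which is the constant ratio the transformer relies on.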

Examples

>>> from darts.datasets import AirPassengersDataset
>>> from darts.dataprocessing.transformers import MIDAS
>>> monthly_series = AirPassengersDataset().load()
>>> print(monthly_series.time_index[:4])
DatetimeIndex(['1949-01-01', '1949-02-01', '1949-03-01', '1949-04-01'], dtype='datetime64[ns]',
name='Month', freq='MS')
>>> print(monthly_series.values()[:4])
[[112.], [118.], [132.], [129.]]
>>> midas = MIDAS(low_freq="QS")
>>> quarterly_series = midas.fit_transform(monthly_series)
>>> print(quarterly_series.time_index[:3])
DatetimeIndex(['1949-01-01', '1949-04-01', '1949-07-01'], dtype='datetime64[ns]', name='Month', freq='QS-JAN')
>>> print(quarterly_series.values()[:3])
[[112. 118. 132.], [129. 121. 135.], [148. 148. 136.]]
>>> inversed_quarterly = midas.inverse_transform(quarterly_series)
>>> print(inversed_quarterly.time_index[:4])
DatetimeIndex(['1949-01-01', '1949-02-01', '1949-03-01', '1949-04-01'], dtype='datetime64[ns]',
name='time', freq='MS')
>>> print(inversed_quarterly.values()[:4])
[[112.], [118.], [132.], [129.]]

References

1

https://en.wikipedia.org/wiki/Mixed-data_sampling

2

https://pandas.pydata.org/docs/user_guide/timeseries.html#dateoffset-objects

Attributes

name

Name of the data transformer.

Methods

apply_component_mask(series[, ...])

Extracts components specified by component_mask from series

fit(series, *args[, component_mask])

Fits transformer to a (sequence of) TimeSeries by calling the user-implemented ts_fit method.

fit_transform(series, *args[, component_mask])

Fit the transformer to the (sequence of) series and return the transformed input.

inverse_transform(series, *args[, ...])

Inverse transforms a (sequence of) series by calling the user-implemented ts_inverse_transform method.

set_n_jobs(value)

Set the number of processors to be used by the transformer while processing multiple TimeSeries.

set_verbose(value)

Set the verbosity status.

stack_samples(vals)

Creates an array of shape (n_timesteps * n_samples, n_components) from either a TimeSeries or the array_values of a TimeSeries.

transform(series, *args[, component_mask])

Transforms a (sequence of) series by calling the user-implemented ts_transform method.

ts_fit(series, params, *args, **kwargs)

MIDAS needs the high frequency period name in order to easily inverse-transform the TimeSeries; the parallelization is handled by transform and/or inverse_transform (see InvertibleDataTransformer.__init__() docstring).

ts_inverse_transform(series, params)

Transforms series back to high frequency by retrieving the original high frequency and reshaping the values.

ts_transform(series, params)

Transforms series from high to low frequency using a mixed-data sampling approach.

unapply_component_mask(series, vals[, ...])

Adds back components previously removed by component_mask in apply_component_mask method.

unstack_samples(vals[, n_timesteps, ...])

Reshapes the 2D array returned by stack_samples back into an array of shape (n_timesteps, n_components, n_samples); this 'undoes' the reshaping of stack_samples.

static apply_component_mask(series, component_mask=None, return_ts=False)

Extracts components specified by component_mask from series

Parameters
  • series (TimeSeries) – input TimeSeries to be fed into transformer.

  • component_mask (Optional[ndarray]) – Optionally, np.ndarray boolean mask of shape (n_components, 1) specifying which components to extract from series. The i-th component of series is kept only if component_mask[i] = True. If not specified, no masking is performed.

  • return_ts (bool) – Optionally, specifies that a TimeSeries should be returned, rather than an np.ndarray.

Returns

TimeSeries (if return_ts = True) or np.ndarray (if return_ts = False) with only those components specified by component_mask remaining.

Return type

masked

fit(series, *args, component_mask=None, **kwargs)

Fits transformer to a (sequence of) TimeSeries by calling the user-implemented ts_fit method.

The fitted parameters returned by ts_fit are stored in the self._fitted_params attribute. If a Sequence[TimeSeries] is passed as the series data, then one of two outcomes will occur:

  1. If the global_fit attribute was set to False, then a different set of parameters will be individually fitted to each TimeSeries in the Sequence. In this case, this function automatically parallelises this fitting process over all of the multiple TimeSeries that have been passed.

  2. If the global_fit attribute was set to True, then all of the TimeSeries objects will be used to fit a single set of parameters.
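The difference between the two outcomes can be sketched with plain NumPy arrays standing in for TimeSeries. The function fit_min_max below is a hypothetical stand-in for a ts_fit implementation, not part of the Darts API.

```python
import numpy as np


def fit_min_max(arrays):
    """Hypothetical stand-in for ts_fit: (min, max) over all given arrays."""
    stacked = np.concatenate(arrays)
    return float(stacked.min()), float(stacked.max())


series_a = np.array([1.0, 2.0, 3.0])
series_b = np.array([10.0, 20.0])

# global_fit=False: one set of parameters is fitted per series.
per_series_params = [fit_min_max([s]) for s in (series_a, series_b)]

# global_fit=True: all series contribute to a single set of parameters.
global_params = fit_min_max([series_a, series_b])
```

With per-series fitting each series keeps its own range, whereas the global fit produces one range covering both series.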

Parameters
  • series (Union[TimeSeries, Sequence[TimeSeries]]) – (sequence of) series to fit the transformer on.

  • args – Additional positional arguments for the ts_fit() method

  • component_mask (Optional[np.ndarray] = None) – Optionally, a 1-D boolean np.ndarray of length series.n_components that specifies which components of the underlying series the transform should be fitted to.

  • kwargs – Additional keyword arguments for the ts_fit() method

Returns

Fitted transformer.

Return type

FittableDataTransformer

fit_transform(series, *args, component_mask=None, **kwargs)

Fit the transformer to the (sequence of) series and return the transformed input.

Parameters
  • series (Union[TimeSeries, Sequence[TimeSeries]]) – the (sequence of) series to transform.

  • args – Additional positional arguments passed to the ts_transform() and ts_fit() methods.

  • component_mask (Optional[np.ndarray] = None) – Optionally, a 1-D boolean np.ndarray of length series.n_components that specifies which components of the underlying series the transform should be fitted and applied to.

  • kwargs – Additional keyword arguments passed to the ts_transform() and ts_fit() methods.

Returns

Transformed data.

Return type

Union[TimeSeries, Sequence[TimeSeries]]

inverse_transform(series, *args, component_mask=None, **kwargs)

Inverse transforms a (sequence of) series by calling the user-implemented ts_inverse_transform method.

In case a sequence or list of lists is passed as input data, this function takes care of parallelising the transformation of multiple series in the sequence at the same time. Additionally, if the mask_components attribute was set to True when instantiating InvertibleDataTransformer, then any provided component_masks will be automatically applied to each input TimeSeries; please refer to ‘Notes’ for further details on component masking.

Any additionally specified *args and **kwargs are automatically passed to ts_inverse_transform.

Parameters
  • series (Union[TimeSeries, Sequence[TimeSeries], Sequence[Sequence[TimeSeries]]]) – The series to inverse-transform. If a single TimeSeries, returns a single series. If a sequence of TimeSeries, returns a list of series. The series should be in the same order as the sequence used to fit the transformer. If a list of lists of TimeSeries, returns a list of lists of series. This can for example be the output of ForecastingModel.historical_forecasts() when using multiple series. Each inner list should contain TimeSeries related to the same series. The order of inner lists should be the same as the sequence used to fit the transformer.

  • args – Additional positional arguments for the ts_inverse_transform() method

  • component_mask (Optional[np.ndarray] = None) – Optionally, a 1-D boolean np.ndarray of length series.n_components that specifies which components of the underlying series the inverse transform should consider.

  • kwargs – Additional keyword arguments for the ts_inverse_transform() method

Returns

Inverse transformed data.

Return type

Union[TimeSeries, List[TimeSeries], List[List[TimeSeries]]]

Notes

If the mask_components attribute was set to True when instantiating InvertibleDataTransformer, then any provided component_masks will be automatically applied to each TimeSeries input to inverse_transform; component_masks are simply boolean arrays of shape (series.n_components,) that specify which components of each series should be transformed using ts_inverse_transform and which components should not. If component_mask[i] is True, then the i-th component of each series will be transformed by ts_inverse_transform. Conversely, if component_mask[i] is False, the i-th component will be removed from each series before being passed to ts_inverse_transform; after transforming this masked series, the untransformed i-th component will be ‘added back’ to the output. Note that automatic component masking can only be performed if ts_inverse_transform does not change the number of timesteps in each series; if it did, the transformed and untransformed components could not be concatenated back together along the component axis.

If mask_components was set to False when instantiating InvertibleDataTransformer, then any provided component_masks will be passed as a keyword argument to ts_inverse_transform; the user can then manually specify how the component_mask should be applied to each series.

property name

Name of the data transformer.

set_n_jobs(value)

Set the number of processors to be used by the transformer while processing multiple TimeSeries.

Parameters

value (int) – New n_jobs value. Set to -1 for using all the available cores.

set_verbose(value)

Set the verbosity status.

True for enabling the detailed report about the transformer’s operation progress, False for no additional information.

Parameters

value (bool) – New verbosity status

static stack_samples(vals)

Creates an array of shape (n_timesteps * n_samples, n_components) from either a TimeSeries or the array_values of a TimeSeries.

Each column of the returned array corresponds to a component (dimension) of the series and is formed by concatenating all of the samples associated with that component together. More specifically, the i-th column is formed by concatenating [component_i_sample_1, component_i_sample_2, …, component_i_sample_n].

Stacking is useful when implementing a transformation that applies the exact same change to every timestep in the timeseries. In such cases, the samples of each component can be stacked together into a single column, and the transformation can then be applied to each column, thereby ‘vectorising’ the transformation over all samples of that component; the unstack_samples method can then be used to reshape the output. For transformations that depend on the time_index or the temporal ordering of the observations, stacking should not be employed.

Parameters

vals (Union[ndarray, TimeSeries]) – Timeseries or np.ndarray of shape (n_timesteps, n_components, n_samples) to be ‘stacked’.

Returns

np.ndarray of shape (n_timesteps * n_samples, n_components), where the i-th column is formed by concatenating all of the samples of the i-th component in vals.

Return type

stacked
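As an illustration of why stacking helps, the sketch below stacks a random array with NumPy and applies a per-component min-max scaling to all samples at once. The axis handling (swap, then reshape) and the resulting row order are assumptions made for the example and may differ from the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
vals = rng.normal(size=(5, 2, 3))  # (n_timesteps, n_components, n_samples)
n_timesteps, n_components, n_samples = vals.shape

# Assumed layout: bring the sample axis next to the time axis, then merge them,
# so that every column gathers all timesteps and samples of one component.
stacked = np.swapaxes(vals, 1, 2).reshape(n_timesteps * n_samples, n_components)

# A time-independent change applied once per column: min-max scaling.
col_min = stacked.min(axis=0)
col_max = stacked.max(axis=0)
scaled = (stacked - col_min) / (col_max - col_min)
```

Because the scaling does not depend on the time index, the order of rows inside each column does not affect the result.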

transform(series, *args, component_mask=None, **kwargs)

Transforms a (sequence of) series by calling the user-implemented ts_transform method.

In case a Sequence[TimeSeries] is passed as input data, this function takes care of parallelising the transformation of multiple series in the sequence at the same time. Additionally, if the mask_components attribute was set to True when instantiating BaseDataTransformer, then any provided component_masks will be automatically applied to each input TimeSeries; please refer to ‘Notes’ for further details on component masking.

Any additionally specified *args and **kwargs are automatically passed to ts_transform.

Parameters
  • series (Union[TimeSeries, Sequence[TimeSeries]]) – (sequence of) series to be transformed.

  • args – Additional positional arguments for each ts_transform() method call

  • component_mask (Optional[np.ndarray] = None) – Optionally, a 1-D boolean np.ndarray of length series.n_components that specifies which components of the underlying series the transform should consider. If the mask_components attribute was set to True when instantiating BaseDataTransformer, then the component mask will be automatically applied to each TimeSeries input. Otherwise, component_mask will be provided as an additional keyword argument to ts_transform. See ‘Notes’ for further details.

  • kwargs – Additional keyword arguments for each ts_transform() method call

Returns

Transformed data.

Return type

Union[TimeSeries, List[TimeSeries]]

Notes

If the mask_components attribute was set to True when instantiating BaseDataTransformer, then any provided component_masks will be automatically applied to each TimeSeries input to transform; component_masks are simply boolean arrays of shape (series.n_components,) that specify which components of each series should be transformed using ts_transform and which components should not. If component_mask[i] is True, then the i-th component of each series will be transformed by ts_transform. Conversely, if component_mask[i] is False, the i-th component will be removed from each series before being passed to ts_transform; after transforming this masked series, the untransformed i-th component will be ‘added back’ to the output. Note that automatic component masking can only be performed if ts_transform does not change the number of timesteps in each series; if it did, the transformed and untransformed components could not be concatenated back together along the component axis.

If mask_components was set to False when instantiating BaseDataTransformer, then any provided component_masks will be passed as a keyword argument to ts_transform; the user can then manually specify how the component_mask should be applied to each series.
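The masking flow described in these notes can be sketched with NumPy on a 2-D array of shape (n_timesteps, n_components). The doubling step is only a stand-in for ts_transform, and none of this is the library's own code.

```python
import numpy as np

vals = np.array([[1.0, 10.0, 100.0],
                 [2.0, 20.0, 200.0]])  # (n_timesteps, n_components)
component_mask = np.array([True, False, True])

# Remove the component whose mask entry is False before transforming.
masked = vals[:, component_mask]

# Stand-in for ts_transform; it keeps the number of timesteps unchanged.
transformed = masked * 2.0

# 'Add back' the untransformed component alongside the transformed ones.
output = vals.copy()
output[:, component_mask] = transformed
```

The write-back in the last step only works because the transformed array has the same number of rows as the original, which is the restriction stated above.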

static ts_fit(series, params, *args, **kwargs)[source]

MIDAS needs the high frequency period name in order to easily inverse-transform the TimeSeries; the parallelization is handled by transform and/or inverse_transform (see InvertibleDataTransformer.__init__() docstring).

Return type

Union[Dict[str, Any], List[Dict[str, Any]]]

static ts_inverse_transform(series, params)[source]

Transforms series back to high frequency by retrieving the original high frequency and reshaping the values.

When converting to/from anchorable offset [1], the index is rolled backward if the series does not start on the anchor date to preserve all the values.

Steps:
  1. Reshape the values to flatten the components introduced by the transform

  2. Eliminate the rows filled with NaNs, to facilitate time index adjustment

  3. Retrieve the original components name

  4. When applicable, shift the time index start back in time

  5. Generate a new time index with the high frequency

References

1

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#anchored-offsets

Return type

TimeSeries
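A rough NumPy sketch of the value handling in steps 1 and 2 follows, using a quarterly array whose first entry is missing because the hypothetical monthly series started in February. The time index reconstruction of steps 4 and 5 is left out, and dropping every all-NaN row is a simplification chosen for the example.

```python
import numpy as np

# Quarterly values for a single original component, three months per row.
quarterly = np.array([[np.nan, 118.0, 132.0],
                      [129.0, 121.0, 135.0],
                      [148.0, 148.0, 136.0]])

# Step 1: flatten the extra columns back into one column per original component.
flat = quarterly.reshape(-1, 1)

# Step 2: drop the rows that contain only NaNs.
keep = ~np.isnan(flat).all(axis=1)
monthly = flat[keep]
```

The result has one row per monthly observation, starting with the February value.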

static ts_transform(series, params)[source]

Transforms series from high to low frequency using a mixed-data sampling approach. Uses and relies on pandas.DataFrame.resample.

When converting to/from anchorable offset [1], the index is rolled backward if the series does not start on the anchor date to preserve all the values.

Steps:
  1. Transform series to pd.DataFrame and get frequency string for PeriodIndex

  2. Downsample series and then upsample it again

  3. Replace the input series with the upsampled series if it’s not ‘full’

  4. Transform every column of the high frequency series into multiple columns for the low frequency series

  5. Transform the low frequency series back into a TimeSeries

Return type

TimeSeries
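The core reshaping idea behind step 4 can be sketched with NumPy, using the first nine values of the monthly example shown earlier on this page. This is only an illustration under simplified assumptions (one component, no time index); the actual method relies on pandas resampling.

```python
import numpy as np

monthly = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0, 136.0])
months_per_quarter = 3

# A 'full' series: each quarter becomes one row with three columns.
quarterly = monthly.reshape(-1, months_per_quarter)

# A series that is not 'full': pretend it starts in February, so the first
# quarter lacks its January value and is padded with NaN before reshaping.
partial = monthly[1:]
n_missing_start = 1
padded = np.concatenate([np.full(n_missing_start, np.nan), partial])
quarterly_partial = padded.reshape(-1, months_per_quarter)
```

The first array reproduces the quarterly values printed in the example above; the second shows where NaNs appear when the input does not start on a quarter boundary.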

static unapply_component_mask(series, vals, component_mask=None)

Adds back components previously removed by component_mask in apply_component_mask method.

Parameters
  • series (Union[TimeSeries, Sequence[TimeSeries]]) – input TimeSeries that was fed into transformer.

  • vals (Union[ndarray, Sequence[ndarray], TimeSeries, Sequence[TimeSeries]]) – np.ndarray or TimeSeries to ‘unmask’

  • component_mask (Optional[ndarray]) – Optionally, np.ndarray boolean mask of shape (n_components, 1) specifying which components were extracted from series. If given, insert vals back into the columns of the original array. If not specified, nothing is ‘unmasked’.

Returns

TimeSeries (if vals is a TimeSeries) or np.ndarray (if vals is an np.ndarray) with those components previously removed by component_mask now ‘added back’.

Return type

unmasked

static unstack_samples(vals, n_timesteps=None, n_samples=None, series=None)

Reshapes the 2D array returned by stack_samples back into an array of shape (n_timesteps, n_components, n_samples); this ‘undoes’ the reshaping of stack_samples. Either n_timesteps, n_samples, or series must be specified.

Parameters
  • vals (ndarray) – np.ndarray of shape (n_timesteps * n_samples, n_components) to be ‘unstacked’.

  • n_timesteps (Optional[int]) – Optionally, the number of timesteps in the array originally passed to stack_samples. Does not need to be provided if series is specified.

  • n_samples (Optional[int]) – Optionally, the number of samples in the array originally passed to stack_samples. Does not need to be provided if series is specified.

  • series (Optional[TimeSeries]) – Optionally, the TimeSeries object used to create vals; n_samples is inferred from this.

Returns

np.ndarray of shape (n_timesteps, n_components, n_samples).

Return type

unstacked
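A round trip through stacking and unstacking can be sketched with NumPy as follows. The axis order used for stacking is an assumption made for the example; the only point is that knowing n_timesteps and n_samples is enough to restore the original 3-D shape.

```python
import numpy as np

vals = np.arange(24, dtype=float).reshape(4, 3, 2)  # (n_timesteps, n_components, n_samples)
n_timesteps, n_components, n_samples = vals.shape

# Stack: merge the time and sample axes into one (assumed axis order).
stacked = np.swapaxes(vals, 1, 2).reshape(n_timesteps * n_samples, n_components)

# Unstack: split the merged axis again and move the sample axis back to the end.
unstacked = np.swapaxes(
    stacked.reshape(n_timesteps, n_samples, n_components), 1, 2
)
```

After the round trip, unstacked has the original shape and values.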