darts.utils.data.tabularization.add_static_covariates_to_lagged_data(features, target_series, uses_static_covariates=True, last_shape=None)[source]

Add static covariates to the features’ table for RegressionModels. If uses_static_covariates=True, all target series used in fit() and predict() must have static covariates with identical dimensionality. Otherwise, static covariates are not considered.

The static covariates are added to the right of the lagged features following the convention: with a 2 component series, and 2 static covariates per component -> scov_1_comp_1 | scov_1_comp_2 | scov_2_comp_1 | scov_2_comp_2

Parameters
  • features (Union[ndarray, Sequence[ndarray]]) – The features’ numpy array(s) to which the static covariates will be added. Can either be a lone feature matrix or a Sequence of feature matrices; in the latter case, static covariates will be appended to each feature matrix in this Sequence.

  • target_series (Union[TimeSeries, Sequence[TimeSeries]]) – The target series from which to read the static covariates.

  • uses_static_covariates (bool) – Whether the model uses/expects static covariates. If True, it enforces that static covariates must have identical shapes across all target series.

  • last_shape (Optional[tuple[int, int], None]) – Optionally, the last observed shape of the static covariates. This is None before fitting, or when uses_static_covariates is False.

Returns

The features’ array(s) with appended static covariates columns. If features was passed as a Sequence of np.array, then a Sequence is also returned; if features was passed as an np.array, an np.array is returned. last_shape is the shape of the static covariates.

Return type

(features, last_shape)
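
The column convention described above can be illustrated with plain NumPy. This is a minimal sketch and not the library's implementation: the array names and values are hypothetical, and it assumes the static covariates are laid out with one row per target component and one column per static covariate.

```python
import numpy as np

# Hypothetical static covariates for one target series with 2 components.
# Assumed layout: rows are target components, columns are static covariates.
static_covs = np.array([
    [10.0, 20.0],  # comp_1: scov_1, scov_2
    [11.0, 21.0],  # comp_2: scov_1, scov_2
])

# Hypothetical lagged features table: 3 observations, 4 lagged features.
features = np.arange(12, dtype=float).reshape(3, 4)

# Column-major flattening yields the order
# scov_1_comp_1 | scov_1_comp_2 | scov_2_comp_1 | scov_2_comp_2.
flat = static_covs.flatten(order="F")

# Static values are constant over time, so repeat them for every observation
# and append them to the right of the lagged features.
tiled = np.tile(flat, (features.shape[0], 1))
features_with_static = np.hstack([features, tiled])

# Shape that would be compared against on later calls.
last_shape = static_covs.shape
```

Under these assumptions, each row of features_with_static ends with the same four static values, in the documented order.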

darts.utils.data.tabularization.create_lagged_component_names(target_series=None, past_covariates=None, future_covariates=None, lags=None, lags_past_covariates=None, lags_future_covariates=None, output_chunk_length=1, concatenate=True, use_static_covariates=False)[source]

Helper function called to retrieve the name of the features and labels arrays created with create_lagged_data(). The order of the features is the following:

Along the n_lagged_features axis, X has the following structure:

lagged_target | lagged_past_covariates | lagged_future_covariates | static covariates

For *_lags=[-2,-1] and *_series.n_components = 2 (lags shared across all the components), each lagged_* has the following structure (grouped by lags):

comp0_*_lag-2 | comp1_*_lag-2 | comp0_*_lag-1 | comp1_*_lag-1

For *_lags={‘comp0’:[-3, -1], ‘comp1’:[-5, -3]} and *_series.n_components = 2 (component-specific lags), each lagged_* has the following structure (sorted by lags, then by components):

comp1_*_lag-5 | comp0_*_lag-3 | comp1_*_lag-3 | comp0_*_lag-1

and for static covariates (2 static covariates acting on 2 target components):

cov0_*_target_comp0 | cov0_*_target_comp1 | cov1_*_target_comp0 | cov1_*_target_comp1

Along the n_lagged_labels axis, y has the following structure (for output_chunk_length=4 and target_series.n_components=2):

comp0_target_hrz0 | comp1_target_hrz0 | … | comp0_target_hrz3 | comp1_target_hrz3

Note: only the component names of the first series from target_series, past_covariates, future_covariates, and static_covariates are used.

The naming convention for target, past and future covariates lags is: "{name}_{type}_lag{i}", where:

  • {name} the component name of the (first) series

  • {type} is the feature type, one of “target”, “pastcov”, and “futcov”

  • {i} is the lag value

The naming convention for static covariates is: "{name}_statcov_target_{comp}", where:

  • {name} the static covariate name of the (first) series

  • {comp} the target component name of the (first) series that the static covariate acts on. If the static covariate acts globally on a multivariate target series, will show “global”.

The naming convention for labels is: "{name}_target_hrz{i}", where:

  • {name} the component name of the (first) series

  • {i} is the step in the forecast horizon

Return type

tuple[list[list[str]], list[list[str]]]

Returns

  • features_cols_name – The names of the lagged features in the X array generated by create_lagged_data() as a List[str]. If concatenate=True, also contains the columns names for the y array (on the right).

  • labels_cols_name – The names of the lagged features in the y array generated by create_lagged_data() as a List[str].

See also

tabularization.create_lagged_data

generate the lagged features and labels as (list of) Arrays.
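
The naming conventions for create_lagged_component_names can be sketched in a few lines of plain Python. The helpers lagged_names and label_names below are hypothetical and only cover lags shared across all components; they are meant to illustrate the string patterns and the ordering (by lag, then by component), not to reproduce the library function.

```python
def lagged_names(components, feature_type, lags):
    """Build '{name}_{type}_lag{i}' names, grouped by lag, then by component."""
    return [
        f"{name}_{feature_type}_lag{lag}"
        for lag in sorted(lags)
        for name in components
    ]


def label_names(components, output_chunk_length):
    """Build '{name}_target_hrz{i}' names, grouped by horizon step, then by component."""
    return [
        f"{name}_target_hrz{step}"
        for step in range(output_chunk_length)
        for name in components
    ]


# Hypothetical component names: a 2-component target and a 1-component past covariate.
feature_cols = (
    lagged_names(["comp0", "comp1"], "target", [-2, -1])
    + lagged_names(["temp"], "pastcov", [-1])
)
label_cols = label_names(["comp0", "comp1"], 2)
```

Here feature_cols starts with the target lags and is followed by the past covariate lags, mirroring the lagged_target | lagged_past_covariates | lagged_future_covariates order.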

darts.utils.data.tabularization.create_lagged_data(target_series=None, past_covariates=None, future_covariates=None, lags=None, lags_past_covariates=None, lags_future_covariates=None, output_chunk_length=1, output_chunk_shift=0, uses_static_covariates=True, last_static_covariates_shape=None, max_samples_per_ts=None, multi_models=True, check_inputs=True, use_moving_windows=True, is_training=True, concatenate=True, sample_weight=None, show_warnings=True)[source]

Creates the features array X and labels array y to train a lagged-variables regression model (e.g. an sklearn model) when is_training = True; alternatively, creates the features array X to produce a series of predictions from an already-trained regression model when is_training = False. In both cases, a list of time indices corresponding to each generated observation is also returned.

Notes

Instead of calling create_lagged_data directly, it is recommended that:
  • create_lagged_training_data be called if one wishes to create the X and y arrays to train a regression model.

  • create_lagged_prediction_data be called if one wishes to create the X array required to generate a prediction from an already-trained regression model.

This is because even though both of these functions are merely wrappers around create_lagged_data, their call signatures are more easily interpreted than create_lagged_data. For example, create_lagged_prediction_data does not accept output_chunk_length nor multi_models as inputs, since these inputs are not used when constructing prediction data. Similarly, create_lagged_prediction_data returns only X and times as outputs, as opposed to returning y as None along with X and times.

The X array is constructed from the lagged values of up to three separate timeseries:

1. The target_series, which contains the values we’re trying to predict. A regression model that uses previous values of the target it’s predicting is referred to as autoregressive; please refer to [1] for further details about autoregressive timeseries models.

2. The past covariates series, which contains values that are not known into the future. Unlike the target series, however, past covariates are not to be predicted by the regression model.

3. The future covariates (AKA ‘exogenous’ covariates) series, which contains values that are known into the future, even beyond the data in target_series and past_covariates.

See [2] for a more detailed discussion about target, past, and future covariates. Conversely, y is comprised only of the lagged values of target_series.

The shape of X is:

X.shape = (n_observations, n_lagged_features, n_samples),

where n_observations equals either the number of time points shared between all specified series, or max_samples_per_ts, whichever is smaller. The shape of y is:

y.shape = (n_observations, output_chunk_length, n_samples),

if multi_models = True, otherwise:

y.shape = (n_observations, 1, n_samples).

Along the n_lagged_features axis, X has the following structure (for *_lags=[-2,-1] and *_series.n_components = 2):

lagged_target | lagged_past_covariates | lagged_future_covariates

where each lagged_* has the following structure:

lag_-2_comp_1_* | lag_-2_comp_2_* | lag_-1_comp_1_* | lag_-1_comp_2_*

Along the n_lagged_labels axis, y has the following structure (for output_chunk_length=4 and target_series.n_components=2):

lag_+0_comp_1_target | lag_+0_comp_2_target | … | lag_+3_comp_1_target | lag_+3_comp_2_target
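
The layout of X and y can be reproduced on a toy array with plain NumPy. This is a simplified sketch under stated assumptions, not the library's code: the tabularize helper is hypothetical, it handles only target lags, and it treats t as the first step of the output chunk. With a two-component target, the label axis here holds one column per horizon step and component, following the layout shown above.

```python
import numpy as np


def tabularize(target, lags, output_chunk_length, multi_models=True):
    """Illustrative only: build X and y from a (time, component, sample) array."""
    n_times = target.shape[0]
    lags = sorted(lags)
    # t is the index of the first label step; it needs -min(lags) steps of
    # history and output_chunk_length steps (including t itself) of future values.
    ts = np.arange(-lags[0], n_times - output_chunk_length + 1)
    # Features: one block per lag, each block holding every component.
    X = np.concatenate([target[ts + lag] for lag in lags], axis=1)
    if multi_models:
        # Labels: one block per step of the output chunk.
        steps = range(output_chunk_length)
    else:
        # Labels: only the last step of the output chunk.
        steps = [output_chunk_length - 1]
    y = np.concatenate([target[ts + step] for step in steps], axis=1)
    return X, y, ts


# Toy deterministic target: 10 time steps, 2 components, 1 sample.
target = np.arange(20.0).reshape(10, 2, 1)

X, y, ts = tabularize(target, lags=[-2, -1], output_chunk_length=4)
X_last, y_last, _ = tabularize(target, [-2, -1], 4, multi_models=False)
```

In this toy setup, five observations can be formed (t = 2 to 6), so the first axis of X and y has length 5 and the sample axis keeps its length of 1.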

The lags and lags_past_covariates must contain only values less than or equal to -1. In other words, one cannot use the value of either of these series at time t to predict the value of the target series at the same time t; this is because the values of target_series and past_covariates at time t aren’t available at prediction time, by definition. Conversely, since the values of future_covariates are known into the future, lags_future_covariates can contain negative, positive, and/or zero lag values (i.e. we can use the values of future_covariates at time t or beyond to predict the value of target_series at time t).
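
The constraint on lag values can be written as a small validation routine. The check_lags function below is a hypothetical helper, not part of the library, and it only handles lags given as sequences of integers (not the dictionary form).

```python
def check_lags(lags=None, lags_past_covariates=None, lags_future_covariates=None):
    """Raise ValueError if the lag values violate the rules described above."""
    if lags is None and lags_past_covariates is None and lags_future_covariates is None:
        raise ValueError("At least one set of lags must be specified.")
    # Target and past covariate values at time t are unknown at prediction
    # time, so only lags <= -1 are allowed for them.
    for name, values in (("lags", lags), ("lags_past_covariates", lags_past_covariates)):
        if values is not None and any(lag > -1 for lag in values):
            raise ValueError(f"`{name}` must contain only values <= -1.")
    # Future covariates are known ahead of time: any integer lag is allowed,
    # so lags_future_covariates needs no sign check here.


# Valid: negative target lags; negative, zero and positive future covariate lags.
check_lags(lags=[-3, -1], lags_future_covariates=[-1, 0, 2])
```

Calling check_lags(lags=[0]) would raise a ValueError in this sketch, since a target lag of 0 refers to the very value being predicted.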

The exact method used to construct X and y depends on whether all specified timeseries are of the same frequency or not:

  • If all specified timeseries are of the same frequency, strided_moving_window is used to extract contiguous time blocks from each timeseries; the lagged variables are then extracted from each window.

  • If the specified timeseries are not all of the same frequency, then find_shared_times is first used to find those times common to all three timeseries, after which the lagged features are extracted by offsetting the time indices of these common times by the requested lags.

In cases where it can be validly applied, the ‘moving window’ method is expected to be faster than the ‘intersecting time’ method. However, in exceptional cases where only a small number of lags are being extracted, but the difference between the lag values is large (e.g. lags = [-1, -1000]), the ‘moving window’ method is expected to consume significantly more memory, since it extracts all series values between the maximum and minimum lags as ‘windows’, before actually extracting the specific requested lag values.
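
The trade-off between the two methods can be seen on a one-dimensional toy series. The sketch below uses NumPy's sliding_window_view as a stand-in for the windowing step and direct index offsets as a stand-in for the time-intersection step; it is an illustration under these assumptions and does not call strided_moving_window or find_shared_times.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy univariate series and two widely spaced lags.
series = np.arange(100.0)
lags = [-50, -1]
window_len = -min(lags)  # each window spans from the earliest lag up to lag -1

# 'Moving window' style: one window of 50 consecutive values per observation,
# from which only the columns matching the requested lags are kept.
windows = sliding_window_view(series[:-1], window_len)
X_window = windows[:, [lag + window_len for lag in lags]]

# 'Intersecting time' style: offset the observation times by each lag and
# index the series directly, touching only the values that are needed.
ts = np.arange(window_len, len(series))
X_direct = np.stack([series[ts + lag] for lag in lags], axis=1)
```

Both approaches give the same features here, but the windowed one spans 50 values per observation to keep only 2 of them, which is the source of the extra memory use once the windows are materialised.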

In order for the lagged features of a series to be added to X, both that series and the corresponding lags must be specified; if a series is specified without the corresponding lags, that series will be ignored and not added to X. X and y arrays are constructed independently over the samples dimension (i.e. axis 2, the third dimension) of each series.

If the provided series are stochastic (i.e. series.n_samples > 1), then an X and y array will be constructed for each sample; the arrays corresponding to each sample are concatenated together along axis 2 of X and y. In other words, create_lagged_data is vectorized over the sample axis of the target_series, past_covariates, and future_covariates inputs. Importantly, if stochastic series are provided, each series must have the same number of samples, otherwise an error will be thrown.

Each series input (i.e. target_series, past_covariates, and future_covariates) can be specified either as a single TimeSeries, or as a Sequence of TimeSeries; the specified series must all be of the same type, however (i.e. either all TimeSeries or all Sequence[TimeSeries]). If Sequence[TimeSeries] are specified, then a feature matrix X and labels array y will be constructed using the corresponding TimeSeries in each Sequence (i.e. the first TimeSeries in each Sequence are used to create an X and y, then the second TimeSeries in each Sequence are used to create an X and y, etc.). If concatenate = True, these X’s and y’s will be concatenated along the 0th axis; otherwise, a list of X and y arrays will be returned. Note that times is always returned as a Sequence[pd.Index], however, even when concatenate = True.
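
The effect of concatenate on the output containers can be mimicked with placeholder arrays. The shapes and values below are hypothetical; the sketch only illustrates the difference between a list of per-series arrays and a single array stacked along axis 0, with the times kept as one entry per series in both cases.

```python
import numpy as np

# Hypothetical per-series feature arrays, as if built from two target series:
# 5 observations from the first series and 3 from the second, 4 features, 1 sample.
X_per_series = [np.zeros((5, 4, 1)), np.ones((3, 4, 1))]

# Hypothetical per-series observation times (integer time steps).
times_per_series = [np.arange(2, 7), np.arange(2, 5)]

# concatenate = False style: keep one array per input series.
X_list = X_per_series

# concatenate = True style: stack all observations along axis 0.
X_concat = np.concatenate(X_per_series, axis=0)

# The times stay grouped per series in both cases, so the rows of X_concat
# can be mapped back to their series through the lengths of each times entry.
row_offsets = np.cumsum([0] + [len(t) for t in times_per_series])
```

With these placeholder shapes, rows 0 to 4 of X_concat come from the first series and rows 5 to 7 from the second.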

Parameters
  • target_series (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, the series for the regression model to predict. Must be specified if is_training = True. Can be specified as either a TimeSeries or as a Sequence[TimeSeries].

  • past_covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, the past covariates series that the regression model will use as inputs. Unlike the target_series, past_covariates are not to be predicted by the regression model. Can be specified as either a TimeSeries or as a Sequence[TimeSeries].

  • future_covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, the future covariates (i.e. exogenous covariates) series that the regression model will use as inputs. Can be specified as either a TimeSeries or as a Sequence[TimeSeries].

  • lags (Union[Sequence[int], dict[str, list[int]], None]) – Optionally, the lags of the target series to be used as (autoregressive) features. If not specified, autoregressive features will not be added to X. Each lag value is assumed to be negative (e.g. lags = [-3, -1] will extract target_series values which are 3 time steps and 1 time step away from the current value). If the lags are provided as a dictionary, the lags values are specific to each component in the target series.

  • lags_past_covariates (Union[Sequence[int], dict[str, list[int]], None]) – Optionally, the lags of past_covariates to be used as features. Like lags, each lag value is assumed to be less than or equal to -1. If the lags are provided as a dictionary, the lags values are specific to each component in the past covariates series.

  • lags_future_covariates (Union[Sequence[int], dict[str, list[int]], None]) – Optionally, the lags of future_covariates to be used as features. Unlike lags and lags_past_covariates, lags_future_covariates values can be positive (i.e. use values after time t to predict target at time t), zero (i.e. use values at time t to predict target at time t), and/or negative (i.e. use values before time t to predict target at time t). If output_chunk_shift > 0, the lags are relative to the first time step of the shifted output chunk. If the lags are provided as a dictionary, the lags values are specific to each component in the future covariates series.

  • output_chunk_length (int) – Optionally, the number of time steps ahead into the future the regression model is to predict. Must be specified if is_training = True.

  • output_chunk_shift (int) – Optionally, the number of time steps to shift the output chunk ahead into the future.

  • uses_static_covariates (bool) – Whether the model uses/expects static covariates. If True, it enforces that static covariates must have identical shapes across all target series.

  • last_static_covariates_shape (Optional[tuple[int, int], None]) – Optionally, the last observed shape of the static covariates. This is None before fitting, or when uses_static_covariates is False.

  • max_samples_per_ts (Optional[int, None]) – Optionally, the maximum number of samples to be drawn for training/validation; only the most recent samples are kept. In theory, specifying a smaller max_samples_per_ts should reduce computation time, especially in cases where many observations could be generated.

  • multi_models (bool) – Optionally, specifies whether the regression model predicts multiple time steps into the future. If True, then the regression model is assumed to predict all time steps from time t to t+output_chunk_length. If False, then the regression model is assumed to predict only the time step at t+output_chunk_length. This input is ignored if is_training = False.

  • check_inputs (bool) – Optionally, specifies that the lags_* and series_* inputs should be checked for validity. Should be set to False if inputs have already been checked for validity (e.g. inside the __init__ of a class), otherwise should be set to True.

  • use_moving_windows (bool) – Optionally, specifies that the ‘moving window’ method should be used to construct X and y if all provided series are of the same frequency. If use_moving_windows = False, the ‘time intersection’ method will always be used, even when all provided series are of the same frequency. In general, setting to True results in faster tabularization at the potential cost of higher memory usage. See Notes for further details.

  • is_training (bool) – Optionally, specifies whether the constructed lagged data are to be used for training a regression model (i.e. is_training = True), or for generating predictions from an already-trained regression model (i.e. is_training = False). If is_training = True, target_series and output_chunk_length must be specified, the multi_models input is utilised, and a label array y is returned. Conversely, if is_training = False, then target_series and output_chunk_length do not need to be specified, the multi_models input is ignored, and the returned y value is None.

  • concatenate (bool) – Optionally, specifies that X and y should both be returned as single np.ndarray, instead of as a Sequence[np.ndarray]. If each series input is specified as a Sequence[TimeSeries] and concatenate = False, X and y will be lists whose i-th element corresponds to the feature matrix or label array formed by the i-th TimeSeries in each Sequence[TimeSeries] input. Conversely, if concatenate = True when Sequence[TimeSeries] are provided, then X and y will be arrays created by concatenating all feature/label arrays formed by each TimeSeries along the 0th axis. Note that times is still returned as Sequence[pd.Index], even when concatenate = True.

  • sample_weight (Union[TimeSeries, Sequence[TimeSeries], str, None]) – Optionally, some sample weights to apply to the target series labels. They are applied per observation, per label (each step in output_chunk_length), and per component. If a series or sequence of series, then those weights are used. If the weight series only have a single component / column, then the weights are applied globally to all components in series. Otherwise, for component-specific weights, the number of components must match those of series. If a string, then the weights are generated using built-in weighting functions. The available options are “linear” or “exponential” decay - the further in the past, the lower the weight. The weights are computed globally based on the length of the longest series in series. Then for each series, the weights are extracted from the end of the global weights. This gives a common time weighting across all series.

  • show_warnings (bool) – Whether to show warnings.

Return type

tuple[Union[ndarray, Sequence[ndarray]], Union[None, ndarray, Sequence[ndarray]], Sequence[Index], Optional[tuple[int, int], None], Union[ndarray, Sequence[ndarray], None]]

Returns

  • X – The constructed features array(s), with shape (n_observations, n_lagged_features, n_samples). If the series inputs were specified as Sequence[TimeSeries] and concatenate = False, then X is returned as a Sequence[np.array]; otherwise, X is returned as a single np.array.

  • y – The constructed labels array. If multi_models = True, then y is a (n_observations, output_chunk_length, n_samples)-shaped array; conversely, if multi_models = False, then y is a (n_observations, 1, n_samples)-shaped array. If the series inputs were specified as Sequence[TimeSeries] and concatenate = False, then y is returned as a Sequence[np.array]; otherwise, y is returned as a single np.array.

  • times – The time_index of each observation in X and y, returned as a Sequence of pd.Index. If the series inputs were specified as Sequence[TimeSeries], then the i-th list element gives the times of those observations formed using the i-th TimeSeries object in each Sequence. Otherwise, if the series inputs were specified as TimeSeries, the only element is the times of those observations formed from the lone TimeSeries inputs.

  • last_static_covariates_shape – The last observed shape of the static covariates. This is None when uses_static_covariates is False.

  • sample_weight – The weights to apply to each observation in X and output step y, returned as a Sequence of np.ndarray.

Raises
  • ValueError – If the specified time series do not share any times for which features (and labels if is_training = True) can be constructed.

  • ValueError – If no lags are specified, or if any of the specified lags or lags_past_covariates values are non-negative.

  • ValueError – If any of the series are too short to create features and/or labels for the requested lags and output_chunk_length values.

  • ValueError – If target_series and/or output_chunk_length are not specified when is_training = True.

  • ValueError – If the provided series do not share the same type of time_index (e.g. target_series uses a pd.RangeIndex, but future_covariates uses a pd.DatetimeIndex).

References

[1] https://otexts.com/fpp2/AR.html#AR

[2] https://unit8.com/resources/time-series-forecasting-using-past-and-future-external-data-with-darts/

See also

tabularization.create_lagged_component_names

return the lagged features names as a list of strings.

darts.utils.data.tabularization.create_lagged_prediction_data(target_series=None, past_covariates=None, future_covariates=None, lags=None, lags_past_covariates=None, lags_future_covariates=None, uses_static_covariates=True, last_static_covariates_shape=None, max_samples_per_ts=None, check_inputs=True, use_moving_windows=True, concatenate=True, show_warnings=True)[source]

Creates the features array X to produce a series of predictions from an already-trained regression model; the time index values of each observation are also returned.

Notes

This function is simply a wrapper around create_lagged_data; for further details on the structure of X, please refer to help(create_lagged_data).

Parameters
  • target_series (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, the series for the regression model to predict.

  • past_covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, the past covariates series that the regression model will use as inputs. Unlike the target_series, past_covariates are not to be predicted by the regression model.

  • future_covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, the future covariates (i.e. exogenous covariates) series that the regression model will use as inputs.

  • lags (Union[Sequence[int], dict[str, list[int]], None]) – Optionally, the lags of the target series to be used as (autoregressive) features. If not specified, autoregressive features will not be added to X. Each lag value is assumed to be negative (e.g. lags = [-3, -1] will extract target_series values which are 3 time steps and 1 time step away from the current value). If the lags are provided as a dictionary, the lags values are specific to each component in the target series.

  • lags_past_covariates (Union[Sequence[int], dict[str, list[int]], None]) – Optionally, the lags of past_covariates to be used as features. Like lags, each lag value is assumed to be less than or equal to -1. If the lags are provided as a dictionary, the lags values are specific to each component in the past covariates series.

  • lags_future_covariates (Union[Sequence[int], dict[str, list[int]], None]) – Optionally, the lags of future_covariates to be used as features. Unlike lags and lags_past_covariates, lags_future_covariates values can be positive (i.e. use values after time t to predict target at time t), zero (i.e. use values at time t to predict target at time t), and/or negative (i.e. use values before time t to predict target at time t). If the lags are provided as a dictionary, the lags values are specific to each component in the future covariates series.

  • uses_static_covariates (bool) – Whether the model uses/expects static covariates. If True, it enforces that static covariates must have identical shapes across all target series.

  • last_static_covariates_shape (Optional[tuple[int, int], None]) – Optionally, the last observed shape of the static covariates. This is None before fitting, or when uses_static_covariates is False.

  • max_samples_per_ts (Optional[int, None]) – Optionally, the maximum number of samples to be drawn for training/validation; only the most recent samples are kept. In theory, specifying a smaller max_samples_per_ts should reduce computation time, especially in cases where many observations could be generated.

  • check_inputs (bool) – Optionally, specifies that the lags_* and series_* inputs should be checked for validity. Should be set to False if inputs have already been checked for validity (e.g. inside the __init__ of a class), otherwise should be set to True.

  • use_moving_windows (bool) – Optionally, specifies that the ‘moving window’ method should be used to construct X and y if all provided series are of the same frequency. If use_moving_windows = False, the ‘time intersection’ method will always be used, even when all provided series are of the same frequency. In general, setting to True results in faster tabularization at the potential cost of higher memory usage. See Notes for further details.

  • concatenate (bool) – Optionally, specifies that X should be returned as a single np.ndarray, instead of as a Sequence[np.ndarray]. If each series input is specified as a Sequence[TimeSeries] and concatenate = False, X will be a list whose i-th element corresponds to the feature matrix formed by the i-th TimeSeries in each Sequence[TimeSeries] input. Conversely, if concatenate = True when Sequence[TimeSeries] are provided, then X will be an array created by concatenating all feature arrays formed by each TimeSeries along the 0th axis. Note that times is still returned as Sequence[pd.Index], even when concatenate = True.

  • show_warnings (bool) – Whether to show warnings.

Return type

tuple[Union[ndarray, Sequence[ndarray]], Sequence[Index]]

Returns

  • X – The constructed features array(s), with shape (n_observations, n_lagged_features, n_samples). If the series inputs were specified as Sequence[TimeSeries] and concatenate = False, then X is returned as a Sequence[np.array]; otherwise, X is returned as a single np.array.

  • times – The time_index of each observation in X, returned as a Sequence of pd.Index. If the series inputs were specified as Sequence[TimeSeries], then the i-th list element gives the times of those observations formed using the i-th TimeSeries object in each Sequence. Otherwise, if the series inputs were specified as TimeSeries, the only element is the times of those observations formed from the lone TimeSeries inputs.

Raises
  • ValueError – If the specified time series do not share any times for which features can be constructed.

  • ValueError – If no lags are specified, or if any of the specified lags or lags_past_covariates values are non-negative.

  • ValueError – If any of the series are too short to create features for the requested lag values.

  • ValueError – If the provided series do not share the same type of time_index (e.g. target_series uses a pd.RangeIndex, but future_covariates uses a pd.DatetimeIndex).

darts.utils.data.tabularization.create_lagged_training_data(target_series, output_chunk_length, output_chunk_shift, past_covariates=None, future_covariates=None, lags=None, lags_past_covariates=None, lags_future_covariates=None, uses_static_covariates=True, last_static_covariates_shape=None, max_samples_per_ts=None, multi_models=True, check_inputs=True, use_moving_windows=True, concatenate=True, sample_weight=None)[source]

Creates the features array X and labels array y to train a lagged-variables regression model (e.g. an sklearn model); the time index values of each observation are also returned.

Notes

This function is simply a wrapper around create_lagged_data; for further details on the structure of X, please refer to help(create_lagged_data).

Parameters
  • target_series (Union[TimeSeries, Sequence[TimeSeries]]) – The series for the regression model to predict.

  • output_chunk_length (int) – The number of time steps ahead into the future the regression model is to predict.

  • output_chunk_shift (int) – Optionally, the number of time steps to shift the output chunk ahead into the future.

  • past_covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, the past covariates series that the regression model will use as inputs. Unlike the target_series, past_covariates are not to be predicted by the regression model.

  • future_covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, the future covariates (i.e. exogenous covariates) series that the regression model will use as inputs.

  • lags (Union[Sequence[int], dict[str, list[int]], None]) – Optionally, the lags of the target series to be used as (autoregressive) features. If not specified, autoregressive features will not be added to X. Each lag value is assumed to be negative (e.g. lags = [-3, -1] will extract target_series values which are 3 time steps and 1 time step away from the current value). If the lags are provided as a dictionary, the lags values are specific to each component in the target series.

  • lags_past_covariates (Union[Sequence[int], dict[str, list[int]], None]) – Optionally, the lags of past_covariates to be used as features. Like lags, each lag value is assumed to be less than or equal to -1. If the lags are provided as a dictionary, the lags values are specific to each component in the past covariates series.

  • lags_future_covariates (Union[Sequence[int], dict[str, list[int]], None]) – Optionally, the lags of future_covariates to be used as features. Unlike lags and lags_past_covariates, lags_future_covariates values can be positive (i.e. use values after time t to predict target at time t), zero (i.e. use values at time t to predict target at time t), and/or negative (i.e. use values before time t to predict target at time t). If the lags are provided as a dictionary, the lags values are specific to each component in the future covariates series.

  • uses_static_covariates (bool) – Whether the model uses/expects static covariates. If True, it enforces that static covariates must have identical shapes across all target series.

  • last_static_covariates_shape (Optional[tuple[int, int], None]) – Optionally, the last observed shape of the static covariates. This is None before fitting, or when uses_static_covariates is False.

  • max_samples_per_ts (Optional[int, None]) – Optionally, the maximum number of samples to be drawn for training/validation; only the most recent samples are kept. In theory, specifying a smaller max_samples_per_ts should reduce computation time, especially in cases where many observations could be generated.

  • multi_models (bool) – Optionally, specifies whether the regression model predicts multiple time steps into the future. If True, then the regression model is assumed to predict all time steps from time t to t+output_chunk_length. If False, then the regression model is assumed to predict only the time step at t+output_chunk_length.

  • check_inputs (bool) – Optionally, specifies that the lags_* and series_* inputs should be checked for validity. Should be set to False if inputs have already been checked for validity (e.g. inside the __init__ of a class), otherwise should be set to True.

  • use_moving_windows (bool) – Optionally, specifies that the ‘moving window’ method should be used to construct X and y if all provided series are of the same frequency. If use_moving_windows = False, the ‘time intersection’ method will always be used, even when all provided series are of the same frequency. In general, setting to True results in faster tabularization at the potential cost of higher memory usage. See Notes for further details.

  • concatenate (bool) – Optionally, specifies that X and y should both be returned as single np.ndarray, instead of as a Sequence[np.ndarray]. If each series input is specified as a Sequence[TimeSeries] and concatenate = False, X and y will be lists whose i-th element corresponds to the feature matrix or label array formed by the i-th TimeSeries in each Sequence[TimeSeries] input. Conversely, if concatenate = True when Sequence[TimeSeries] are provided, then X and y will be arrays created by concatenating all feature/label arrays formed by each TimeSeries along the 0th axis. Note that times is still returned as Sequence[pd.Index], even when concatenate = True.

  • sample_weight (Union[TimeSeries, str, None]) – Optionally, some sample weights to apply to the target series labels. They are applied per observation, per label (each step in output_chunk_length), and per component. If a series or sequence of series, then those weights are used. If the weight series has only a single component / column, then the weights are applied globally to all components in series. Otherwise, for component-specific weights, the number of components must match that of series. If a string, then the weights are generated using built-in weighting functions. The available options are “linear” or “exponential” decay: the further in the past, the lower the weight. The weights are computed globally based on the length of the longest series in series. Then, for each series, the weights are extracted from the end of the global weights. This gives a common time weighting across all series.
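The string-based weighting described for sample_weight can be pictured with a small NumPy sketch. This is not the darts implementation: the helper name time_decay_weights_sketch, the linear ramp from 0 to 1, and the exponential decay rate are all assumptions chosen for illustration. Only the "compute once for the longest series, then take the tail for each series" logic comes from the description above.

```python
import numpy as np


def time_decay_weights_sketch(series_lengths, kind="linear"):
    """Hypothetical helper; the exact weighting formulas are assumptions."""
    max_len = max(series_lengths)
    if kind == "linear":
        # Assumed shape: weights rise linearly towards the most recent step.
        global_weights = np.linspace(0.0, 1.0, max_len)
    elif kind == "exponential":
        # Assumed decay rate of 10; older steps get exponentially less weight.
        global_weights = np.exp(-10.0 * np.linspace(1.0, 0.0, max_len))
    else:
        raise ValueError(f"unknown weighting kind: {kind!r}")
    # Each series takes the *end* of the global weights, so the most recent
    # step of every series receives the same weight. Lengths must be > 0.
    return [global_weights[-n:] for n in series_lengths]


weights_long, weights_short = time_decay_weights_sketch([5, 3], kind="linear")
print(weights_long)   # weights for the series of length 5
print(weights_short)  # the last three entries of the global weights
```

Because both series read from the same global array, a time step that is two steps before the end gets the same weight in the short series as in the long one.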

Return type

tuple[Union[ndarray, Sequence[ndarray]], Union[None, ndarray, Sequence[ndarray]], Sequence[Index], Optional[tuple[int, int], None], Union[ndarray, Sequence[ndarray], None]]

Returns

  • X – The constructed features array(s), with shape (n_observations, n_lagged_features, n_samples). If the series inputs were specified as Sequence[TimeSeries] and concatenate = False, then X is returned as a Sequence[np.array]; otherwise, X is returned as a single np.array.

  • y – The constructed labels array. If multi_models = True, then y is a (n_observations, output_chunk_length, n_samples)-shaped array; conversely, if multi_models = False, then y is a (n_observations, 1, n_samples)-shaped array. If the series inputs were specified as Sequence[TimeSeries] and concatenate = False, then y is returned as a Sequence[np.array]; otherwise, y is returned as a single np.array.

  • times – The time_index of each observation in X and y, returned as a Sequence of pd.Index objects. If the series inputs were specified as Sequence[TimeSeries], then the i-th list element gives the times of those observations formed using the i-th TimeSeries object in each Sequence. Otherwise, if the series inputs were specified as TimeSeries, the only element is the times of those observations formed from the lone TimeSeries inputs.

  • sample_weight – The weights to apply to each observation in X and output step y, returned as a Sequence of np.ndarray.
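To make the shapes of X and y more concrete, here is a minimal NumPy sketch for a single univariate, deterministic series. It is not the darts code path: lagged_arrays_sketch is a hypothetical helper, it uses integer positions instead of a time index, it drops the trailing n_samples axis, and it assumes that the labels for an observation at time t start at t and cover output_chunk_length consecutive steps (which is what the documented shape of y suggests). The last lines illustrate the effect of concatenate for two series.

```python
import numpy as np


def lagged_arrays_sketch(values, lags, output_chunk_length=1, multi_models=True):
    """Hypothetical single-series helper, not the darts implementation."""
    values = np.asarray(values, dtype=float)
    # First position with every lagged value available, and last position
    # with every label available (lags are negative integers).
    first_t = -min(lags)
    last_t = len(values) - output_chunk_length
    times = np.arange(first_t, last_t + 1)
    # One feature column per lag.
    X = np.stack([values[times + lag] for lag in lags], axis=1)
    if multi_models:
        steps = np.arange(output_chunk_length)      # t, ..., t + ocl - 1
    else:
        steps = np.array([output_chunk_length - 1])  # only t + ocl - 1
    y = values[times[:, None] + steps[None, :]]
    return X, y, times


series_a = np.arange(10)
series_b = np.arange(100, 108)

X_a, y_a, times_a = lagged_arrays_sketch(series_a, [-2, -1], output_chunk_length=3)
X_b, y_b, times_b = lagged_arrays_sketch(series_b, [-2, -1], output_chunk_length=3)
_, y_last, _ = lagged_arrays_sketch(
    series_a, [-2, -1], output_chunk_length=3, multi_models=False
)

print(X_a.shape, y_a.shape, y_last.shape)  # (6, 2) (6, 3) (6, 1)

# concatenate = False keeps one array per series; concatenate = True stacks
# them along axis 0.
X_list = [X_a, X_b]
X_concat = np.concatenate(X_list, axis=0)
print(X_concat.shape)  # (10, 2)
```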

Raises
  • ValueError – If the specified time series do not share any times for which features and labels can be constructed.

  • ValueError – If no lags are specified, or if any of the specified lag values are non-negative.

  • ValueError – If any of the series are too short to create features and labels for the requested lags and output_chunk_length values.

  • ValueError – If the provided series do not share the same type of time_index (e.g. target_series uses a pd.RangeIndex, but future_covariates uses a pd.DatetimeIndex).

darts.utils.data.tabularization.get_shared_times(*series_or_times, sort=True)[source]

Returns the times shared by all specified TimeSeries or time indexes (i.e. the intersection of all these times). If sort = True, then these shared times are sorted from earliest to latest. Any TimeSeries or time indices in series_or_times that aren’t specified (i.e. are None) are simply ignored.

Parameters
  • series_or_times (Union[TimeSeries, Index, None]) – The TimeSeries and/or time indices that should be ‘intersected’.

  • sort (bool) – Optionally, specifies that the returned shared times should be sorted from earliest to latest.

Returns

The time indices present in all specified TimeSeries and/or time indices.

Return type

shared_times

Raises

TypeError – If the specified TimeSeries and/or time indices do not all share the same type of time index (i.e. must either be all pd.DatetimeIndex or all pd.RangeIndex).
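The behaviour described for get_shared_times can be approximated with plain pandas indexes. The sketch below is only an illustration under stated assumptions: shared_times_sketch is a hypothetical name, it accepts pd.Index objects only (not TimeSeries), and returning None when nothing is specified is an assumption rather than documented behaviour.

```python
from functools import reduce

import pandas as pd


def shared_times_sketch(*time_indexes, sort=True):
    """Hypothetical stand-in that works on pd.Index inputs only."""
    # Unspecified (None) inputs are simply ignored.
    present = [idx for idx in time_indexes if idx is not None]
    if not present:
        return None  # assumption: nothing to intersect
    # All indexes must be of the same kind (e.g. all pd.DatetimeIndex).
    if len({type(idx) for idx in present}) > 1:
        raise TypeError("all time indexes must share the same index type")
    shared = reduce(lambda left, right: left.intersection(right), present)
    return shared.sort_values() if sort else shared


index_a = pd.date_range("2020-01-01", periods=5, freq="D")
index_b = pd.date_range("2020-01-03", periods=5, freq="D")

shared = shared_times_sketch(index_a, None, index_b)
print(shared)  # the three days from 2020-01-03 to 2020-01-05
```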

darts.utils.data.tabularization.get_shared_times_bounds(*series_or_times)[source]

Returns the latest start_time and the earliest end_time among all non-None series_or_times; these are (non-tight) lower and upper bounds on the intersection of all these series_or_times respectively. If no potential overlap exists between all specified series, None is returned instead.

Notes

If all specified series_or_times are of the same frequency, then get_shared_times_bounds returns tight bounds (i.e. the earliest and latest time within the intersection of all the timeseries is returned). To see this, suppose we have three equal-frequency series with observations made at different times:

    Series 1: ------
    Series 2:    ------
    Series 3:  ------

Here, each - denotes an observation at a specific time. In this example, get_shared_times_bounds will return the times at LB and UB:

                 LB
    Series 1: ---|---|
    Series 2:    |---|---
    Series 3:  --|---|-
                     UB

If the specified timeseries are not of the same frequency, then the returned bounds are potentially non-tight (i.e. LB <= intersection.start_time() < intersection.end_time() <= UB, where intersection contains the times shared by all specified timeseries).
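The difference between tight and non-tight bounds can be checked with a small pure-Python sketch that uses range objects as stand-ins for integer time indexes. shared_bounds_sketch is a hypothetical helper, and treating "latest start <= earliest end" as the overlap condition is an assumption.

```python
def shared_bounds_sketch(*time_ranges):
    """Hypothetical helper operating on non-empty range objects or None."""
    present = [r for r in time_ranges if r is not None and len(r) > 0]
    if not present:
        return None
    lower = max(r[0] for r in present)   # latest start time
    upper = min(r[-1] for r in present)  # earliest end time
    # Assumed overlap condition: no potential overlap if the bounds cross.
    return (lower, upper) if lower <= upper else None


# Same frequency (step 1): mirrors the three series drawn above.
same_freq = [range(0, 6), range(3, 9), range(1, 7)]
tight = shared_bounds_sketch(*same_freq)
tight_shared = sorted(set.intersection(*(set(r) for r in same_freq)))
print(tight, tight_shared)  # (3, 5) [3, 4, 5]

# Different frequencies (steps 2 and 3): the bounds are no longer tight.
mixed_freq = [range(0, 20, 2), range(1, 20, 3)]
loose = shared_bounds_sketch(*mixed_freq)
loose_shared = sorted(set(mixed_freq[0]) & set(mixed_freq[1]))
print(loose, loose_shared)  # (1, 18) [4, 10, 16]
```

In the second case the shared times start at 4 and end at 16, strictly inside the returned bounds of 1 and 18.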

Parameters

series_or_times (Sequence[Union[TimeSeries, Index, None]]) – The TimeSeries and/or pd.Index values to compute intersection bounds for; any provided None values are ignored.

Returns

Tuple containing the latest start_time and earliest end time among all specified timeseries, in that order. If no potential overlap exists between the specified series, then None is returned instead. Similarly, if no non-None series_or_times were specified, None is returned.

Return type

bounds

Raises

TypeError – If the series and/or times in series_or_times don’t all share the same type of time_index (i.e. either all pd.DatetimeIndex or all pd.RangeIndex).

darts.utils.data.tabularization.strided_moving_window(x, window_len, stride=1, axis=0, check_inputs=True)[source]

Extracts moving window views of an x array along a specified axis, where each window is of length window_len and consecutive windows are separated by stride indices. The total number of extracted windows equals num_windows = (x.shape[axis] - window_len)//stride + 1.

Notes

This function is similar to sliding_window_view in np.lib.stride_tricks, except that:

1. strided_moving_window allows for consecutive windows to be separated by a specified stride, whilst sliding_window_view does not.

2. strided_moving_window can only operate along a single axis, whereas sliding_window_view can operate along multiple axes.

Additionally, unlike sliding_window_view, using strided_moving_window doesn’t require numpy >= 1.20.0.
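The relationship to sliding_window_view, the num_windows formula, and the "windows stacked along the last axis" layout can be illustrated with a short NumPy sketch. It is not how strided_moving_window is implemented: moving_windows_sketch is a hypothetical helper that builds on sliding_window_view (and therefore does need numpy >= 1.20.0), then keeps every stride-th window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_windows_sketch(x, window_len, stride=1, axis=0):
    """Hypothetical stand-in built on sliding_window_view."""
    axis = axis % x.ndim  # normalise negative axes
    # sliding_window_view appends the window dimension as the last axis.
    windows = sliding_window_view(x, window_len, axis=axis)
    # Keep only every `stride`-th window along the chosen axis.
    index = [slice(None)] * windows.ndim
    index[axis] = slice(None, None, stride)
    return windows[tuple(index)]


x = np.arange(20).reshape(10, 2)
window_len, stride = 4, 3

windows = moving_windows_sketch(x, window_len, stride=stride, axis=0)
num_windows = (x.shape[0] - window_len) // stride + 1

print(num_windows)       # 3
print(windows.shape)     # (3, 2, 4)
print(windows[1, 0, :])  # [ 6  8 10 12], i.e. x[3:7, 0]
```

The resulting shape equals x.shape with the windowed axis replaced by num_windows, followed by window_len.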

Parameters
  • x (ndarray) – The array from which to extract moving windows.

  • window_len (int) – The size of the extracted moving windows.

  • stride (int) – Optionally, the separation between consecutive windows.

  • axis (int) – Optionally, the axis along which the moving windows should be extracted.

  • check_inputs (bool) – Optionally, specifies whether inputs should be checked for validity. Should be set to False if inputs have already been checked for validity (e.g. inside the __init__ of a class), otherwise should be set to True. See [1] for further details.

Returns

The moving windows extracted from x. The extracted windows are stacked along the last axis, and the axis along which the windows were extracted is ‘trimmed’ such that its length equals the number of extracted windows. More specifically, windows.shape = x_trimmed_shape + (window_len,), where x_trimmed_shape equals x.shape, except that x_trimmed_shape[axis] = num_windows.

Return type

windows

Raises
  • ValueError – If check_inputs = True and window_len is not positive.

  • ValueError – If check_inputs = True and stride is not positive.

  • ValueError – If check_inputs = True and axis is greater than x.ndim.

  • ValueError – If check_inputs = True and window_len is larger than x.shape[axis].

References

[1] https://numpy.org/doc/stable/reference/generated/numpy.lib.stride_tricks.as_strided.html