Horizon-Based Training Dataset

class darts.utils.data.horizon_based_dataset.HorizonBasedDataset(target_series, covariates=None, output_chunk_length=12, lh=(1, 3), lookback=3, use_static_covariates=True, sample_weight=None)

Bases: PastCovariatesTrainingDataset

A time series dataset containing tuples of (past_target, past_covariates, static_covariates, sample_weight, future_target) arrays, built in a way inspired by how N-BEATS is trained on the M4 dataset: https://arxiv.org/abs/1905.10437.

The “past” series have length lookback * output_chunk_length, and the “future” series has length output_chunk_length.

Given the horizon output_chunk_length of a model, this dataset computes “past/future” splits as follows. First, a “forecast point” is selected in the range of the last (min_lh * output_chunk_length, max_lh * output_chunk_length) points before the end of the time series, where (min_lh, max_lh) is the lh parameter. The “future” then consists of the following output_chunk_length points, and the “past” consists of the preceding lookback * output_chunk_length points.

All the series in the provided sequence must be long enough; i.e. have length at least (lookback + max_lh) * output_chunk_length, and min_lh must be at least 1 (to have targets of length exactly 1 * output_chunk_length). The target and covariates time series are sliced together using their time indexes for alignment.
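The index arithmetic behind such a split can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper name split_indices and the offset lh_idx (how many steps the forecast point is moved back from its latest allowed position) are hypothetical.

```python
def split_indices(series_length, output_chunk_length, lookback, min_lh, lh_idx):
    """Return (past_start, past_end, future_start, future_end) as half-open ranges.

    Hypothetical helper: lh_idx = 0 places the end of the "future" window
    (min_lh - 1) * output_chunk_length points before the end of the series.
    """
    if min_lh < 1:
        raise ValueError("min_lh must be at least 1")
    # End of the "future" window, counted from the end of the series.
    future_end = series_length - ((min_lh - 1) * output_chunk_length + lh_idx)
    # The forecast point is where the "future" window starts.
    future_start = future_end - output_chunk_length
    # The "past" window directly precedes the forecast point.
    past_end = future_start
    past_start = past_end - lookback * output_chunk_length
    if past_start < 0:
        raise ValueError("series too short for the requested split")
    return past_start, past_end, future_start, future_end


# Defaults from the signature: output_chunk_length=12, lh=(1, 3), lookback=3.
latest = split_indices(100, 12, 3, 1, lh_idx=0)
earliest = split_indices(100, 12, 3, 1, lh_idx=23)
print(latest)    # (52, 88, 88, 100)
print(earliest)  # (29, 65, 65, 77)
```

In this sketch the forecast point moves from 12 to 35 points before the end of a series of length 100, which stays within the (1 * 12, 3 * 12) range described above. A series of the minimum length (3 + 3) * 12 = 72 still yields a valid earliest split.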

The sampling is uniform both over the number of time series and the number of samples per series; i.e. the i-th sample of this dataset has 1/(N*M) chance of coming from any of the M samples in any of the N time series in the sequence.
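One way to obtain this uniformity is to map a flat sample index onto a (series, sample) pair with integer division. The sketch below assumes that each series contributes M = (max_lh - min_lh) * output_chunk_length samples; that count and the helper name locate_sample are assumptions made for illustration, not statements about the library's internals.

```python
def locate_sample(idx, n_series, output_chunk_length, lh):
    """Map a flat dataset index to (series_idx, sample_idx_within_series).

    Hypothetical helper; assumes every series contributes the same
    number of samples, so each (series, sample) pair is hit exactly once.
    """
    min_lh, max_lh = lh
    samples_per_series = (max_lh - min_lh) * output_chunk_length  # assumed M
    total = n_series * samples_per_series                         # N * M
    if not 0 <= idx < total:
        raise IndexError(f"index {idx} out of range for {total} samples")
    # Quotient selects the series, remainder selects the sample inside it.
    return divmod(idx, samples_per_series)


# N = 3 series, output_chunk_length = 12, lh = (1, 3): M = 24, N * M = 72.
print(locate_sample(25, 3, 12, (1, 3)))  # (1, 1)
print(locate_sample(71, 3, 12, (1, 3)))  # (2, 23)
```

Because the mapping is a bijection between flat indices and (series, sample) pairs, drawing the flat index uniformly gives each pair a probability of 1/(N*M).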

Parameters
  • target_series (Union[TimeSeries, Sequence[TimeSeries]]) – One or a sequence of target TimeSeries.

  • covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, one or a sequence of TimeSeries containing past-observed covariates. If this parameter is set, the provided sequence must have the same length as that of target_series. Moreover, all covariates in the sequence must have a time span large enough to contain all the required slices. The joint slicing of the target and covariates relies on the time axes of both series.

  • output_chunk_length (int) – The length of the “output” series emitted by the model.

  • lh (tuple[int, int]) – A (min_lh, max_lh) interval for the forecast point, starting from the end of the series. For example, (1, 3) will select forecast points uniformly between 1*H and 3*H points before the end of the series, where H is output_chunk_length. It is required that min_lh >= 1.

  • lookback (int) – An integer multiplier giving the length of the input in the emitted input and output splits, expressed as a multiple of output_chunk_length. For instance, lookback=3 will emit “inputs” of length 3 * output_chunk_length.

  • use_static_covariates (bool) – Whether to use/include static covariate data from input series.

  • sample_weight (Union[TimeSeries, Sequence[TimeSeries], str, None]) – Optionally, some sample weights to apply to the target series labels. They are applied per observation, per label (each step in output_chunk_length), and per component. If a series or sequence of series, then those weights are used. If the weight series have only a single component / column, then the weights are applied globally to all components in target_series. Otherwise, for component-specific weights, the number of components must match that of target_series. If a string, then the weights are generated using built-in weighting functions. The available options are “linear” or “exponential” decay: the further in the past, the lower the weight. The weights are computed globally based on the length of the longest series in target_series. Then for each series, the weights are extracted from the end of the global weights. This gives a common time weighting across all series.
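
The “computed globally, then extracted from the end” behaviour of string-valued sample_weight can be sketched as follows. The decay formulas here are placeholders chosen for illustration; the exact curves used by the built-in weighting functions are not given in this section, and the helper names global_weights and weights_per_series are hypothetical.

```python
import math


def global_weights(max_len, kind):
    """Hypothetical decay curves that increase toward the most recent step."""
    if kind == "linear":
        return [(i + 1) / max_len for i in range(max_len)]
    if kind == "exponential":
        # Placeholder rate; the most recent step gets weight exp(0) = 1.
        return [math.exp(-(max_len - 1 - i)) for i in range(max_len)]
    raise ValueError(f"unknown weighting: {kind!r}")


def weights_per_series(series_lengths, kind):
    """Compute weights once for the longest series, then slice from the end."""
    longest = max(series_lengths)
    shared = global_weights(longest, kind)
    # Each series takes the trailing part of the shared weights, so the
    # last step of every series receives the same weight.
    return [shared[longest - n:] for n in series_lengths]


weights = weights_per_series([5, 3], "linear")
print(weights[0])  # [0.2, 0.4, 0.6, 0.8, 1.0]
print(weights[1])  # [0.6, 0.8, 1.0]
```

Slicing from the end means a step that lies k points before the end of its series gets the same weight in every series, which is the common time weighting described for sample_weight.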