Shifted Training Dataset
- class darts.utils.data.shifted_dataset.DualCovariatesShiftedDataset(target_series, covariates=None, length=12, shift=1, max_samples_per_ts=None, use_static_covariates=True, sample_weight=None)
Bases: DualCovariatesTrainingDataset
A time series dataset containing tuples of (past_target, historic_future_covariates, future_covariates, static_covariates, sample weights, future_target) arrays, which all have length length. The “future_target” is the “past_target” target shifted by shift time steps forward. So if an emitted “past_target” goes from position i to i+length, the emitted “future_target” will go from position i+shift to i+shift+length. The slicing of “future_covariates” matches that of “future_target”, and the slicing of “historic_future_covariates” matches that of “past_target”. The slicing itself relies on time indexes to align the series if they have unequal lengths.
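The index arithmetic described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper name `shifted_windows` is hypothetical, and the ranges are assumed to be half-open (start included, end excluded).

```python
def shifted_windows(i, length, shift):
    """Return (past, future) half-open index ranges for start position i."""
    past = (i, i + length)
    future = (i + shift, i + shift + length)
    return past, future


series = list(range(100, 120))  # toy target values over 20 time steps
length, shift = 4, 2

past, future = shifted_windows(3, length, shift)
past_target = series[past[0]:past[1]]
future_target = series[future[0]:future[1]]
# Under the description above, historic_future_covariates would be cut with
# `past` and future_covariates with `future`.
```

With shift smaller than length, the two windows overlap, which is why the future chunk repeats part of the past chunk in this toy example.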
Each series must be long enough to contain at least one (input, output) pair; i.e., each series must have length at least length + shift. If this condition is not satisfied, an error will be raised when trying to access some of the splits.
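The minimum-length requirement implies a simple count of how many (input, output) pairs a series can yield. The sketch below derives that count from the stated constraint only; the function name `n_samples` is hypothetical and the real dataset may raise an error instead of returning 0.

```python
def n_samples(series_length, length, shift):
    """Number of (past, future) pairs a series can yield; 0 if it is too short."""
    return max(0, series_length - (length + shift) + 1)


exactly_long_enough = n_samples(13, length=12, shift=1)
too_short = n_samples(12, length=12, shift=1)
longer = n_samples(20, length=12, shift=1)
```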
The sampling is uniform over the number of time series; i.e., the i-th sample of this dataset has a probability 1/N of coming from any of the N time series in the sequence. If the time series have different lengths, they will contain different numbers of slices. Therefore, some particular slices may be sampled more often than others if they belong to shorter time series.
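To see why slices from shorter series are drawn more often, the sketch below computes the probability of drawing one particular slice. It assumes that, once a series is chosen with probability 1/N, its slices are picked uniformly; that second step is an assumption made for illustration, and `slice_probabilities` is a hypothetical helper.

```python
from fractions import Fraction


def slice_probabilities(slices_per_series):
    """Probability of drawing one specific slice from each series."""
    n_series = len(slices_per_series)
    return [Fraction(1, n_series) / n for n in slices_per_series]


# Two series: a short one with 2 slices and a long one with 10 slices.
probs = slice_probabilities([2, 10])
```

Each slice of the short series is five times more likely to be drawn than a slice of the long one.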
- Parameters
target_series (Union[TimeSeries, Sequence[TimeSeries]]) – One or a sequence of target TimeSeries.
covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, one or a sequence of TimeSeries containing future-known covariates. If this parameter is set, the provided sequence must have the same length as that of target_series. Moreover, all covariates in the sequence must have a time span large enough to contain all the required slices. The joint slicing of the target and covariates is relying on the time axes of both series.
length (int) – The length of the emitted past and future series.
shift (int) – The number of time steps by which to shift the output chunks relative to the start of the input chunks.
max_samples_per_ts (Optional[int]) – This is an upper bound on the number of tuples that can be produced per time series. It can be used in order to have an upper bound on the total size of the dataset and ensure proper sampling. If None, it will read all of the individual time series in advance (at dataset creation) to know their sizes, which might be expensive on big datasets. If some series turn out to have a length that would allow more than max_samples_per_ts, only the most recent max_samples_per_ts samples will be considered.
use_static_covariates (bool) – Whether to use/include static covariate data from input series.
sample_weight (Union[TimeSeries, Sequence[TimeSeries], str, None]) – Optionally, some sample weights to apply to the target series labels. They are applied per observation, per label (each step in output_chunk_length), and per component. If a series or sequence of series, then those weights are used. If the weight series only have a single component / column, then the weights are applied globally to all components in series. Otherwise, for component-specific weights, the number of components must match those of series. If a string, then the weights are generated using built-in weighting functions. The available options are “linear” or “exponential” decay - the further in the past, the lower the weight. The weights are computed globally based on the length of the longest series in series. Then for each series, the weights are extracted from the end of the global weights. This gives a common time weighting across all series.
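The string options for sample_weight can be pictured as follows: weights are built once for the longest series, and each series takes the tail of that global vector. This is a sketch under assumptions. The exact decay formulas are not given in the text, so the ramp and the exponential rate below are hypothetical, as are the helper names.

```python
import math


def global_weights(max_len, kind):
    """Weights for the longest series; the newest step gets the largest weight."""
    if kind not in ("linear", "exponential"):
        raise ValueError(f"unknown weighting: {kind!r}")
    if max_len == 1:
        return [1.0]
    # 0.0 for the oldest step, 1.0 for the newest one.
    ramp = [t / (max_len - 1) for t in range(max_len)]
    if kind == "linear":
        return ramp
    # Hypothetical decay rate, chosen only for illustration.
    return [math.exp(r - 1.0) for r in ramp]


def weights_for_series(series_len, weights):
    """Take the tail of the global weights so all series share the recent end."""
    return weights[len(weights) - series_len:]


linear = global_weights(5, "linear")
exponential = global_weights(5, "exponential")
short_series_weights = weights_for_series(3, linear)
```

Taking the tail means that the last time step of every series receives the same weight, regardless of the series length.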
- class darts.utils.data.shifted_dataset.FutureCovariatesShiftedDataset(target_series, covariates=None, length=12, shift=1, max_samples_per_ts=None, use_static_covariates=True, sample_weight=None)
Bases: FutureCovariatesTrainingDataset
A time series dataset containing tuples of (past_target, future_covariates, static_covariates, sample weights, future_target) arrays, which all have length length. The “future_target” is the “past_target” target shifted by shift time steps forward. So if an emitted “past_target” goes from position i to i+length, the emitted “future_target” will go from position i+shift to i+shift+length. The slicing of future covariates matches that of future targets. The slicing itself relies on time indexes to align the series if they have unequal lengths.
Each series must be long enough to contain at least one (input, output) pair; i.e., each series must have length at least length + shift. If this condition is not satisfied, an error will be raised when trying to access some of the splits.
The sampling is uniform over the number of time series; i.e., the i-th sample of this dataset has a probability 1/N of coming from any of the N time series in the sequence. If the time series have different lengths, they will contain different numbers of slices. Therefore, some particular slices may be sampled more often than others if they belong to shorter time series.
- Parameters
target_series (Union[TimeSeries, Sequence[TimeSeries]]) – One or a sequence of target TimeSeries.
covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, one or a sequence of TimeSeries containing future-known covariates. If this parameter is set, the provided sequence must have the same length as that of target_series. Moreover, all covariates in the sequence must have a time span large enough to contain all the required slices. The joint slicing of the target and covariates is relying on the time axes of both series.
length (int) – The length of the emitted past and future series.
shift (int) – The number of time steps by which to shift the output chunks relative to the start of the input chunks.
max_samples_per_ts (Optional[int]) – This is an upper bound on the number of tuples that can be produced per time series. It can be used in order to have an upper bound on the total size of the dataset and ensure proper sampling. If None, it will read all of the individual time series in advance (at dataset creation) to know their sizes, which might be expensive on big datasets. If some series turn out to have a length that would allow more than max_samples_per_ts, only the most recent max_samples_per_ts samples will be considered.
use_static_covariates (bool) – Whether to use/include static covariate data from input series.
sample_weight (Union[TimeSeries, Sequence[TimeSeries], str, None]) – Optionally, some sample weights to apply to the target series labels. They are applied per observation, per label (each step in output_chunk_length), and per component. If a series or sequence of series, then those weights are used. If the weight series only have a single component / column, then the weights are applied globally to all components in series. Otherwise, for component-specific weights, the number of components must match those of series. If a string, then the weights are generated using built-in weighting functions. The available options are “linear” or “exponential” decay - the further in the past, the lower the weight. The weights are computed globally based on the length of the longest series in series. Then for each series, the weights are extracted from the end of the global weights. This gives a common time weighting across all series.
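The effect of max_samples_per_ts on a single series can be sketched as a cap that keeps only the latest window start positions. This is a plain-Python illustration that assumes a positive cap; `kept_start_positions` is a hypothetical helper and not part of the library.

```python
def kept_start_positions(series_length, length, shift, max_samples_per_ts=None):
    """Start positions of the windows kept for one series, oldest first."""
    total = max(0, series_length - (length + shift) + 1)
    starts = list(range(total))
    if max_samples_per_ts is not None:
        # Keep only the most recent windows when the series allows more.
        starts = starts[-max_samples_per_ts:]
    return starts


all_windows = kept_start_positions(20, length=12, shift=1)
capped = kept_start_positions(20, length=12, shift=1, max_samples_per_ts=3)
loose_cap = kept_start_positions(20, length=12, shift=1, max_samples_per_ts=100)
```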
- class darts.utils.data.shifted_dataset.GenericShiftedDataset(target_series, covariates=None, input_chunk_length=12, output_chunk_length=1, shift=1, shift_covariates=False, max_samples_per_ts=None, covariate_type=CovariateType.NONE, use_static_covariates=True, sample_weight=None)
Bases: TrainingDataset
Contains (past_target, <X>_covariates, static_covariates, sample weights, future_target), where “<X>” is past if shift_covariates = False and future otherwise. The past chunks have length input_chunk_length and the future chunks have length output_chunk_length. The future chunks start shift time steps after the start of the past chunks.
This is meant to be a “generic” dataset that can be used to build ShiftedDataset instances (when input_chunk_length = output_chunk_length) or SequenceDataset instances (when shift = input_chunk_length).
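The two special cases mentioned above differ only in how the three lengths relate to one another. The sketch below shows the window arithmetic with half-open ranges; `generic_windows` is a hypothetical helper written for illustration, not the class's internal code.

```python
def generic_windows(i, input_chunk_length, output_chunk_length, shift):
    """Return (past, future) half-open index ranges for start position i."""
    past = (i, i + input_chunk_length)
    future = (i + shift, i + shift + output_chunk_length)
    return past, future


# "Shifted" layout: both chunks have the same length and overlap.
shifted = generic_windows(0, input_chunk_length=4, output_chunk_length=4, shift=1)

# "Sequence" layout: the future chunk starts right where the past chunk ends.
sequence = generic_windows(0, input_chunk_length=4, output_chunk_length=2, shift=4)
```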
- Parameters
target_series (Union[TimeSeries, Sequence[TimeSeries]]) – One or a sequence of target TimeSeries.
covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, one or a sequence of TimeSeries containing covariates.
input_chunk_length (int) – The length of the emitted past series.
output_chunk_length (int) – The length of the emitted future series.
shift (int) – The number of time steps by which to shift the output chunks relative to the start of the input chunks.
shift_covariates (bool) – Whether to shift the covariates forward the same way as the target. FutureCovariatesModel’s require this set to True, while PastCovariatesModel’s require this set to False.
max_samples_per_ts (Optional[int]) – This is an upper bound on the number of (input, output, input_covariates) tuples that can be produced per time series. It can be used in order to have an upper bound on the total size of the dataset and ensure proper sampling. If None, it will read all of the individual time series in advance (at dataset creation) to know their sizes, which might be expensive on big datasets. If some series turn out to have a length that would allow more than max_samples_per_ts, only the most recent max_samples_per_ts samples will be considered.
covariate_type (CovariateType) – An instance of CovariateType describing the type of covariates.
use_static_covariates (bool) – Whether to use/include static covariate data from input series.
sample_weight (Union[TimeSeries, Sequence[TimeSeries], str, None]) – Optionally, some sample weights to apply to the target series labels. They are applied per observation, per label (each step in output_chunk_length), and per component. If a series or sequence of series, then those weights are used. If the weight series only have a single component / column, then the weights are applied globally to all components in series. Otherwise, for component-specific weights, the number of components must match those of series. If a string, then the weights are generated using built-in weighting functions. The available options are “linear” or “exponential” decay - the further in the past, the lower the weight. The weights are computed globally based on the length of the longest series in series. Then for each series, the weights are extracted from the end of the global weights. This gives a common time weighting across all series.
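The role of shift_covariates can be sketched as a choice of which window the covariates are cut with. The helper below is hypothetical and assumes that shifted covariates follow the future chunk exactly (same start and same length), which is one reading of "the same way as the target".

```python
def covariate_window(i, input_chunk_length, output_chunk_length, shift,
                     shift_covariates):
    """Half-open index range used to slice the covariates for start position i."""
    if shift_covariates:
        # Future covariates: aligned with the future (output) chunk.
        return (i + shift, i + shift + output_chunk_length)
    # Past covariates: aligned with the past (input) chunk.
    return (i, i + input_chunk_length)


past_cov_range = covariate_window(2, 6, 3, shift=6, shift_covariates=False)
future_cov_range = covariate_window(2, 6, 3, shift=6, shift_covariates=True)
```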
- class darts.utils.data.shifted_dataset.MixedCovariatesShiftedDataset(target_series, past_covariates=None, future_covariates=None, length=12, shift=1, max_samples_per_ts=None, use_static_covariates=True, sample_weight=None)
Bases: MixedCovariatesTrainingDataset
A time series dataset containing tuples of (past_target, past_covariates, historic_future_covariates, future_covariates, static_covariates, sample weights, future_target) arrays, which all have length length. The “future_target” is the “past_target” target shifted by shift time steps forward. So if an emitted “past_target” goes from position i to i+length, the emitted “future_target” will go from position i+shift to i+shift+length. The slicing of past and future covariates matches that of past and future targets, respectively. The slicing itself relies on time indexes to align the series if they have unequal lengths.
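Aligning by time index matters when a covariate series starts earlier than the target. The sketch below uses integer time stamps with a unit step (an assumption made for simplicity) and translates target positions into covariate positions; `to_covariate_positions` is a hypothetical helper.

```python
def to_covariate_positions(window, target_start, covariate_start):
    """Translate a half-open window of target positions to covariate positions."""
    offset = target_start - covariate_start
    return (window[0] + offset, window[1] + offset)


target_times = list(range(10, 20))      # target covers t = 10..19
future_cov_times = list(range(4, 30))   # covariates cover t = 4..29

length, shift, i = 3, 2, 1
past = (i, i + length)
future = (i + shift, i + shift + length)

cov_past = to_covariate_positions(past, target_times[0], future_cov_times[0])
cov_future = to_covariate_positions(future, target_times[0], future_cov_times[0])

# The values are the time stamps themselves, so alignment is easy to check.
historic_future_covariates = future_cov_times[cov_past[0]:cov_past[1]]
future_covariates = future_cov_times[cov_future[0]:cov_future[1]]
```

Because the toy covariate values equal their time stamps, each covariate slice can be compared directly with the time stamps of the matching target window.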
Each series must be long enough to contain at least one (input, output) pair; i.e., each series must have length at least length + shift. If this condition is not satisfied, an error will be raised when trying to access some of the splits.
The sampling is uniform over the number of time series; i.e., the i-th sample of this dataset has a probability 1/N of coming from any of the N time series in the sequence. If the time series have different lengths, they will contain different numbers of slices. Therefore, some particular slices may be sampled more often than others if they belong to shorter time series.
- Parameters
target_series (Union[TimeSeries, Sequence[TimeSeries]]) – One or a sequence of target TimeSeries.
past_covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, one or a sequence of TimeSeries containing past-observed covariates. If this parameter is set, the provided sequence must have the same length as that of target_series. Moreover, all covariates in the sequence must have a time span large enough to contain all the required slices. The joint slicing of the target and covariates is relying on the time axes of both series.
future_covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, one or a sequence of TimeSeries containing future-known covariates. This has to follow the same constraints as past_covariates.
length (int) – The length of the emitted past and future series.
shift (int) – The number of time steps by which to shift the output chunks relative to the start of the input chunks.
max_samples_per_ts (Optional[int]) – This is an upper bound on the number of tuples that can be produced per time series. It can be used in order to have an upper bound on the total size of the dataset and ensure proper sampling. If None, it will read all of the individual time series in advance (at dataset creation) to know their sizes, which might be expensive on big datasets. If some series turn out to have a length that would allow more than max_samples_per_ts, only the most recent max_samples_per_ts samples will be considered.
use_static_covariates (bool) – Whether to use/include static covariate data from input series.
sample_weight (Union[TimeSeries, Sequence[TimeSeries], str, None]) – Optionally, some sample weights to apply to the target series labels. They are applied per observation, per label (each step in output_chunk_length), and per component. If a series or sequence of series, then those weights are used. If the weight series only have a single component / column, then the weights are applied globally to all components in series. Otherwise, for component-specific weights, the number of components must match those of series. If a string, then the weights are generated using built-in weighting functions. The available options are “linear” or “exponential” decay - the further in the past, the lower the weight. The weights are computed globally based on the length of the longest series in series. Then for each series, the weights are extracted from the end of the global weights. This gives a common time weighting across all series.
- class darts.utils.data.shifted_dataset.PastCovariatesShiftedDataset(target_series, covariates=None, length=12, shift=1, max_samples_per_ts=None, use_static_covariates=True, sample_weight=None)
Bases: PastCovariatesTrainingDataset
A time series dataset containing tuples of (past_target, past_covariates, static_covariates, sample weights, future_target) arrays, which all have length length. The “future_target” is the “past_target” target shifted by shift time steps forward. So if an emitted “past_target” (and “past_covariates”) goes from position i to i+length, the emitted “future_target” will go from position i+shift to i+shift+length.
Each series must be long enough to contain at least one (input, output) pair; i.e., each series must have length at least length + shift. If this condition is not satisfied, an error will be raised when trying to access some of the splits.
The sampling is uniform over the number of time series; i.e., the i-th sample of this dataset has a probability 1/N of coming from any of the N time series in the sequence. If the time series have different lengths, they will contain different numbers of slices. Therefore, some particular slices may be sampled more often than others if they belong to shorter time series.
- Parameters
target_series (Union[TimeSeries, Sequence[TimeSeries]]) – One or a sequence of target TimeSeries.
covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, one or a sequence of TimeSeries containing past-observed covariates. If this parameter is set, the provided sequence must have the same length as that of target_series. Moreover, all covariates in the sequence must have a time span large enough to contain all the required slices. The joint slicing of the target and covariates is relying on the time axes of both series.
length (int) – The length of the emitted past and future series.
shift (int) – The number of time steps by which to shift the output chunks relative to the start of the input chunks.
max_samples_per_ts (Optional[int]) – This is an upper bound on the number of tuples that can be produced per time series. It can be used in order to have an upper bound on the total size of the dataset and ensure proper sampling. If None, it will read all of the individual time series in advance (at dataset creation) to know their sizes, which might be expensive on big datasets. If some series turn out to have a length that would allow more than max_samples_per_ts, only the most recent max_samples_per_ts samples will be considered.
use_static_covariates (bool) – Whether to use/include static covariate data from input series.
sample_weight (Union[TimeSeries, Sequence[TimeSeries], str, None]) – Optionally, some sample weights to apply to the target series labels. They are applied per observation, per label (each step in output_chunk_length), and per component. If a series or sequence of series, then those weights are used. If the weight series only have a single component / column, then the weights are applied globally to all components in series. Otherwise, for component-specific weights, the number of components must match those of series. If a string, then the weights are generated using built-in weighting functions. The available options are “linear” or “exponential” decay - the further in the past, the lower the weight. The weights are computed globally based on the length of the longest series in series. Then for each series, the weights are extracted from the end of the global weights. This gives a common time weighting across all series.
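The distinction between global and component-specific weights amounts to a broadcast over the component axis. The sketch below uses nested lists (rows are time steps, columns are components) and a hypothetical helper `apply_weights`; it only illustrates the shape rule stated above, not how the weights enter the training loss.

```python
def apply_weights(values, weights):
    """Multiply per-step, per-component values by per-step weights.

    Each weight row has either one column (applied to every component)
    or exactly as many columns as the matching value row.
    """
    weighted = []
    for value_row, weight_row in zip(values, weights):
        if len(weight_row) == 1:
            # Single weight column: broadcast it to all components.
            weight_row = weight_row * len(value_row)
        elif len(weight_row) != len(value_row):
            raise ValueError("weight components must match the target components")
        weighted.append([v * w for v, w in zip(value_row, weight_row)])
    return weighted


values = [[1.0, 2.0], [3.0, 4.0]]  # 2 time steps, 2 components
global_result = apply_weights(values, [[0.5], [1.0]])
component_result = apply_weights(values, [[1.0, 0.0], [0.5, 2.0]])
```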
- class darts.utils.data.shifted_dataset.SplitCovariatesShiftedDataset(target_series, past_covariates=None, future_covariates=None, length=12, shift=1, max_samples_per_ts=None, use_static_covariates=True, sample_weight=None)
Bases: SplitCovariatesTrainingDataset
A time series dataset containing tuples of (past_target, past_covariates, future_covariates, static_covariates, sample weights, future_target) arrays, which all have length length. The “future_target” is the “past_target” target shifted by shift time steps forward. So if an emitted “past_target” goes from position i to i+length, the emitted “future_target” will go from position i+shift to i+shift+length. The slicing of past and future covariates matches that of past and future targets, respectively. The slicing itself relies on time indexes to align the series if they have unequal lengths.
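The tuple layout described above can be sketched by slicing plain lists in the listed order. This is an illustration under assumptions: all series are taken to share the same time index, missing inputs are represented as None, the sample weights are cut with the future window because they apply to the labels, and `split_sample` is a hypothetical helper.

```python
def split_sample(i, length, shift, target, past_cov=None, future_cov=None,
                 static_cov=None, weights=None):
    """Build one sample tuple in the order given in the description."""
    past = slice(i, i + length)
    future = slice(i + shift, i + shift + length)
    return (
        target[past],                                          # past_target
        past_cov[past] if past_cov is not None else None,      # past_covariates
        future_cov[future] if future_cov is not None else None,  # future_covariates
        static_cov,                                            # static_covariates
        weights[future] if weights is not None else None,      # sample weights
        target[future],                                        # future_target
    )


target = list(range(10))
sample = split_sample(
    2, length=3, shift=1, target=target,
    past_cov=[x * 10 for x in target],
    future_cov=[x * 100 for x in target],
    static_cov="store_A",
    weights=list(range(10)),
)
```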
Each series must be long enough to contain at least one (input, output) pair; i.e., each series must have length at least length + shift. If this condition is not satisfied, an error will be raised when trying to access some of the splits.
The sampling is uniform over the number of time series; i.e., the i-th sample of this dataset has a probability 1/N of coming from any of the N time series in the sequence. If the time series have different lengths, they will contain different numbers of slices. Therefore, some particular slices may be sampled more often than others if they belong to shorter time series.
- Parameters
target_series (Union[TimeSeries, Sequence[TimeSeries]]) – One or a sequence of target TimeSeries.
past_covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, one or a sequence of TimeSeries containing past-observed covariates. If this parameter is set, the provided sequence must have the same length as that of target_series. Moreover, all covariates in the sequence must have a time span large enough to contain all the required slices. The joint slicing of the target and covariates is relying on the time axes of both series.
future_covariates (Union[TimeSeries, Sequence[TimeSeries], None]) – Optionally, one or a sequence of TimeSeries containing future-known covariates. This has to follow the same constraints as past_covariates.
length (int) – The length of the emitted past and future series.
shift (int) – The number of time steps by which to shift the output chunks relative to the start of the input chunks.
max_samples_per_ts (Optional[int]) – This is an upper bound on the number of tuples that can be produced per time series. It can be used in order to have an upper bound on the total size of the dataset and ensure proper sampling. If None, it will read all of the individual time series in advance (at dataset creation) to know their sizes, which might be expensive on big datasets. If some series turn out to have a length that would allow more than max_samples_per_ts, only the most recent max_samples_per_ts samples will be considered.
use_static_covariates (bool) – Whether to use/include static covariate data from input series.
sample_weight (Union[TimeSeries, Sequence[TimeSeries], str, None]) – Optionally, some sample weights to apply to the target series labels. They are applied per observation, per label (each step in output_chunk_length), and per component. If a series or sequence of series, then those weights are used. If the weight series only have a single component / column, then the weights are applied globally to all components in series. Otherwise, for component-specific weights, the number of components must match those of series. If a string, then the weights are generated using built-in weighting functions. The available options are “linear” or “exponential” decay - the further in the past, the lower the weight. The weights are computed globally based on the length of the longest series in series. Then for each series, the weights are extracted from the end of the global weights. This gives a common time weighting across all series.