Utils for time series generation

darts.utils.timeseries_generation.autoregressive_timeseries(coef, start_values=None, start=Timestamp('2000-01-01 00:00:00'), end=None, length=None, freq=None, column_name='autoregressive')

Creates a univariate, autoregressive TimeSeries whose values are calculated using specified coefficients coef and starting values start_values.

Parameters
  • coef (Sequence[float]) – The autoregressive coefficients used for calculating the next time step. series[t] = coef[-1] * series[t-1] + coef[-2] * series[t-2] + … + coef[0] * series[t-len(coef)]

  • start_values (Optional[Sequence[float], None]) – The starting values used for calculating the first few values for which no lags exist yet. series[0] = coef[-1] * start_values[-1] + coef[-2] * start_values[-2] + … + coef[0] * start_values[0]

  • start (Union[Timestamp, int, None]) – The start of the returned TimeSeries’ index. If a pandas Timestamp is passed, the TimeSeries will have a pandas DatetimeIndex. If an integer is passed, the TimeSeries will have a pandas RangeIndex index. Works only with either length or end.

  • end (Union[Timestamp, int, None]) – Optionally, the end of the returned index. Works only with either start or length. If start is set, end must be of same type as start. Else, it can be either a pandas Timestamp or an integer.

  • length (Optional[int, None]) – Optionally, the length of the returned index. Works only with either start or end.

  • freq (Union[str, int, None]) – The time difference between two adjacent entries in the returned index. In case start is a timestamp, a DateOffset alias is expected; see docs. By default, “D” (daily) is used. If start is an integer, freq will be interpreted as the step size in the underlying RangeIndex. The freq is optional for generating an integer index (if not specified, 1 is used).

  • column_name (Optional[str, None]) – Optionally, the name of the value column for the returned TimeSeries

Returns

An autoregressive TimeSeries created as indicated above.

Return type

TimeSeries
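
The recursion described for coef and start_values can be sketched in plain Python. This is a minimal illustration of the formula only, not the darts implementation: the helper name ar_values is hypothetical, and it assumes start_values is always given and has the same length as coef.

```python
def ar_values(coef, start_values, length):
    """Generate `length` values with the recursion
    series[t] = coef[-1] * series[t-1] + ... + coef[0] * series[t-len(coef)].
    """
    # Assumption of this sketch: one start value per coefficient.
    if len(start_values) != len(coef):
        raise ValueError("start_values must have the same length as coef")
    history = list(start_values)  # the lags that precede series[0]
    for _ in range(length):
        lags = history[-len(coef):]  # oldest lag first, newest lag last
        history.append(sum(c * v for c, v in zip(coef, lags)))
    return history[len(coef):]  # drop the start values themselves


# Fibonacci-like recursion: series[t] = series[t-1] + series[t-2]
print(ar_values(coef=[1.0, 1.0], start_values=[1.0, 1.0], length=5))
# [2.0, 3.0, 5.0, 8.0, 13.0]
```

Note that coef[0] multiplies the oldest lag and coef[-1] the most recent one, which is why the lags are zipped oldest-first.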

darts.utils.timeseries_generation.constant_timeseries(value=1, start=Timestamp('2000-01-01 00:00:00'), end=None, length=None, freq=None, column_name='constant', dtype=<class 'numpy.float64'>)

Creates a constant univariate TimeSeries with the given value, length (or end date), start date and frequency.

Parameters
  • value (float) – The constant value that the TimeSeries object will assume at every index.

  • start (Union[Timestamp, int, None]) – The start of the returned TimeSeries’ index. If a pandas Timestamp is passed, the TimeSeries will have a pandas DatetimeIndex. If an integer is passed, the TimeSeries will have a pandas RangeIndex index. Works only with either length or end.

  • end (Union[Timestamp, int, None]) – Optionally, the end of the returned index. Works only with either start or length. If start is set, end must be of same type as start. Else, it can be either a pandas Timestamp or an integer.

  • length (Optional[int, None]) – Optionally, the length of the returned index. Works only with either start or end.

  • freq (Union[str, int, None]) –

    The time difference between two adjacent entries in the returned index. In case start is a timestamp, a DateOffset alias is expected; see docs. By default, “D” (daily) is used. If start is an integer, freq will be interpreted as the step size in the underlying RangeIndex. The freq is optional for generating an integer index (if not specified, 1 is used).

  • column_name (Optional[str, None]) – Optionally, the name of the value column for the returned TimeSeries

  • dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series

Returns

A constant TimeSeries with value ‘value’.

Return type

TimeSeries

darts.utils.timeseries_generation.datetime_attribute_timeseries(time_index, attribute, one_hot=False, cyclic=False, until=None, add_length=0, dtype=<class 'numpy.float64'>, with_columns=None, tz=None)

Returns a new TimeSeries with index time_index and one or more dimensions containing (optionally one-hot encoded or cyclic encoded) pd.DatetimeIndex attribute information derived from the index.

1-indexed attributes are shifted to enforce 0-indexing across all the encodings.

Parameters
  • time_index (Union[DatetimeIndex, TimeSeries]) – Either a pd.DatetimeIndex which will serve as the basis of the new column(s), or a TimeSeries whose time axis will serve this purpose.

  • attribute (str) – An attribute of pd.DatetimeIndex, or week / weekofyear / week_of_year - e.g. “month”, “weekday”, “day”, “hour”, “minute”, “second”. See all available attributes in https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.

  • one_hot (bool) – Boolean value indicating whether to add the specified attribute as a one hot encoding (results in more columns).

  • cyclic (bool) – Boolean value indicating whether to add the specified attribute as a cyclic encoding, which adds two columns corresponding to the sine and cosine transformations. This is an alternative to one-hot encoding; enable only one of the two.

  • until (Union[int, str, Timestamp, None]) – Extend the time_index up to this value: a timestamp for datetime-indexed series, or an integer for range-indexed series. The extended index should match or exceed the forecasting window.

  • add_length (int) – Extend the time_index by add_length, should match or exceed forecasting window. Set only one of until and add_length.

  • dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series

  • with_columns (Union[str, list[str], None]) –

    Optionally, specify the output component names.

    • If one_hot and cyclic are False, must be a string.

    • If cyclic is True, must be a list of two strings. The first string for the sine, the second for the cosine component name.

    • If one_hot is True, must be a list of strings of the same length as the generated one hot encoded features.

  • tz (Optional[str, None]) – Optionally, a time zone to convert the time index to before computing the attributes.

Returns

New datetime attribute TimeSeries instance.

Return type

TimeSeries
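
To make the one-hot and cyclic encodings concrete, the following plain-Python sketch encodes the month of a few dates. It is an illustration only, not the darts implementation: the helper name month_encodings is hypothetical, and the cyclic angle 2π · value / period is an assumption about how the sine and cosine columns are computed.

```python
import math
from datetime import date


def month_encodings(days):
    """Return (one_hot, cyclic) encodings of the month of each date."""
    one_hot, cyclic = [], []
    for d in days:
        month0 = d.month - 1  # shift the 1-indexed month to 0..11
        # One-hot: 12 columns, a single 1.0 in the column of the month.
        one_hot.append([1.0 if i == month0 else 0.0 for i in range(12)])
        # Cyclic: 2 columns (sine, cosine); the angle formula is assumed.
        angle = 2 * math.pi * month0 / 12
        cyclic.append((math.sin(angle), math.cos(angle)))
    return one_hot, cyclic


one_hot, cyclic = month_encodings([date(2000, 1, 1), date(2000, 4, 1)])
print(one_hot[1].index(1.0))  # column of April after the 0-index shift
print(cyclic[0])              # January sits at angle 0
```

The cyclic form keeps December and January close together, which the one-hot form does not express; this is the usual reason to choose one over the other.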

darts.utils.timeseries_generation.gaussian_timeseries(mean=0.0, std=1.0, start=Timestamp('2000-01-01 00:00:00'), end=None, length=None, freq=None, column_name='gaussian', dtype=<class 'numpy.float64'>)

Creates a gaussian univariate TimeSeries by sampling all the series values independently, from a gaussian distribution with mean mean and standard deviation std.

Parameters
  • mean (Union[float, ndarray]) – The mean of the gaussian distribution that is sampled at each step. If a float value is given, the same mean is used at every step. If a numpy.ndarray of floats with the same length as length is given, a different mean is used at each time step.

  • std (Union[float, ndarray]) – The standard deviation of the gaussian distribution that is sampled at each step. If a float value is given, the same standard deviation is used at every step. If an array of dimension (length, length) is given, it will be used as covariance matrix for a multivariate gaussian distribution.

  • start (Union[Timestamp, int, None]) – The start of the returned TimeSeries’ index. If a pandas Timestamp is passed, the TimeSeries will have a pandas DatetimeIndex. If an integer is passed, the TimeSeries will have a pandas RangeIndex index. Works only with either length or end.

  • end (Union[Timestamp, int, None]) – Optionally, the end of the returned index. Works only with either start or length. If start is set, end must be of same type as start. Else, it can be either a pandas Timestamp or an integer.

  • length (Optional[int, None]) – Optionally, the length of the returned index. Works only with either start or end.

  • freq (Union[str, int, None]) –

    The time difference between two adjacent entries in the returned index. In case start is a timestamp, a DateOffset alias is expected; see docs. By default, “D” (daily) is used. If start is an integer, freq will be interpreted as the step size in the underlying RangeIndex. The freq is optional for generating an integer index (if not specified, 1 is used).

  • column_name (Optional[str, None]) – Optionally, the name of the value column for the returned TimeSeries

  • dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series

Returns

A white noise TimeSeries created as indicated above.

Return type

TimeSeries
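
The difference between a scalar mean and a per-step mean can be sketched with the standard library. This is a simplified illustration, not the darts implementation: the helper name gaussian_values and its seed argument are hypothetical, and the covariance-matrix form of std is deliberately left out.

```python
import random


def gaussian_values(mean, std, length, seed=None):
    """Sample `length` independent Gaussian values.

    `mean` is either one number (used at every step) or a sequence
    holding one mean per step. Only a scalar `std` is handled here.
    """
    rng = random.Random(seed)
    if isinstance(mean, (int, float)):
        means = [mean] * length
    else:
        means = list(mean)
    if len(means) != length:
        raise ValueError("mean must be a scalar or have `length` entries")
    return [rng.gauss(m, std) for m in means]


noise = gaussian_values(mean=0.0, std=1.0, length=5, seed=42)
trend_noise = gaussian_values(mean=[0.0, 1.0, 2.0], std=0.1, length=3, seed=42)
print(len(noise), len(trend_noise))  # 5 3
```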

darts.utils.timeseries_generation.holidays_timeseries(time_index, country_code, prov=None, state=None, column_name='holidays', until=None, add_length=0, dtype=<class 'numpy.float64'>, tz=None)

Creates a binary univariate TimeSeries with index time_index that equals 1 at every index that lies within (or equals) a selected country’s holiday, and 0 otherwise.

Available countries are listed in the documentation of the Python holidays package.

Parameters
  • time_index (Union[TimeSeries, DatetimeIndex]) – Either a pd.DatetimeIndex or a TimeSeries for which to generate the holidays.

  • country_code (str) – The country ISO code.

  • prov (Optional[str, None]) – The province.

  • state (Optional[str, None]) – The state.

  • until (Union[int, str, Timestamp, None]) – Extend the time_index up to this value: a timestamp for datetime-indexed series, or an integer for range-indexed series. The extended index should match or exceed the forecasting window.

  • add_length (int) – Extend the time_index by add_length, should match or exceed forecasting window. Set only one of until and add_length.

  • column_name (Optional[str, None]) – Optionally, the name of the value column for the returned TimeSeries.

  • dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series.

  • tz (Optional[str, None]) – Optionally, a time zone to convert the time index to before generating the holidays.

Returns

A new binary holiday TimeSeries instance.

Return type

TimeSeries
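
The shape of the result, a 0/1 indicator over a daily index that can be extended by add_length, can be sketched as follows. This is an illustration only, not the darts implementation: the helper name holiday_indicator and the example holiday set are hypothetical stand-ins for the country-specific holiday lookup.

```python
from datetime import date, timedelta


def holiday_indicator(start, length, holidays, add_length=0):
    """Return 1.0 for each daily step that is a holiday, else 0.0.

    The index covers `length` days from `start`, extended by
    `add_length` further days (for example, the forecast horizon).
    """
    days = [start + timedelta(days=i) for i in range(length + add_length)]
    return [1.0 if d in holidays else 0.0 for d in days]


example_holidays = {date(2000, 1, 1)}  # hypothetical holiday set
print(holiday_indicator(date(1999, 12, 30), 2, example_holidays))
# [0.0, 0.0]
print(holiday_indicator(date(1999, 12, 30), 2, example_holidays, add_length=2))
# [0.0, 0.0, 1.0, 0.0]
```

The second call shows why the extension matters: the holiday falls after the original index, so it only appears once the index reaches into the forecasting window.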

darts.utils.timeseries_generation.linear_timeseries(start_value=0, end_value=1, start=Timestamp('2000-01-01 00:00:00'), end=None, length=None, freq=None, column_name='linear', dtype=<class 'numpy.float64'>)

Creates a univariate TimeSeries with a starting value of start_value that increases linearly such that it takes on the value end_value at the last entry of the TimeSeries. This means that the difference between two adjacent entries will be equal to (end_value - start_value) / (length - 1).

Parameters
  • start_value (float) – The value of the first entry in the TimeSeries.

  • end_value (float) – The value of the last entry in the TimeSeries.

  • start (Union[Timestamp, int, None]) – The start of the returned TimeSeries’ index. If a pandas Timestamp is passed, the TimeSeries will have a pandas DatetimeIndex. If an integer is passed, the TimeSeries will have a pandas RangeIndex index. Works only with either length or end.

  • end (Union[Timestamp, int, None]) – Optionally, the end of the returned index. Works only with either start or length. If start is set, end must be of same type as start. Else, it can be either a pandas Timestamp or an integer.

  • length (Optional[int, None]) – Optionally, the length of the returned index. Works only with either start or end.

  • freq (Union[str, int, None]) –

    The time difference between two adjacent entries in the returned index. In case start is a timestamp, a DateOffset alias is expected; see docs. By default, “D” (daily) is used. If start is an integer, freq will be interpreted as the step size in the underlying RangeIndex. The freq is optional for generating an integer index (if not specified, 1 is used).

  • column_name (Optional[str, None]) – Optionally, the name of the value column for the returned TimeSeries

  • dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series

Returns

A linear TimeSeries created as indicated above.

Return type

TimeSeries
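
The step size (end_value - start_value) / (length - 1) can be checked with a short plain-Python sketch. The helper name linear_values is hypothetical, and requiring at least two entries is a simplification of this sketch rather than a statement about darts.

```python
def linear_values(start_value, end_value, length):
    """Return `length` evenly spaced values from start_value to end_value."""
    if length < 2:
        # Simplification of this sketch: the step is undefined for length 1.
        raise ValueError("length must be at least 2")
    step = (end_value - start_value) / (length - 1)
    return [start_value + i * step for i in range(length)]


print(linear_values(0.0, 1.0, 5))
# [0.0, 0.25, 0.5, 0.75, 1.0]
```

With five entries there are four gaps, so each step is 1 / 4 = 0.25 and the last entry lands exactly on end_value.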

darts.utils.timeseries_generation.random_walk_timeseries(mean=0.0, std=1.0, start=Timestamp('2000-01-01 00:00:00'), end=None, length=None, freq=None, column_name='random_walk', dtype=<class 'numpy.float64'>)

Creates a random walk univariate TimeSeries, where each step is obtained by sampling a gaussian distribution with mean mean and standard deviation std.

Parameters
  • mean (float) – The mean of the gaussian distribution that is sampled at each step.

  • std (float) – The standard deviation of the gaussian distribution that is sampled at each step.

  • start (Union[Timestamp, int, None]) – The start of the returned TimeSeries’ index. If a pandas Timestamp is passed, the TimeSeries will have a pandas DatetimeIndex. If an integer is passed, the TimeSeries will have a pandas RangeIndex index. Works only with either length or end.

  • end (Union[Timestamp, int, None]) – Optionally, the end of the returned index. Works only with either start or length. If start is set, end must be of same type as start. Else, it can be either a pandas Timestamp or an integer.

  • length (Optional[int, None]) – Optionally, the length of the returned index. Works only with either start or end.

  • freq (Union[str, int, None]) –

    The time difference between two adjacent entries in the returned index. In case start is a timestamp, a DateOffset alias is expected; see docs. By default, “D” (daily) is used. If start is an integer, freq will be interpreted as the step size in the underlying RangeIndex. The freq is optional for generating an integer index (if not specified, 1 is used).

  • column_name (Optional[str, None]) – Optionally, the name of the value column for the returned TimeSeries

  • dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series

Returns

A random walk TimeSeries created as indicated above.

Return type

TimeSeries
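
A random walk is the running sum of its Gaussian steps, which the following standard-library sketch makes explicit. It is an illustration, not the darts implementation: the helper name random_walk_values and its seed argument are hypothetical, and taking the first value to be the first sampled step (rather than zero) is an assumption.

```python
import random


def random_walk_values(mean, std, length, seed=None):
    """Return the cumulative sum of `length` Gaussian steps."""
    rng = random.Random(seed)
    values, position = [], 0.0
    for _ in range(length):
        position += rng.gauss(mean, std)  # one step of the walk
        values.append(position)
    return values


# With std = 0 every step equals the mean, so the walk is a straight line.
print(random_walk_values(mean=2.0, std=0.0, length=3))
# [2.0, 4.0, 6.0]
```

A non-zero mean therefore acts as a drift: on average the walk moves by mean per step, while std controls how far individual steps scatter around that drift.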

darts.utils.timeseries_generation.sine_timeseries(value_frequency=0.1, value_amplitude=1.0, value_phase=0.0, value_y_offset=0.0, start=Timestamp('2000-01-01 00:00:00'), end=None, length=None, freq=None, column_name='sine', dtype=<class 'numpy.float64'>)

Creates a univariate TimeSeries with a sinusoidal value progression with a given frequency, amplitude, phase and y offset.

Parameters
  • value_frequency (float) – The number of periods that take place within one time unit given in freq.

  • value_amplitude (float) – The maximum difference between any value of the returned TimeSeries and value_y_offset.

  • value_phase (float) – The relative position within one period of the first value of the returned TimeSeries (in radians).

  • value_y_offset (float) – The shift of the sine function along the y axis.

  • start (Union[Timestamp, int, None]) – The start of the returned TimeSeries’ index. If a pandas Timestamp is passed, the TimeSeries will have a pandas DatetimeIndex. If an integer is passed, the TimeSeries will have a pandas RangeIndex index. Works only with either length or end.

  • end (Union[Timestamp, int, None]) – Optionally, the end of the returned index. Works only with either start or length. If start is set, end must be of same type as start. Else, it can be either a pandas Timestamp or an integer.

  • length (Optional[int, None]) – Optionally, the length of the returned index. Works only with either start or end.

  • freq (Union[str, int, None]) –

    The time difference between two adjacent entries in the returned index. In case start is a timestamp, a DateOffset alias is expected; see docs. By default, “D” (daily) is used. If start is an integer, freq will be interpreted as the step size in the underlying RangeIndex. The freq is optional for generating an integer index (if not specified, 1 is used).

  • column_name (Optional[str, None]) – Optionally, the name of the value column for the returned TimeSeries

  • dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series

Returns

A sinusoidal TimeSeries parametrized as indicated above.

Return type

TimeSeries
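
How the four value_* parameters combine can be sketched in plain Python. The formula below, amplitude · sin(2π · frequency · t + phase) + offset with t counting time steps from 0, is an assumption consistent with the parameter descriptions rather than a quotation of the darts source; the helper name sine_values is hypothetical.

```python
import math


def sine_values(length, value_frequency=0.1, value_amplitude=1.0,
                value_phase=0.0, value_y_offset=0.0):
    """Return `length` samples of a sinusoid, one per time step t = 0, 1, ..."""
    return [
        value_amplitude
        * math.sin(2 * math.pi * value_frequency * t + value_phase)
        + value_y_offset
        for t in range(length)
    ]


# A frequency of 0.25 means one full period every 4 time steps.
values = sine_values(4, value_frequency=0.25, value_amplitude=2.0,
                     value_y_offset=1.0)
print([round(v, 6) for v in values])
# [1.0, 3.0, 1.0, -1.0]
```

With the default value_frequency of 0.1, one period spans 10 time steps, so a daily index would show a 10-day cycle.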