Utils for time series generation
- darts.utils.timeseries_generation.autoregressive_timeseries(coef, start_values=None, start=Timestamp('2000-01-01 00:00:00'), end=None, length=None, freq=None, column_name='autoregressive')
Creates a univariate, autoregressive TimeSeries whose values are calculated using specified coefficients coef and starting values start_values.
- Parameters
coef (Sequence[float]) – The autoregressive coefficients used for calculating the next time step: series[t] = coef[-1] * series[t-1] + coef[-2] * series[t-2] + … + coef[0] * series[t-len(coef)]
start_values (Optional[Sequence[float]]) – The starting values used for calculating the first few values for which no lags exist yet: series[0] = coef[-1] * start_values[-1] + coef[-2] * start_values[-2] + … + coef[0] * start_values[0]
start (Union[Timestamp, int, None]) – The start of the returned TimeSeries’ index. If a pandas Timestamp is passed, the TimeSeries will have a pandas DatetimeIndex. If an integer is passed, the TimeSeries will have a pandas RangeIndex index. Works only with either length or end.
end (Union[Timestamp, int, None]) – Optionally, the end of the returned index. Works only with either start or length. If start is set, end must be of same type as start. Else, it can be either a pandas Timestamp or an integer.
length (Optional[int]) – Optionally, the length of the returned index. Works only with either start or end.
freq (Union[str, int, None]) – The time difference between two adjacent entries in the returned index. In case start is a timestamp, a DateOffset alias is expected; see docs. By default, “D” (daily) is used. If start is an integer, freq will be interpreted as the step size in the underlying RangeIndex. The freq is optional for generating an integer index (if not specified, 1 is used).
column_name (Optional[str]) – Optionally, the name of the value column for the returned TimeSeries.
- Returns
An autoregressive TimeSeries created as indicated above.
- Return type
TimeSeries
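
The coefficient ordering in the recursion above is easy to misread: coef[0] applies to the oldest lag and coef[-1] to the most recent one. The following is a minimal pure-Python sketch of that recursion, not the library's implementation. The helper name `autoregressive_values` is hypothetical, it returns a plain list rather than a TimeSeries, and it assumes start_values has the same length as coef.

```python
def autoregressive_values(coef, start_values, length):
    """Generate `length` values following the recursion described above.

    Hypothetical helper for illustration; not part of darts.
    """
    # Assumption of this sketch: one starting value per coefficient.
    if len(start_values) != len(coef):
        raise ValueError("start_values must have the same length as coef")
    # History window, oldest value first, seeded with the starting values.
    history = list(start_values)
    values = []
    for _ in range(length):
        window = history[-len(coef):]
        # coef[0] multiplies the oldest lag, coef[-1] the most recent one.
        next_value = sum(c * v for c, v in zip(coef, window))
        values.append(next_value)
        history.append(next_value)
    return values


# Fibonacci-like recursion: series[t] = series[t-1] + series[t-2]
fib = autoregressive_values(coef=[1.0, 1.0], start_values=[0.0, 1.0], length=5)
print(fib)  # [1.0, 2.0, 3.0, 5.0, 8.0]
```

Based on the signature above, the corresponding darts call would be `autoregressive_timeseries(coef=[1.0, 1.0], start_values=[0.0, 1.0], length=5)`.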
- darts.utils.timeseries_generation.constant_timeseries(value=1, start=Timestamp('2000-01-01 00:00:00'), end=None, length=None, freq=None, column_name='constant', dtype=<class 'numpy.float64'>)
Creates a constant univariate TimeSeries with the given value, length (or end date), start date and frequency.
- Parameters
value (float) – The constant value that the TimeSeries object will assume at every index.
start (Union[Timestamp, int, None]) – The start of the returned TimeSeries’ index. If a pandas Timestamp is passed, the TimeSeries will have a pandas DatetimeIndex. If an integer is passed, the TimeSeries will have a pandas RangeIndex index. Works only with either length or end.
end (Union[Timestamp, int, None]) – Optionally, the end of the returned index. Works only with either start or length. If start is set, end must be of same type as start. Else, it can be either a pandas Timestamp or an integer.
length (Optional[int]) – Optionally, the length of the returned index. Works only with either start or end.
freq (Union[str, int, None]) – The time difference between two adjacent entries in the returned index. In case start is a timestamp, a DateOffset alias is expected; see docs. By default, “D” (daily) is used. If start is an integer, freq will be interpreted as the step size in the underlying RangeIndex. The freq is optional for generating an integer index (if not specified, 1 is used).
column_name (Optional[str]) – Optionally, the name of the value column for the returned TimeSeries.
dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series.
- Returns
A constant TimeSeries with value ‘value’.
- Return type
TimeSeries
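
The start, length and freq parameters are shared by most generators on this page, so it may help to see how they shape the index. The sketch below uses only the standard library; the helper `constant_points` and its `step` argument are hypothetical stand-ins (a `timedelta` or an integer instead of a pandas offset alias), and it covers only the start plus length combination.

```python
from datetime import datetime, timedelta


def constant_points(value=1.0, start=datetime(2000, 1, 1), length=3, step=None):
    """Return (index, value) pairs for a constant series of `length` points.

    Hypothetical helper for illustration; not part of darts.
    """
    if step is None:
        # Mirror the documented defaults: daily steps for timestamps, 1 for integers.
        step = timedelta(days=1) if isinstance(start, datetime) else 1
    return [(start + i * step, float(value)) for i in range(length)]


# A timestamp start gives a datetime index with daily spacing.
daily = constant_points(value=2.5, length=3)
print(daily[-1][0])  # 2000-01-03 00:00:00

# An integer start gives an integer index; the step plays the role of freq.
stepped = constant_points(value=2.5, start=10, length=3, step=2)
print([index for index, _ in stepped])  # [10, 12, 14]
```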
- darts.utils.timeseries_generation.datetime_attribute_timeseries(time_index, attribute, one_hot=False, cyclic=False, until=None, add_length=0, dtype=<class 'numpy.float64'>, with_columns=None, tz=None)
Returns a new TimeSeries with index time_index and one or more dimensions containing (optionally one-hot encoded or cyclic encoded) pd.DatetimeIndex attribute information derived from the index.
1-indexed attributes are shifted to enforce 0-indexing across all the encodings.
- Parameters
time_index (Union[DatetimeIndex, TimeSeries]) – Either a pd.DatetimeIndex which will serve as the basis of the new column(s), or a TimeSeries whose time axis will serve this purpose.
attribute (str) – An attribute of pd.DatetimeIndex, or week / weekofyear / week_of_year, e.g. “month”, “weekday”, “day”, “hour”, “minute”, “second”. See all available attributes in https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
one_hot (bool) – Boolean value indicating whether to add the specified attribute as a one hot encoding (results in more columns).
cyclic (bool) – Boolean value indicating whether to add the specified attribute as a cyclic encoding, which adds 2 columns corresponding to the sine and cosine transformations. Alternative to one_hot encoding; enable only one of the two.
until (Union[int, str, Timestamp, None]) – Extend the time_index up to this timestamp (for datetime-indexed series) or integer (for range-indexed series). The extended index should match or exceed the forecasting window.
add_length (int) – Extend the time_index by add_length. The extended index should match or exceed the forecasting window. Set only one of until and add_length.
dtype – The desired NumPy dtype (np.float32 or np.float64) for the resulting series.
with_columns (Union[str, List[str], None]) – Optionally, specify the output component names.
- If one_hot and cyclic are False, must be a string.
- If cyclic is True, must be a list of two strings: the first for the sine component name, the second for the cosine component name.
- If one_hot is True, must be a list of strings of the same length as the generated one hot encoded features.
tz (Optional[str]) – Optionally, a time zone to convert the time index to before computing the attributes.
- Returns
New datetime attribute TimeSeries instance.
- Return type
TimeSeries
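
To make the two encodings concrete, here is a small standard-library sketch for the “month” attribute. It is an illustration under stated assumptions, not the library's code: the helper names are hypothetical, the period of 12 is assumed for months, and the 1-indexed month is shifted to 0-indexing as described above.

```python
import math
from datetime import date


def month_one_hot(d):
    """One-hot encode the month of `d` into 12 columns (hypothetical helper)."""
    index = d.month - 1  # shift the 1-indexed month to 0-indexing
    return [1.0 if i == index else 0.0 for i in range(12)]


def month_cyclic(d):
    """Cyclic-encode the month of `d` as a (sine, cosine) pair (hypothetical helper)."""
    angle = 2 * math.pi * (d.month - 1) / 12  # assumed period: 12 months
    return math.sin(angle), math.cos(angle)


january = date(2000, 1, 15)
print(month_one_hot(january)[:3])  # [1.0, 0.0, 0.0]
print(month_cyclic(january))       # (0.0, 1.0)
```

The one-hot form yields one column per possible value, while the cyclic form always yields two columns and keeps December adjacent to January.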
- darts.utils.timeseries_generation.gaussian_timeseries(mean=0.0, std=1.0, start=Timestamp('2000-01-01 00:00:00'), end=None, length=None, freq=None, column_name='gaussian', dtype=<class 'numpy.float64'>)
Creates a gaussian univariate TimeSeries by sampling all the series values independently, from a gaussian distribution with mean mean and standard deviation std.
- Parameters
mean (Union[float, ndarray]) – The mean of the gaussian distribution that is sampled at each step. If a float value is given, the same mean is used at every step. If a numpy.ndarray of floats with the same length as length is given, a different mean is used at each time step.
std (Union[float, ndarray]) – The standard deviation of the gaussian distribution that is sampled at each step. If a float value is given, the same standard deviation is used at every step. If an array of dimension (length, length) is given, it will be used as covariance matrix for a multivariate gaussian distribution.
start (Union[Timestamp, int, None]) – The start of the returned TimeSeries’ index. If a pandas Timestamp is passed, the TimeSeries will have a pandas DatetimeIndex. If an integer is passed, the TimeSeries will have a pandas RangeIndex index. Works only with either length or end.
end (Union[Timestamp, int, None]) – Optionally, the end of the returned index. Works only with either start or length. If start is set, end must be of same type as start. Else, it can be either a pandas Timestamp or an integer.
length (Optional[int]) – Optionally, the length of the returned index. Works only with either start or end.
freq (Union[str, int, None]) – The time difference between two adjacent entries in the returned index. In case start is a timestamp, a DateOffset alias is expected; see docs. By default, “D” (daily) is used. If start is an integer, freq will be interpreted as the step size in the underlying RangeIndex. The freq is optional for generating an integer index (if not specified, 1 is used).
column_name (Optional[str]) – Optionally, the name of the value column for the returned TimeSeries.
dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series.
- Returns
A white noise TimeSeries created as indicated above.
- Return type
TimeSeries
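
The difference between a scalar mean and a per-step mean can be sketched with the standard library. This is an illustration only: `gaussian_values` and its `seed` argument are hypothetical (the documented function has no seed parameter), the result is a plain list, and the covariance-matrix form of std is not covered.

```python
import random


def gaussian_values(mean=0.0, std=1.0, length=5, seed=None):
    """Sample `length` independent gaussian values (hypothetical helper)."""
    rng = random.Random(seed)
    # Broadcast a scalar mean to every step; otherwise use one mean per step.
    means = [mean] * length if isinstance(mean, (int, float)) else list(mean)
    if len(means) != length:
        raise ValueError("mean must be a scalar or have exactly `length` entries")
    return [rng.gauss(m, std) for m in means]


# Same mean at every step.
flat_noise = gaussian_values(mean=0.0, std=1.0, length=4, seed=42)

# A different mean at each step, so the noise follows a rising level.
rising_noise = gaussian_values(mean=[0.0, 10.0, 20.0], std=0.5, length=3, seed=42)
print(len(flat_noise), len(rising_noise))  # 4 3
```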
- darts.utils.timeseries_generation.holidays_timeseries(time_index, country_code, prov=None, state=None, column_name='holidays', until=None, add_length=0, dtype=<class 'numpy.float64'>, tz=None)
Creates a binary univariate TimeSeries with index time_index that equals 1 at every index that lies within (or equals) a selected country’s holiday, and 0 otherwise.
Available countries are listed in the documentation of the holidays Python package.
- Parameters
time_index (Union[TimeSeries, DatetimeIndex]) – Either a pd.DatetimeIndex or a TimeSeries for which to generate the holidays.
country_code (str) – The country ISO code.
prov (Optional[str]) – The province.
state (Optional[str]) – The state.
until (Union[int, str, Timestamp, None]) – Extend the time_index up to this timestamp (for datetime-indexed series) or integer (for range-indexed series). The extended index should match or exceed the forecasting window.
add_length (int) – Extend the time_index by add_length. The extended index should match or exceed the forecasting window. Set only one of until and add_length.
column_name (Optional[str]) – Optionally, the name of the value column for the returned TimeSeries.
dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series.
tz (Optional[str]) – Optionally, a time zone to convert the time index to before generating the holidays.
- Returns
A new binary holiday TimeSeries instance.
- Return type
TimeSeries
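
The phrase “lies within (or equals)” matters for sub-daily indexes: every timestamp falling on a holiday date is flagged, not only midnight. A minimal sketch of that rule follows. The holiday set is hand-picked and hypothetical, standing in for a real country lookup, and `holiday_indicator` is not part of darts.

```python
from datetime import date, datetime, timedelta

# Hypothetical, hand-picked holiday dates for illustration only.
EXAMPLE_HOLIDAYS = {date(2000, 1, 1), date(2000, 12, 25)}


def holiday_indicator(time_index, holiday_dates):
    """Return 1.0 for timestamps whose calendar date is a holiday, else 0.0."""
    return [1.0 if ts.date() in holiday_dates else 0.0 for ts in time_index]


# Four timestamps, 12 hours apart, starting at noon on 1999-12-31.
half_days = [datetime(1999, 12, 31, 12) + i * timedelta(hours=12) for i in range(4)]
flags = holiday_indicator(half_days, EXAMPLE_HOLIDAYS)
print(flags)  # [0.0, 1.0, 1.0, 0.0]
```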
- darts.utils.timeseries_generation.linear_timeseries(start_value=0, end_value=1, start=Timestamp('2000-01-01 00:00:00'), end=None, length=None, freq=None, column_name='linear', dtype=<class 'numpy.float64'>)
Creates a univariate TimeSeries with a starting value of start_value that increases linearly such that it takes on the value end_value at the last entry of the TimeSeries. This means that the difference between two adjacent entries will be equal to (end_value - start_value) / (length - 1).
- Parameters
start_value (float) – The value of the first entry in the TimeSeries.
end_value (float) – The value of the last entry in the TimeSeries.
start (Union[Timestamp, int, None]) – The start of the returned TimeSeries’ index. If a pandas Timestamp is passed, the TimeSeries will have a pandas DatetimeIndex. If an integer is passed, the TimeSeries will have a pandas RangeIndex index. Works only with either length or end.
end (Union[Timestamp, int, None]) – Optionally, the end of the returned index. Works only with either start or length. If start is set, end must be of same type as start. Else, it can be either a pandas Timestamp or an integer.
length (Optional[int]) – Optionally, the length of the returned index. Works only with either start or end.
freq (Union[str, int, None]) – The time difference between two adjacent entries in the returned index. In case start is a timestamp, a DateOffset alias is expected; see docs. By default, “D” (daily) is used. If start is an integer, freq will be interpreted as the step size in the underlying RangeIndex. The freq is optional for generating an integer index (if not specified, 1 is used).
column_name (Optional[str]) – Optionally, the name of the value column for the returned TimeSeries.
dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series.
- Returns
A linear TimeSeries created as indicated above.
- Return type
TimeSeries
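
The step size formula given above, (end_value - start_value) / (length - 1), can be checked with a few lines of plain Python. The helper `linear_values` is hypothetical and returns a list of values only; rejecting lengths below 2 is a choice of this sketch to avoid dividing by zero, not documented library behaviour.

```python
def linear_values(start_value=0.0, end_value=1.0, length=5):
    """Return `length` evenly spaced values from start_value to end_value."""
    if length < 2:
        raise ValueError("this sketch needs at least two points")
    # Difference between two adjacent entries, as stated in the description.
    step = (end_value - start_value) / (length - 1)
    return [start_value + i * step for i in range(length)]


ramp = linear_values(0.0, 1.0, 5)
print(ramp)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```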
- darts.utils.timeseries_generation.random_walk_timeseries(mean=0.0, std=1.0, start=Timestamp('2000-01-01 00:00:00'), end=None, length=None, freq=None, column_name='random_walk', dtype=<class 'numpy.float64'>)
Creates a random walk univariate TimeSeries, where each step is obtained by sampling a gaussian distribution with mean mean and standard deviation std.
- Parameters
mean (float) – The mean of the gaussian distribution that is sampled at each step.
std (float) – The standard deviation of the gaussian distribution that is sampled at each step.
start (Union[Timestamp, int, None]) – The start of the returned TimeSeries’ index. If a pandas Timestamp is passed, the TimeSeries will have a pandas DatetimeIndex. If an integer is passed, the TimeSeries will have a pandas RangeIndex index. Works only with either length or end.
end (Union[Timestamp, int, None]) – Optionally, the end of the returned index. Works only with either start or length. If start is set, end must be of same type as start. Else, it can be either a pandas Timestamp or an integer.
length (Optional[int]) – Optionally, the length of the returned index. Works only with either start or end.
freq (Union[str, int, None]) – The time difference between two adjacent entries in the returned index. In case start is a timestamp, a DateOffset alias is expected; see docs. By default, “D” (daily) is used. If start is an integer, freq will be interpreted as the step size in the underlying RangeIndex. The freq is optional for generating an integer index (if not specified, 1 is used).
column_name (Optional[str]) – Optionally, the name of the value column for the returned TimeSeries.
dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series.
- Returns
A random walk TimeSeries created as indicated above.
- Return type
TimeSeries
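
A random walk differs from gaussian noise in that the sampled steps are accumulated. The sketch below shows this with the standard library. It assumes the first value of the walk is the first sampled step (the description above does not say where the walk starts), and the helper `random_walk_values` with its `seed` argument is hypothetical.

```python
import random
from itertools import accumulate


def random_walk_values(mean=0.0, std=1.0, length=5, seed=None):
    """Accumulate `length` gaussian steps into a random walk (hypothetical helper)."""
    rng = random.Random(seed)
    steps = [rng.gauss(mean, std) for _ in range(length)]
    # Running sum of the steps; assumption: the walk starts at the first step.
    return list(accumulate(steps))


# With std=0 every step equals the mean, so the walk is a pure drift.
drift = random_walk_values(mean=1.0, std=0.0, length=4)
print(drift)  # [1.0, 2.0, 3.0, 4.0]
```

A non-zero mean therefore acts as a drift term, while std controls how far the walk wanders around that drift.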
- darts.utils.timeseries_generation.sine_timeseries(value_frequency=0.1, value_amplitude=1.0, value_phase=0.0, value_y_offset=0.0, start=Timestamp('2000-01-01 00:00:00'), end=None, length=None, freq=None, column_name='sine', dtype=<class 'numpy.float64'>)
Creates a univariate TimeSeries with a sinusoidal value progression with a given frequency, amplitude, phase and y offset.
- Parameters
value_frequency (float) – The number of periods that take place within one time unit given in freq.
value_amplitude (float) – The maximum difference between any value of the returned TimeSeries and value_y_offset.
value_phase (float) – The relative position within one period of the first value of the returned TimeSeries (in radians).
value_y_offset (float) – The shift of the sine function along the y axis.
start (Union[Timestamp, int, None]) – The start of the returned TimeSeries’ index. If a pandas Timestamp is passed, the TimeSeries will have a pandas DatetimeIndex. If an integer is passed, the TimeSeries will have a pandas RangeIndex index. Works only with either length or end.
end (Union[Timestamp, int, None]) – Optionally, the end of the returned index. Works only with either start or length. If start is set, end must be of same type as start. Else, it can be either a pandas Timestamp or an integer.
length (Optional[int]) – Optionally, the length of the returned index. Works only with either start or end.
freq (Union[str, int, None]) – The time difference between two adjacent entries in the returned index. In case start is a timestamp, a DateOffset alias is expected; see docs. By default, “D” (daily) is used. If start is an integer, freq will be interpreted as the step size in the underlying RangeIndex. The freq is optional for generating an integer index (if not specified, 1 is used).
column_name (Optional[str]) – Optionally, the name of the value column for the returned TimeSeries.
dtype (dtype) – The desired NumPy dtype (np.float32 or np.float64) for the resulting series.
- Returns
A sinusoidal TimeSeries parametrized as indicated above.
- Return type
TimeSeries
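
The four value_ parameters can be read as the terms of a single expression. The sketch below assumes the value at step t is value_amplitude * sin(2π * value_frequency * t + value_phase) + value_y_offset, which is consistent with the parameter descriptions above but is not stated by them. The helper `sine_values` is hypothetical and returns a plain list.

```python
import math


def sine_values(value_frequency=0.1, value_amplitude=1.0, value_phase=0.0,
                value_y_offset=0.0, length=5):
    """Evaluate a sinusoid at integer steps 0..length-1 (hypothetical helper)."""
    return [
        value_amplitude * math.sin(2 * math.pi * value_frequency * t + value_phase)
        + value_y_offset
        for t in range(length)
    ]


# A frequency of 0.25 completes one full period every 4 steps.
wave = sine_values(value_frequency=0.25, length=5)
print([round(v, 6) for v in wave])
```

With the default value_frequency of 0.1, one period spans 10 time steps of the chosen freq.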