Time Series Statistics
- darts.utils.statistics.check_seasonality(ts, m=None, max_lag=24, alpha=0.05)
Checks whether the TimeSeries ts is seasonal with period m or not.
If m is None, we work under the assumption that there is a unique seasonality period, which is inferred from the Auto-correlation Function (ACF).
- Parameters
ts (TimeSeries) – The time series to check for seasonality.
m (Optional[int]) – The seasonality period to check.
max_lag (int) – The maximal lag allowed in the ACF.
alpha (float) – The desired significance level of the test (default 5%).
- Returns
A tuple (season, m), where season is a boolean indicating whether the series has seasonality or not and m is the seasonality period.
- Return type
Tuple[bool, int]
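The seasonality check above can be sketched in plain Python as follows. This is a minimal illustration, not the darts implementation: the helpers acf and check_seasonality_sketch are hypothetical, they take a list of floats instead of a TimeSeries, and the significance rule (r_m compared with a Bartlett-style bound at level alpha) is an assumption about how such a test can be set up.

```python
import math
from statistics import NormalDist


def acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags (biased estimator, normalised by r_0)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / denom for k in range(nlags + 1)]


def check_seasonality_sketch(x, m=None, max_lag=24, alpha=0.05):
    """Hypothetical helper returning (is_seasonal, period) from ACF local maxima."""
    if m is not None and (m < 2 or m > max_lag):
        raise ValueError("m must satisfy 2 <= m <= max_lag")
    if len(set(x)) == 1:  # a constant series cannot be seasonal
        return False, 0
    r = acf(x, max_lag)
    # Candidate periods: interior local maxima of the ACF (lags 1..max_lag-1).
    candidates = [k for k in range(1, max_lag) if r[k - 1] < r[k] > r[k + 1]]
    if m is not None:
        if m not in candidates:
            return False, m
        candidates = [m]
    z = NormalDist().inv_cdf(1 - alpha / 2)
    for k in candidates:
        # Assumed rule: r_k must exceed a Bartlett-style (1 - alpha) bound.
        se = math.sqrt((1 + 2 * sum(r[j] ** 2 for j in range(1, k))) / len(x))
        if r[k] > z * se:
            return True, k
    return False, 0
```

For a pure sine wave of period 12 sampled at 144 points, the only interior local maximum of the ACF up to lag 24 is at lag 12, so check_seasonality_sketch(series) returns (True, 12).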
- darts.utils.statistics.extract_trend_and_seasonality(ts, freq=None, model=ModelMode.MULTIPLICATIVE, method='naive', **kwargs)
Extracts trend and seasonality from a TimeSeries instance using statsmodels.tsa.
- Parameters
ts (TimeSeries) – The series to decompose.
freq (Union[int, Sequence[int], None]) – The seasonality period to use.
model (Union[SeasonalityMode, ModelMode]) – The type of decomposition to use. Must be a ModelMode or SeasonalityMode Enum member (from darts.utils.utils import ModelMode, SeasonalityMode). Either MULTIPLICATIVE or ADDITIVE. Defaults to ModelMode.MULTIPLICATIVE.
method (str) – The method to be used to decompose the series. Defaults to “naive”.
  - “naive” : Seasonal decomposition using moving averages [1].
  - “STL” : Season-Trend decomposition using LOESS [2]. Only compatible with the ADDITIVE model type.
  - “MSTL” : Season-Trend decomposition using LOESS with multiple seasonalities [3]. Only compatible with the ADDITIVE model type.
kwargs – Other keyword arguments are passed down to the decomposition method.
- Returns
A tuple of (trend, seasonal) time series.
- Return type
Tuple[TimeSeries, TimeSeries]
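To illustrate what the “naive” method does, here is a sketch of classical moving-average decomposition on a plain list with a single integer period. The functions centred_moving_average and naive_decompose, and the string-valued model argument, are hypothetical; statsmodels’ seasonal_decompose additionally offers options (such as trend extrapolation at the edges) that are omitted here, and the series is assumed to span at least two full periods.

```python
def centred_moving_average(x, period):
    """Centred moving average of length `period`; None where the window does not fit."""
    half = period // 2
    if period % 2 == 1:
        weights = [1.0 / period] * period
    else:
        # Even period: a 2 x period average, half weight on the two outer points.
        weights = [0.5 / period] + [1.0 / period] * (period - 1) + [0.5 / period]
    trend = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half : t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window))
    return trend


def naive_decompose(x, period, model="additive"):
    """Hypothetical classical decomposition returning (trend, seasonal) lists."""
    additive = model == "additive"
    trend = centred_moving_average(x, period)
    detrended = [
        None if tr is None else (v - tr if additive else v / tr)
        for v, tr in zip(x, trend)
    ]
    # Seasonal index j: mean of the detrended values at positions t with t % period == j.
    indices = []
    for j in range(period):
        vals = [d for t, d in enumerate(detrended) if t % period == j and d is not None]
        indices.append(sum(vals) / len(vals))
    # Normalise: additive indices average to 0, multiplicative indices average to 1.
    mean_index = sum(indices) / period
    indices = [s - mean_index if additive else s / mean_index for s in indices]
    seasonal = [indices[t % period] for t in range(len(x))]
    return trend, seasonal
```

For an even period, the half weights on the two outer points keep the averaging window centred on t, so a linear trend passes through the moving average unchanged while a zero-mean periodic pattern averages out.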
References
- darts.utils.statistics.granger_causality_tests(ts_cause, ts_effect, maxlag, addconst=True)
Provides four tests for Granger non-causality of two time series using statsmodels.tsa.stattools.grangercausalitytests(). See [1].
- Parameters
ts_cause (TimeSeries) – A univariate deterministic time series. The statistical test determines whether this time series ‘Granger causes’ the time series ts_effect (second parameter). Missing values are not supported. If H_0 (non-causality) is rejected (p-value near 0), then there is ‘Granger causality’.
ts_effect (TimeSeries) – Univariate time series ‘Granger caused’ by ts_cause.
maxlag (int) – If an integer, computes the test for all lags up to maxlag. If an iterable, computes the tests only for the lags in maxlag.
addconst (bool) – Include a constant in the model.
- Returns
All test results, as a dictionary whose keys are the numbers of lags. For each lag, the value is a tuple: the first element is a dictionary with the test statistics, p-values and degrees of freedom; the second element contains the OLS estimation results for the restricted model and the unrestricted model, and the restriction (contrast) matrix for the parameter f_test.
- Return type
Dict
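The sketch below shows the core of one of the four tests, the F test comparing a restricted regression of ts_effect on its own lags with an unrestricted one that also includes lags of ts_cause. It is a hypothetical, dependency-free illustration (ols_rss and granger_f are not darts or statsmodels functions): it returns only the F statistic, since computing the p-value would require the F distribution, which the Python standard library does not provide.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, solving the normal equations directly."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * v for row, v in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((v - sum(b * xi for b, xi in zip(beta, row))) ** 2 for row, v in zip(X, y))


def granger_f(cause, effect, lag, addconst=True):
    """Hypothetical F statistic for H_0: lags 1..lag of `cause` do not help predict `effect`."""
    rows_restricted, rows_full, target = [], [], []
    for t in range(lag, len(effect)):
        const = [1.0] if addconst else []
        own_lags = [effect[t - i] for i in range(1, lag + 1)]
        cause_lags = [cause[t - i] for i in range(1, lag + 1)]
        rows_restricted.append(const + own_lags)
        rows_full.append(const + own_lags + cause_lags)
        target.append(effect[t])
    rss_r = ols_rss(rows_restricted, target)
    rss_f = ols_rss(rows_full, target)
    df_resid = len(target) - len(rows_full[0])
    return ((rss_r - rss_f) / lag) / (rss_f / df_resid)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(200)]
    # y follows x with a one-step delay, plus a little noise.
    y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
    print(granger_f(x, y, lag=1), granger_f(y, x, lag=1))
```

A large F statistic (small p-value) for granger_f(x, y, lag) is evidence against the null hypothesis that x does not Granger-cause y.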
References
- darts.utils.statistics.plot_acf(ts, m=None, max_lag=24, alpha=0.05, bartlett_confint=True, fig_size=(10, 5), axis=None, default_formatting=True)
Plots the Autocorrelation Function (ACF) of ts, highlighting it at lag m, with the corresponding significance interval. Uses statsmodels.tsa.stattools.acf() [1].
- Parameters
ts (TimeSeries) – The TimeSeries whose ACF should be plotted.
m (Optional[int]) – Optionally, a time lag to highlight on the plot.
max_lag (int) – The maximal lag order to consider.
alpha (float) – The significance level of the confidence interval to display (e.g. 0.05 for a 95% interval).
bartlett_confint (bool) – Whether the confidence interval should be calculated using Bartlett’s formula. If True, the confidence interval can be used in the model identification stage for fitting ARIMA models. If False, the confidence interval can be used to test the data for randomness (i.e. the absence of time dependence).
fig_size (tuple[int, int]) – The size of the figure to be displayed.
axis (Optional[axis]) – Optionally, an axis object to plot the ACF on.
default_formatting (bool) – Whether to use the darts default scheme.
References
- Return type
None
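The bartlett_confint switch only changes the width of the band drawn around zero. A minimal sketch, assuming plain lists of floats and hypothetical helper names (acf, acf_band_halfwidths): without Bartlett’s formula every lag gets the white-noise half-width z/sqrt(n), while with it the variance at lag k grows with the squared autocorrelations at lower lags.

```python
import math
from statistics import NormalDist


def acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags (biased estimator, normalised by r_0)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / denom for k in range(nlags + 1)]


def acf_band_halfwidths(x, nlags, alpha=0.05, bartlett_confint=True):
    """Hypothetical helper: half-widths of a (1 - alpha) band around zero for lags 1..nlags."""
    n = len(x)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    if not bartlett_confint:
        # White-noise assumption: Var(r_k) = 1/n at every lag.
        return [z / math.sqrt(n)] * nlags
    r = acf(x, nlags)
    # Bartlett's formula: Var(r_k) = (1 + 2 * sum_{j<k} r_j^2) / n.
    return [
        z * math.sqrt((1 + 2 * sum(r[j] ** 2 for j in range(1, k))) / n)
        for k in range(1, nlags + 1)
    ]
```

Both variants coincide at lag 1, where the Bartlett sum is empty; from there on the Bartlett band can only widen.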
- darts.utils.statistics.plot_ccf(ts, ts_other, m=None, max_lag=24, alpha=0.05, bartlett_confint=True, fig_size=(10, 5), axis=None, default_formatting=True)
Plots the Cross Correlation Function (CCF) between ts and ts_other, highlighting it at lag m, with the corresponding significance interval. Uses statsmodels.tsa.stattools.ccf() [1].
This can be used to find the cross-correlation between the target and different lags of a covariate. If ts_other is identical to ts, it corresponds to plot_acf().
- Parameters
ts (TimeSeries) – The TimeSeries whose CCF with ts_other should be plotted.
ts_other (TimeSeries) – The TimeSeries to compare against ts in the CCF, e.g. to check the cross-correlation of different covariate lags with the target.
m (Optional[int]) – Optionally, a time lag to highlight on the plot.
max_lag (int) – The maximal lag order to consider.
alpha (float) – The significance level of the confidence interval to display (e.g. 0.05 for a 95% interval).
bartlett_confint (bool) – Whether the confidence interval should be calculated using Bartlett’s formula.
fig_size (tuple[int, int]) – The size of the figure to be displayed.
axis (Optional[axis]) – Optionally, an axis object to plot the CCF on.
default_formatting (bool) – Whether to use the darts default scheme.
References
- Return type
None
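A cross-correlation sketch in plain Python follows. The helper name ccf is hypothetical, it uses the biased estimator (dividing by n), and the lag convention r_k = corr(x[t + k], y[t]) is an assumption about how statsmodels orders its arguments, so check it against the statsmodels documentation before relying on the direction of the lag.

```python
import random


def ccf(x, y, nlags):
    """Hypothetical r_k = corr(x[t + k], y[t]) for k = 0..nlags, normalised by n."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    sx = (sum(v * v for v in dx) / n) ** 0.5
    sy = (sum(v * v for v in dy) / n) ** 0.5
    return [
        sum(dx[t + k] * dy[t] for t in range(n - k)) / (n * sx * sy)
        for k in range(nlags + 1)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    y = [rng.gauss(0, 1) for _ in range(200)]
    x = [0.0, 0.0, 0.0] + y[:-3]  # x repeats y three steps later
    print(ccf(x, y, 5))
```

With x a copy of y delayed by three steps, the peak of ccf(x, y, 10) lies at lag 3, and ccf(y, y, nlags) reduces to the ACF of y.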
- darts.utils.statistics.plot_hist(data, bins=None, density=False, title=None, fig_size=None, ax=None)
This function plots the histogram of values in a TimeSeries instance or an array-like.
All types of TimeSeries are supported (uni-, multivariate, deterministic, stochastic). Depending on the number of components in data, up to four histograms can be plotted on one figure. All stochastic samples will be displayed with the corresponding component.
If data is an array-like, ALL values will be displayed in the same histogram.
- Parameters
data (Union[TimeSeries, list[float], ndarray]) – TimeSeries instance or an array-like from which to plot the histogram.
bins (Union[int, ndarray, list[float], None]) – Optionally, either an integer value for the number of bins to be displayed or an array-like of floats determining the position of bins.
density (bool) – If True, the bin counts are converted to a probability density.
title (Optional[str]) – The title of the figure to be displayed.
fig_size (Optional[tuple[int, int]]) – The size of the figure to be displayed.
ax (Optional[axis]) – Optionally, an axis object to plot the histogram on.
- Return type
None
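The density option rescales bar heights so that the total area of the histogram equals 1. A hypothetical equal-width binning helper makes the arithmetic explicit; matplotlib and NumPy perform this computation internally, and the function below is only an illustration of it.

```python
def histogram(values, bins=10, density=False):
    """Hypothetical equal-width histogram; returns (edges, heights)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # The right-most edge is inclusive, so the maximum lands in the last bin.
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    if not density:
        return edges, counts
    # Divide by (number of values * bin width) so the bar areas sum to 1.
    total = len(values)
    return edges, [c / (total * width) for c in counts]
```

For example, histogram([0, 1, 1, 2, 2, 2, 3, 3, 3, 3], bins=4) yields counts [1, 2, 3, 4] over bins of width 0.75.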
- darts.utils.statistics.plot_pacf(ts, m=None, max_lag=24, method='ywadjusted', alpha=0.05, fig_size=(10, 5), axis=None, default_formatting=True)
Plots the Partial Autocorrelation Function (PACF) of ts, highlighting it at lag m, with the corresponding significance interval. Uses statsmodels.tsa.stattools.pacf() [1].
- Parameters
ts (TimeSeries) – The TimeSeries whose PACF should be plotted.
m (Optional[int]) – Optionally, a time lag to highlight on the plot.
max_lag (int) – The maximal lag order to consider.
method (str) – The method to be used for the PACF calculation:
  - “yw” or “ywadjusted” : Yule-Walker with sample-size adjustment in the denominator for acovf. Default.
  - “ywm” or “ywmle” : Yule-Walker without adjustment.
  - “ols” : regression of the time series on its own lags and on a constant.
  - “ols-inefficient” : regression of the time series on its lags using a single common sample to estimate all PACF coefficients.
  - “ols-adjusted” : regression of the time series on its lags with a bias adjustment.
  - “ld” or “ldadjusted” : Levinson-Durbin recursion with bias correction.
  - “ldb” or “ldbiased” : Levinson-Durbin recursion without bias correction.
alpha (float) – The significance level of the confidence interval to display (e.g. 0.05 for a 95% interval).
fig_size (tuple[int, int]) – The size of the figure to be displayed.
axis (Optional[axis]) – Optionally, an axis object to plot the PACF on.
default_formatting (bool) – Whether to use the darts default scheme.
References
- Return type
None
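The “ld”/“ldb” methods compute the PACF from autocorrelations with the Levinson-Durbin recursion. A minimal sketch of that recursion, assuming the autocorrelations r_0 = 1, r_1, …, r_K are already available; the helper name pacf_from_acf is hypothetical, and it does not apply the bias corrections that distinguish the statsmodels variants.

```python
def pacf_from_acf(r):
    """Hypothetical Durbin-Levinson PACF, given autocorrelations r[0..K] with r[0] == 1."""
    K = len(r) - 1
    pacf = [1.0]
    phi = []  # phi[j - 1] holds phi_{k, j} for the current order k
    for k in range(1, K + 1):
        num = r[k] - sum(phi[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # Update phi_{k, j} = phi_{k-1, j} - phi_kk * phi_{k-1, k-j}, then append phi_kk.
        phi = [phi[j - 1] - phi_kk * phi[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf
```

For an AR(1) process with coefficient 0.5, whose autocorrelations are 0.5^k, the recursion returns 0.5 at lag 1 and 0 at every higher lag, which is the cut-off pattern a PACF plot is used to spot.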
- darts.utils.statistics.plot_residuals_analysis(residuals, num_bins=20, fill_nan=True, default_formatting=True, acf_max_lag=24)
Plots data relevant to residuals.
This function takes a univariate TimeSeries instance of residuals and plots their values, their distribution and their ACF. Please note that if the residual TimeSeries instance contains NaN values, the plots might be displayed incorrectly. If fill_nan is set to True, the missing values will be interpolated.
- Parameters
residuals (TimeSeries) – Univariate TimeSeries instance representing residuals.
num_bins (int) – Optionally, an integer value determining the number of bins in the histogram.
fill_nan (bool) – A boolean value indicating whether NaN values should be filled in the residuals.
default_formatting (bool) – Whether to use the darts default scheme.
acf_max_lag (int) – The maximum lag to be displayed in the ACF plot. Must be less than the length of residuals.
- Return type
None
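When fill_nan is True, the missing residuals are interpolated before plotting. The exact interpolation scheme is not stated here; the sketch below assumes linear interpolation between neighbouring valid values, with leading and trailing NaNs filled from the nearest valid value. The helper fill_nan_linear is hypothetical.

```python
import math


def fill_nan_linear(values):
    """Hypothetical helper: linear interpolation of NaNs, nearest-value fill at the edges."""
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not known:
        raise ValueError("all values are NaN")
    out = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:  # leading NaN: copy the first valid value
            out[i] = values[right]
        elif right is None:  # trailing NaN: copy the last valid value
            out[i] = values[left]
        else:  # interior NaN: interpolate between the two neighbours
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out
```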
- darts.utils.statistics.remove_from_series(ts, other, model)
Removes the TimeSeries other from the TimeSeries ts as specified by model. Use e.g. to remove an additive or multiplicative trend from a series.
- Parameters
ts (TimeSeries) – The TimeSeries to be modified.
other (TimeSeries) – The TimeSeries to remove.
model (Union[SeasonalityMode, ModelMode]) – The type of model considered. Must be a ModelMode or SeasonalityMode Enum member (from darts.utils.utils import ModelMode, SeasonalityMode). Either MULTIPLICATIVE or ADDITIVE.
- Returns
A TimeSeries defined by removing other from ts.
- Return type
TimeSeries
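Under the usual meaning of the two models, removing other from ts amounts to an element-wise subtraction for ADDITIVE and an element-wise division for MULTIPLICATIVE. The sketch below, with a hypothetical remove_component helper and string model names, assumes both series are already aligned on the same time index and have the same length.

```python
def remove_component(values, other, model):
    """Hypothetical helper: subtract (additive) or divide out (multiplicative) `other`."""
    if len(values) != len(other):
        raise ValueError("both series must have the same length")
    if model == "additive":
        return [v - o for v, o in zip(values, other)]
    if model == "multiplicative":
        return [v / o for v, o in zip(values, other)]
    raise ValueError(f"unknown model: {model!r}")
```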
- darts.utils.statistics.remove_seasonality(ts, freq=None, model=SeasonalityMode.MULTIPLICATIVE, method='naive', **kwargs)
Adjusts the TimeSeries ts for a seasonality of order freq using the model decomposition.
- Parameters
ts (TimeSeries) – The TimeSeries to adjust.
freq (Optional[int]) – The seasonality period to use.
model (SeasonalityMode) – The type of decomposition to use. Must be a SeasonalityMode Enum member (from darts import SeasonalityMode). Either SeasonalityMode.MULTIPLICATIVE or SeasonalityMode.ADDITIVE. Defaults to SeasonalityMode.MULTIPLICATIVE.
method (str) – The method to be used to decompose the series. Defaults to “naive”.
  - “naive” : Seasonal decomposition using moving averages [1].
  - “STL” : Season-Trend decomposition using LOESS [2]. Only compatible with the ADDITIVE model type.
kwargs – Other keyword arguments are passed down to the decomposition method.
- Returns
A new TimeSeries instance that corresponds to the seasonality-adjusted ‘ts’.
- Return type
TimeSeries
References
- darts.utils.statistics.remove_trend(ts, model=ModelMode.MULTIPLICATIVE, method='naive', **kwargs)
Adjusts the TimeSeries ts for a trend using the model decomposition.
- Parameters
ts (TimeSeries) – The TimeSeries to adjust.
model (ModelMode) – The type of decomposition to use. Must be a ModelMode Enum member (from darts.utils.utils import ModelMode). Either MULTIPLICATIVE or ADDITIVE. Defaults to ModelMode.MULTIPLICATIVE.
method (str) – The method to be used to decompose the series. Defaults to “naive”.
  - “naive” : Seasonal decomposition using moving averages [1].
  - “STL” : Season-Trend decomposition using LOESS [2]. Only compatible with the ADDITIVE model type.
kwargs – Other keyword arguments are passed down to the decomposition method.
- Returns
A new TimeSeries instance that corresponds to the trend-adjusted ‘ts’.
- Return type
TimeSeries
- darts.utils.statistics.stationarity_test_adf(ts, maxlag=None, regression='c', autolag='AIC')
Provides the Augmented Dickey-Fuller unit root test for a time series, using statsmodels.tsa.stattools.adfuller(). See [1].
- Parameters
ts (TimeSeries) – The time series to test.
maxlag (Optional[int]) – Maximum lag which is included in the test. When None, a default value of 12*(nobs/100)^(1/4) is used.
regression (str) – Constant and trend order to include in the regression:
  - “c” : constant only (default).
  - “ct” : constant and trend.
  - “ctt” : constant, and linear and quadratic trend.
  - “n” : no constant, no trend.
autolag (Optional[str]) – Method to use when automatically determining the lag length among the values 0, 1, …, maxlag. If “AIC” (default) or “BIC”, the number of lags is chosen to minimize the corresponding information criterion. If “t-stat”, the choice is based on the t-statistic: starting with maxlag, a lag is dropped until the t-statistic on the last lag length is significant using a 5%-sized test. If None, the number of included lags is set to maxlag.
- Returns
  - adf: The test statistic.
  - pvalue: MacKinnon’s approximate p-value based on [2].
  - usedlag: The number of lags used.
  - nobs: The number of observations used for the ADF regression and calculation of the critical values.
  - critical: Critical values for the test statistic at the 1 %, 5 %, and 10 % levels. Based on [2].
  - icbest: The maximized information criterion if autolag is not None.
- Return type
tuple
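The statistic at the heart of the test can be sketched for the simplest case: no augmentation lags (maxlag=0) and regression=“c”, i.e. the ordinary least-squares regression Δy_t = a + γ·y_{t-1} + e_t, whose t-statistic on γ is the Dickey-Fuller statistic. The helper dickey_fuller_stat is hypothetical; it does not compute MacKinnon p-values, so the result has to be compared with tabulated critical values (roughly -2.86 at the 5% level in this case).

```python
import random


def dickey_fuller_stat(y):
    """Hypothetical t-statistic of gamma in dy_t = a + gamma * y_{t-1} + e_t (no lags)."""
    x = y[:-1]                                    # y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]   # dy_t = y_t - y_{t-1}
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    gamma = sum((v - mx) * (d - md) for v, d in zip(x, dy)) / sxx
    a = md - gamma * mx
    rss = sum((d - a - gamma * v) ** 2 for v, d in zip(x, dy))
    se = (rss / (n - 2) / sxx) ** 0.5  # standard error of the OLS slope gamma
    return gamma / se


if __name__ == "__main__":
    rng = random.Random(2)
    y = [0.0]
    for _ in range(299):
        y.append(0.2 * y[-1] + rng.gauss(0, 1))  # stationary AR(1)
    print(dickey_fuller_stat(y))
```

A strongly mean-reverting series such as the AR(1) above gives a statistic far below the critical value, so the unit-root null hypothesis is rejected.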
References
- [1] https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html
- [2] MacKinnon (1994, 2010)
- darts.utils.statistics.stationarity_test_kpss(ts, regression='c', nlags='auto')
Provides the Kwiatkowski-Phillips-Schmidt-Shin test for stationarity of a time series, using statsmodels.tsa.stattools.kpss(). See [1].
- Parameters
ts (TimeSeries) – The time series to test.
regression (str) – The null hypothesis for the KPSS test:
  - ‘c’ : The data is stationary around a constant (default).
  - ‘ct’ : The data is stationary around a trend.
nlags (Union[str, int]) – Indicates the number of lags to be used. If ‘auto’ (default), the number of lags is calculated using the data-dependent method of Hobijn et al. (1998). See also Andrews (1991), Newey & West (1994), and Schwert (1989). If set to ‘legacy’, uses int(12 * (n / 100)**(1 / 4)), as outlined in Schwert (1989).
- Returns
  - kpss_stat: The test statistic.
  - pvalue: The p-value of the test. The p-value is interpolated from Table 1 in [2], and a boundary point is returned if the test statistic is outside the table of critical values, that is, if the p-value is outside the interval (0.01, 0.1).
  - lags: The truncation lag parameter.
  - crit: The critical values at 10%, 5%, 2.5% and 1%. Based on [2].
- Return type
tuple
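For regression=“c”, the KPSS statistic is the sum of squared partial sums of the demeaned series, divided by n² times a long-run variance estimate built with Bartlett weights 1 - k/(nlags + 1). The hypothetical helper kpss_stat below sketches that computation for a fixed integer nlags; it does not implement the “auto” lag selection or the p-value interpolation. Large values reject stationarity; the 5% critical value from Kwiatkowski et al. (1992) is 0.463.

```python
def kpss_stat(y, nlags):
    """Hypothetical KPSS level-stationarity statistic with a Bartlett-kernel variance."""
    n = len(y)
    mean = sum(y) / n
    e = [v - mean for v in y]  # residuals of the regression on a constant
    partial, s = [], 0.0
    for v in e:
        s += v
        partial.append(s)
    lrv = sum(v * v for v in e)
    for k in range(1, nlags + 1):
        w = 1.0 - k / (nlags + 1.0)  # Bartlett weight for lag k
        lrv += 2.0 * w * sum(e[t] * e[t - k] for t in range(k, n))
    lrv /= n
    return sum(S * S for S in partial) / (n * n * lrv)
```

An alternating series such as 1, -1, 1, … yields a tiny statistic (stationarity not rejected), whereas a linear trend yields a value well above 0.463.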
References
- [1] https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.kpss.html
- [2] Kwiatkowski et al. (1992)
- darts.utils.statistics.stationarity_tests(ts, p_value_threshold_adfuller=0.05, p_value_threshold_kpss=0.05)
Double test on stationarity using both the Kwiatkowski-Phillips-Schmidt-Shin and the Augmented Dickey-Fuller statistical tests.
WARNING: Because the null hypothesis of the Augmented Dickey-Fuller test is that ts IS NOT stationary, whereas the null hypothesis of the Kwiatkowski-Phillips-Schmidt-Shin test is that ts IS stationary, the same p-value threshold cannot in general be used for both tests. It seems reasonable to keep them both at 0.05. If other thresholds have to be tested, they have to move in opposite directions (for example, p_value_threshold_adfuller = 0.01 and p_value_threshold_kpss = 0.1).
- Parameters
ts (TimeSeries) – The TimeSeries to test.
p_value_threshold_adfuller (float) – p-value threshold below which the Augmented Dickey-Fuller test rejects its null hypothesis of non-stationarity.
p_value_threshold_kpss (float) – p-value threshold below which the Kwiatkowski-Phillips-Schmidt-Shin test rejects its null hypothesis of stationarity.
- Returns
Whether ts is stationary.
- Return type
bool
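The combination rule can be written as a small pure function. This sketch assumes the p-values have already been computed by the two tests; the helper name combine_stationarity_tests is hypothetical. A series is declared stationary only if the Augmented Dickey-Fuller test rejects non-stationarity and the KPSS test does not reject stationarity.

```python
def combine_stationarity_tests(
    adf_pvalue, kpss_pvalue, p_value_threshold_adfuller=0.05, p_value_threshold_kpss=0.05
):
    """Hypothetical decision rule combining the ADF and KPSS p-values."""
    # ADF: a small p-value rejects the null of non-stationarity.
    adf_says_stationary = adf_pvalue < p_value_threshold_adfuller
    # KPSS: a large p-value means the null of stationarity is not rejected.
    kpss_says_stationary = kpss_pvalue > p_value_threshold_kpss
    return adf_says_stationary and kpss_says_stationary
```

Requiring both tests to agree is why moving one threshold without the other in the opposite direction makes the combined decision stricter or looser in ways that are hard to interpret.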