Timeseries
TimeSeries is the main class in darts.
It represents a univariate or multivariate time series, deterministic or stochastic.
The values are stored in an array of shape (time, dimensions, samples), where dimensions are the dimensions (or “components”, or “columns”) of multivariate series, and samples are samples of stochastic series.
- Definitions:
  - A series with dimensions = 1 is univariate and a series with dimensions > 1 is multivariate.
  - A series with samples = 1 is deterministic and a series with samples > 1 is stochastic (or probabilistic).
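To make the two definitions concrete, here is a minimal plain-Python sketch that classifies a series from the shape of its value array. The helper name `describe_shape` is hypothetical and not part of darts; it only mirrors the (time, dimensions, samples) convention described above.

```python
from typing import Tuple


def describe_shape(shape: Tuple[int, int, int]) -> Tuple[str, str]:
    """Classify a (time, dimensions, samples) shape.

    Hypothetical helper for illustration; not part of darts.
    """
    _n_time, n_dims, n_samples = shape
    variate = "univariate" if n_dims == 1 else "multivariate"
    kind = "deterministic" if n_samples == 1 else "stochastic"
    return variate, kind


# 100 time steps, 1 component, 1 sample
print(describe_shape((100, 1, 1)))
# 100 time steps, 3 components, 500 samples
print(describe_shape((100, 3, 500)))
```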
Each series also stores a time_index, which contains either datetimes (pandas.DatetimeIndex) or integer indices (pandas.RangeIndex).
TimeSeries are guaranteed to:
- Have a monotonically increasing time index, without holes (without missing dates)
- Contain numeric types only
- Have distinct components/columns names
- Have a well defined frequency (date offset aliases for DatetimeIndex, and step size for RangeIndex)
- Have static covariates consistent with their components, or no static covariates
- Have a hierarchy consistent with their components, or no hierarchy
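The guarantees about a hole-free index and a well defined frequency can be illustrated without darts. The sketch below checks that a list of datetimes increases by exactly one fixed step between neighbours. The function name `has_regular_index` is an assumption made for this example and is not a darts API; darts performs its own validation.

```python
from datetime import datetime, timedelta
from typing import Sequence


def has_regular_index(times: Sequence[datetime], step: timedelta) -> bool:
    """Return True if `times` increases by exactly `step` between neighbours.

    Illustrative helper only; not part of darts.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    return all(later - earlier == step for earlier, later in zip(times, times[1:]))


start = datetime(2023, 1, 1)
daily = [start + timedelta(days=i) for i in range(5)]
with_hole = daily[:2] + daily[3:]  # 2023-01-03 is missing

print(has_regular_index(daily, timedelta(days=1)))      # True
print(has_regular_index(with_hole, timedelta(days=1)))  # False
```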
TimeSeries can contain global or component-specific static covariate data. Static covariates in darts refer to external time-invariant data that can be used by some models to help improve predictions.
Read our user guide on covariates and the TimeSeries documentation for more information on covariates.
- class darts.timeseries.TimeSeries(xa)
Bases: object
Create a TimeSeries from a (well formed) DataArray. It is recommended to use the factory methods to create TimeSeries instead.
See also
TimeSeries.from_dataframe
Create from a pandas.DataFrame.
TimeSeries.from_group_dataframe
Create multiple TimeSeries by groups from a pandas.DataFrame.
TimeSeries.from_series
Create from a pandas.Series.
TimeSeries.from_values
Create from a NumPy ndarray.
TimeSeries.from_times_and_values
Create from a time index and a NumPy ndarray.
TimeSeries.from_csv
Create from a CSV file.
TimeSeries.from_json
Create from a JSON file.
TimeSeries.from_xarray
Create from an xarray.DataArray.
Attributes
bottom_level_components
The bottom level component names of this series, or None if the series has no hierarchy.
bottom_level_series
The series containing the bottom-level components of this series in the same order as they appear in the series, or None if the series has no hierarchy.
columns
The names of the components, as a Pandas Index.
components
The names of the components, as a Pandas Index.
dtype
The dtype of the series' values.
duration
The duration of this time series (as a time delta or int).
freq
The frequency of the series.
freq_str
The frequency string representation of the series.
has_datetime_index
Whether this series is indexed with a DatetimeIndex (otherwise it is indexed with a RangeIndex).
has_hierarchy
Whether this series is hierarchical or not.
has_range_index
Whether this series is indexed with a RangeIndex (otherwise it is indexed with a DatetimeIndex).
has_static_covariates
Whether this series contains static covariates.
hierarchy
The hierarchy of this TimeSeries, if any.
is_deterministic
Whether this series is deterministic.
is_probabilistic
Whether this series is stochastic (= probabilistic).
is_stochastic
Whether this series is stochastic.
is_univariate
Whether this series is univariate.
n_components
Number of components (dimensions) contained in the series.
n_samples
Number of samples contained in the series.
n_timesteps
Number of time steps in the series.
static_covariates
Returns the static covariates contained in the series as a pandas DataFrame.
time_dim
The name of the time dimension for this time series.
time_index
The time index of this time series.
top_level_component
The top level component name of this series, or None if the series has no hierarchy.
top_level_series
The univariate series containing the single top-level component of this series, or None if the series has no hierarchy.
width
"Width" (= number of components) of the series.
Methods
add_datetime_attribute(attribute[, one_hot, ...])
Build a new series with one (or more) additional component(s) that contain an attribute of the time index of the series.
add_holidays(country_code[, prov, state])
Adds a binary univariate component to the current series that equals 1 at every index that corresponds to selected country's holiday, and 0 otherwise.
all_values([copy])
Return a 3-D array of dimension (time, component, sample), containing this series' values for all samples.
append(other)
Appends another series to this series along the time axis.
append_values(values)
Appends new values to current TimeSeries, extending its time index.
astype(dtype)
Converts this series to a new series with desired dtype.
concatenate(other[, axis, ignore_time_axis, ...])
Concatenate another timeseries to the current one along given axis.
copy()
Make a copy of this series.
data_array([copy])
Return the xarray.DataArray representation underlying this series.
diff([n, periods, dropna])
Return a differenced time series.
drop_after(split_point)
Drops everything after the provided time split_point, included.
drop_before(split_point)
Drops everything before the provided time split_point, included.
drop_columns(col_names)
Return a new TimeSeries instance with dropped columns/components.
end_time()
End time of the series.
first_value()
First value of this univariate series.
first_values()
First values of this potentially multivariate series.
from_csv(filepath_or_buffer[, time_col, ...])
Build a deterministic TimeSeries instance from a single CSV file.
from_dataframe(df[, time_col, value_cols, ...])
Build a deterministic TimeSeries instance from a selection of columns of a DataFrame.
from_group_dataframe(df, group_cols[, ...])
Build a list of TimeSeries instances grouped by a selection of columns from a DataFrame.
from_json(json_str[, static_covariates, ...])
Build a series from the JSON String representation of a TimeSeries (produced using TimeSeries.to_json()).
from_pickle(path)
Read a pickled TimeSeries.
from_series(pd_series[, fill_missing_dates, ...])
Build a univariate deterministic series from a pandas Series.
from_times_and_values(times, values[, ...])
Build a series from a time index and value array.
from_values(values[, columns, fillna_value, ...])
Build an integer-indexed series from an array of values.
from_xarray(xa[, fill_missing_dates, freq, ...])
Return a TimeSeries instance built from an xarray DataArray.
gaps([mode])
A function to compute and return gaps in the TimeSeries.
get_index_at_point(point[, after])
Converts a point along the time axis index into an integer index ranging in (0, len(series)-1).
get_timestamp_at_point(point)
Converts a point into a pandas.Timestamp (if Datetime-indexed) or into an integer (if Int64-indexed).
has_same_time_as(other)
Checks whether this series has the same time index as other.
head([size, axis])
Return a TimeSeries containing the first size points.
is_within_range(ts)
Check whether a given timestamp or integer is within the time interval of this time series.
kurtosis(**kwargs)
Return a deterministic TimeSeries containing the kurtosis of each component (over the samples) of this stochastic TimeSeries.
last_value()
Last value of this univariate series.
last_values()
Last values of this potentially multivariate series.
longest_contiguous_slice([max_gap_size, mode])
Return the largest TimeSeries slice of this deterministic series that contains no gaps (contiguous all-NaN values) larger than max_gap_size.
map(fn)
Applies the function fn to the underlying NumPy array containing this series' values.
max([axis])
Return a TimeSeries containing the max calculated over the specified axis.
mean([axis])
Return a TimeSeries containing the mean calculated over the specified axis.
median([axis])
Return a TimeSeries containing the median calculated over the specified axis.
min([axis])
Return a TimeSeries containing the min calculated over the specified axis.
pd_dataframe([copy, suppress_warnings])
Return a Pandas DataFrame representation of this time series.
pd_series([copy])
Return a Pandas Series representation of this univariate deterministic time series.
plot([new_plot, central_quantile, ...])
Plot the series.
prepend(other)
Prepends (i.e. adds to the beginning) another series to this series along the time axis.
prepend_values(values)
Prepends (i.e. adds to the beginning) new values to current TimeSeries, extending its time index.
quantile(quantile, **kwargs)
Return a deterministic TimeSeries containing the single desired quantile of each component (over the samples) of this stochastic TimeSeries.
quantile_df([quantile])
Return a Pandas DataFrame containing the single desired quantile of each component (over the samples).
quantile_timeseries([quantile])
Return a deterministic TimeSeries containing the single desired quantile of each component (over the samples) of this stochastic TimeSeries.
quantiles_df([quantiles])
Return a Pandas DataFrame containing the desired quantiles of each component (over the samples).
random_component_values([copy])
Return a 2-D array of shape (time, component), containing the values for one sample taken uniformly at random among this series' samples.
resample(freq[, method])
Build a reindexed TimeSeries with a given frequency.
rescale_with_value(value_at_first_step)
Return a new TimeSeries, which is a multiple of this series such that the first value is value_at_first_step.
shift(n)
Shifts the time axis of this TimeSeries by n time steps.
skew(**kwargs)
Return a deterministic TimeSeries containing the skew of each component (over the samples) of this stochastic TimeSeries.
slice(start_ts, end_ts)
Return a new TimeSeries, starting later than start_ts and ending before end_ts.
slice_intersect(other)
Return a TimeSeries slice of this series, where the time index has been intersected with the one of the other series.
slice_n_points_after(start_ts, n)
Return a new TimeSeries, starting at start_ts (inclusive) and having at most n points.
slice_n_points_before(end_ts, n)
Return a new TimeSeries, ending at end_ts (inclusive) and having at most n points.
split_after(split_point)
Splits the series in two, after a provided split_point.
split_before(split_point)
Splits the series in two, before a provided split_point.
stack(other)
Stacks another univariate or multivariate TimeSeries with the same time index on top of the current one (along the component axis).
start_time()
Start time of the series.
static_covariates_values([copy])
Return a 2-D array of dimension (component, static variable), containing the static covariate values of the TimeSeries.
std([ddof])
Return a deterministic TimeSeries containing the standard deviation of each component (over the samples) of this stochastic TimeSeries.
strip()
Return a TimeSeries slice of this deterministic time series, where NaN-only entries at the beginning and the end of the series are removed.
sum([axis])
Return a TimeSeries containing the sum calculated over the specified axis.
tail([size, axis])
Return last size points of the series.
to_csv(*args, **kwargs)
Writes this deterministic series to a CSV file.
to_json()
Return a JSON string representation of this deterministic series.
to_pickle(path[, protocol])
Save this series in pickle format.
univariate_component(index)
Retrieve one of the components of the series and return it as new univariate TimeSeries instance.
univariate_values([copy, sample])
Return a 1-D NumPy array of shape (time,), containing this univariate series' values for one sample.
values([copy, sample])
Return a 2-D array of shape (time, component), containing this series' values for one sample.
var([ddof])
Return a deterministic TimeSeries containing the variance of each component (over the samples) of this stochastic TimeSeries.
window_transform(transforms[, treat_na, ...])
Applies a moving/rolling, expanding or exponentially weighted window transformation over this TimeSeries.
with_columns_renamed(col_names, col_names_new)
Return a new TimeSeries instance with new columns/components names.
with_hierarchy(hierarchy)
Adds a hierarchy to the TimeSeries.
with_static_covariates(covariates)
Returns a new TimeSeries object with added static covariates.
with_values(values)
Return a new TimeSeries similar to this one but with new specified values.
- add_datetime_attribute(attribute, one_hot=False, cyclic=False)
Build a new series with one (or more) additional component(s) that contain an attribute of the time index of the series.
The additional components are specified with attribute, such as ‘weekday’, ‘day’ or ‘month’.
This works only for deterministic time series (i.e., made of 1 sample).
- Parameters
attribute – A pd.DatetimeIndex attribute which will serve as the basis of the new column(s).
one_hot (bool) – Boolean value indicating whether to add the specified attribute as a one hot encoding (results in more columns).
cyclic (bool) – Boolean value indicating whether to add the specified attribute as a cyclic encoding. This is an alternative to one_hot encoding; enable only one of the two. It adds 2 columns, corresponding to the sin and cos transformations.
- Returns
New TimeSeries instance enhanced by attribute.
- Return type
TimeSeries
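The cyclic option is described as adding two columns, a sin and a cos transformation. A common way to build such an encoding is to map the attribute onto the unit circle, sketched below for a month attribute. This is an assumption about the general technique, not a copy of darts' implementation; the function name `cyclic_encode` and the choice of period are hypothetical.

```python
import math
from typing import Tuple


def cyclic_encode(value: int, period: int) -> Tuple[float, float]:
    """Map `value` (0 <= value < period) to a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Months are shifted to 0..11 so that December and January end up adjacent.
for month in (1, 4, 7, 12):
    sin_m, cos_m = cyclic_encode(month - 1, period=12)
    print(f"month={month:2d} sin={sin_m:+.3f} cos={cos_m:+.3f}")
```

Unlike a raw integer, this representation places December as close to January as January is to February.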
- add_holidays(country_code, prov=None, state=None)
Adds a binary univariate component to the current series that equals 1 at every index that corresponds to selected country’s holiday, and 0 otherwise.
The frequency of the TimeSeries is daily.
Available countries can be found here.
This works only for deterministic time series (i.e., made of 1 sample).
- Parameters
country_code (str) – The country ISO code
prov (Optional[str]) – The province
state (Optional[str]) – The state
- Returns
A new TimeSeries instance, enhanced with binary holiday component.
- Return type
TimeSeries
- all_values(copy=True)
Return a 3-D array of dimension (time, component, sample), containing this series’ values for all samples.
- Parameters
copy (bool) – Whether to return a copy of the values, otherwise returns a view. Leave it to True unless you know what you are doing.
- Returns
The values composing the time series.
- Return type
numpy.ndarray
- append(other)
Appends another series to this series along the time axis.
- Parameters
other (TimeSeries) – A second TimeSeries.
- Returns
A new TimeSeries, obtained by appending the second TimeSeries to the first.
- Return type
TimeSeries
See also
TimeSeries.concatenate
concatenate another series along a given axis.
TimeSeries.prepend
prepend (i.e. add to the beginning) another series along the time axis.
- append_values(values)
Appends new values to current TimeSeries, extending its time index.
- Parameters
values (ndarray) – An array with the values to append.
- Returns
A new TimeSeries with the new values appended
- Return type
TimeSeries
- astype(dtype)
Converts this series to a new series with desired dtype.
- Parameters
dtype (Union[str, dtype]) – A NumPy dtype (np.float32 or np.float64)
- Returns
A TimeSeries having the desired dtype.
- Return type
TimeSeries
- property bottom_level_components: Optional[List[str]]
The bottom level component names of this series, or None if the series has no hierarchy.
- Return type
Optional[List[str]]
- property bottom_level_series: Optional[List[darts.timeseries.TimeSeries]]
The series containing the bottom-level components of this series in the same order as they appear in the series, or None if the series has no hierarchy.
The returned series is multivariate if there are multiple bottom components.
- Return type
Optional[List[TimeSeries]]
- property columns
The names of the components, as a Pandas Index.
- property components
The names of the components, as a Pandas Index.
- concatenate(other, axis=0, ignore_time_axis=False, ignore_static_covariates=False, drop_hierarchy=True)
Concatenate another timeseries to the current one along given axis.
- Parameters
other (TimeSeries) – another timeseries to concatenate to this one
axis (str or int) – axis along which timeseries will be concatenated. [‘time’, ‘component’ or ‘sample’; Default: 0 (time)]
ignore_time_axis (bool, default False) – Ignore errors when time axis varies for some timeseries. Note that this may yield unexpected results
ignore_static_covariates (bool) – whether to ignore all requirements for static covariate concatenation and only transfer the static covariates of the first TimeSeries element in series to the concatenated TimeSeries. Only effective when axis=1.
drop_hierarchy (bool) – When axis=1, whether to drop hierarchy information. True by default. When False, the hierarchies will be “concatenated” as well (by merging the hierarchy dictionaries), which may cause issues if the component names of the resulting series and that of the merged hierarchy do not match. When axis=0 or axis=2, the hierarchy of the first series is always kept.
- Returns
concatenated timeseries
- Return type
TimeSeries
See also
concatenate
a function to concatenate multiple series along a given axis.
Notes
When concatenating along the time dimension, the current series marks the start date of the resulting series, and the other series will have its time index ignored.
- data_array(copy=True)
Return the xarray.DataArray representation underlying this series.
- Parameters
copy – Whether to return a copy of the series. Leave it to True unless you know what you are doing.
- Returns
The xarray DataArray underlying this time series.
- Return type
xarray.DataArray
- diff(n=1, periods=1, dropna=True)
Return a differenced time series. This is often used to make a time series stationary.
- Parameters
n (Optional[int]) – Optionally, a positive integer indicating the number of differencing steps (default = 1). For instance, n=2 computes the second order differences.
periods (Optional[int]) – Optionally, periods to shift for calculating difference. For instance, periods=12 computes the difference between values at time t and time t-12.
dropna (Optional[bool]) – Whether to drop the missing values after each differencing step. If set to False, the corresponding first periods time steps will be filled with NaNs.
- Returns
A TimeSeries constructed after differencing.
- Return type
TimeSeries
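The interaction of n and periods can be sketched on a plain list. The helper below, whose name `difference` is hypothetical, applies a lag-periods difference n times and drops the leading values that have no predecessor, which corresponds to the dropna=True behaviour described above. It illustrates the arithmetic only and is not darts' implementation.

```python
from typing import List


def difference(values: List[float], n: int = 1, periods: int = 1) -> List[float]:
    """Apply `n` rounds of lag-`periods` differencing, dropping undefined leading values."""
    if n < 1 or periods < 1:
        raise ValueError("n and periods must be positive integers")
    out = list(values)
    for _ in range(n):
        out = [out[i] - out[i - periods] for i in range(periods, len(out))]
    return out


squares = [1.0, 4.0, 9.0, 16.0, 25.0]
print(difference(squares))             # [3.0, 5.0, 7.0, 9.0]
print(difference(squares, n=2))        # [2.0, 2.0, 2.0]
print(difference(squares, periods=2))  # [8.0, 12.0, 16.0]
```

Each differencing round shortens the series by periods points, so n rounds remove n * periods points in total.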
- drop_after(split_point)
Drops everything after the provided time split_point, included. The timestamp may not be in the series. If it is, the timestamp will be dropped.
- Parameters
split_point (Union[Timestamp, float, int]) – The timestamp that indicates cut-off time.
- Returns
A new TimeSeries, containing only the points before split_point.
- Return type
TimeSeries
- drop_before(split_point)
Drops everything before the provided time split_point, included. The timestamp may not be in the series. If it is, the timestamp will be dropped.
- Parameters
split_point (Union[Timestamp, float, int]) – The timestamp that indicates cut-off time.
- Returns
A new TimeSeries, containing only the points after split_point.
- Return type
TimeSeries
- drop_columns(col_names)
Return a new TimeSeries instance with dropped columns/components.
- Parameters
col_names (Union[List[str], str]) – String or list of strings corresponding to the columns to be dropped.
- Returns
A new TimeSeries instance with specified columns dropped.
- Return type
TimeSeries
- property dtype
The dtype of the series’ values.
- property duration: Union[pandas._libs.tslibs.timedeltas.Timedelta, int]
The duration of this time series (as a time delta or int).
- Return type
Union[Timedelta, int]
- end_time()
End time of the series.
- Returns
A timestamp containing the last time of the TimeSeries (if indexed by DatetimeIndex), or an integer (if indexed by RangeIndex)
- Return type
Union[pandas.Timestamp, int]
- first_value()
First value of this univariate series.
- Returns
The first value of this univariate deterministic time series
- Return type
float
- first_values()
First values of this potentially multivariate series.
- Returns
The first values of every component of this deterministic time series
- Return type
np.ndarray
- property freq
The frequency of the series.
- property freq_str
The frequency string representation of the series.
- classmethod from_csv(filepath_or_buffer, time_col=None, value_cols=None, fill_missing_dates=False, freq=None, fillna_value=None, static_covariates=None, hierarchy=None, **kwargs)
Build a deterministic TimeSeries instance from a single CSV file. One column can be used to represent the time (if not present, the time index will be a RangeIndex) and a list of columns value_cols can be used to indicate the values for this time series.
- Parameters
filepath_or_buffer – The path to the CSV file, or the file object; consistent with the argument of pandas.read_csv function
time_col (Optional[str]) – The time column name. If set, the column will be cast to a pandas DatetimeIndex (if it contains timestamps) or a RangeIndex (if it contains integers). If not set, the pandas RangeIndex will be used.
value_cols (Union[str, List[str], None]) – A string or list of strings representing the value column(s) to be extracted from the CSV file. If set to None, all columns from the CSV file will be used (except for the time_col, if specified).
fill_missing_dates (Optional[bool]) – Optionally, a boolean value indicating whether to fill missing dates (or indices in case of integer index) with NaN values. This requires either a provided freq or the possibility to infer the frequency from the provided timestamps. See _fill_missing_dates() for more info.
freq (Union[str, int, None]) – Optionally, a string or integer representing the frequency of the underlying index. This is useful in order to fill in missing values if some dates are missing and fill_missing_dates is set to True. If a string, represents the frequency of the pandas DatetimeIndex (see offset aliases for more info on supported frequencies). If an integer, represents the step size of the pandas Index or pandas RangeIndex.
fillna_value (Optional[float]) – Optionally, a numeric value to fill missing values (NaNs) with.
static_covariates (Union[Series, DataFrame, None]) – Optionally, a set of static covariates to be added to the TimeSeries. Either a pandas Series or a pandas DataFrame. If a Series, the index represents the static variables. The covariates are globally ‘applied’ to all components of the TimeSeries. If a DataFrame, the columns represent the static variables and the rows represent the components of the uni/multivariate TimeSeries. If a single-row DataFrame, the covariates are globally ‘applied’ to all components of the TimeSeries. If a multi-row DataFrame, the number of rows must match the number of components of the TimeSeries (in this case, the number of columns in the CSV file). This adds control for component-specific static covariates.
hierarchy (Optional[Dict]) – Optionally, a dictionary describing the grouping(s) of the time series. The keys are component names, and for a given component name c, the value is a list of component names that c “belongs” to. For instance, if there is a total component, split both in two divisions d1 and d2 and in two regions r1 and r2, and four products d1r1 (in division d1 and region r1), d2r1, d1r2 and d2r2, the hierarchy would be encoded as follows.
hierarchy = {
    "d1r1": ["d1", "r1"],
    "d1r2": ["d1", "r2"],
    "d2r1": ["d2", "r1"],
    "d2r2": ["d2", "r2"],
    "d1": ["total"],
    "d2": ["total"],
    "r1": ["total"],
    "r2": ["total"]
}
The hierarchy can be used to reconcile forecasts (so that the sums of the forecasts at different levels are consistent), see hierarchical reconciliation.
**kwargs – Optional arguments to be passed to pandas.read_csv function
- Returns
A univariate or multivariate deterministic TimeSeries constructed from the inputs.
- Return type
TimeSeries
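The hierarchy dictionary maps each component to the components it belongs to. As a plain-Python sketch of how such a mapping can be read, the snippet below derives the top-level and bottom-level component names from the example dictionary given for the hierarchy parameter. The helper `split_levels` is hypothetical; darts exposes comparable information through properties such as bottom_level_components, but this code does not reproduce its implementation.

```python
from typing import Dict, List, Set, Tuple

hierarchy: Dict[str, List[str]] = {
    "d1r1": ["d1", "r1"],
    "d1r2": ["d1", "r2"],
    "d2r1": ["d2", "r1"],
    "d2r2": ["d2", "r2"],
    "d1": ["total"],
    "d2": ["total"],
    "r1": ["total"],
    "r2": ["total"],
}


def split_levels(h: Dict[str, List[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (top, bottom) component names of a child -> parents mapping."""
    parents = {p for plist in h.values() for p in plist}
    children = set(h)
    top = parents - children     # appear as a parent but never as a child
    bottom = children - parents  # appear as a child but never as a parent
    return top, bottom


top, bottom = split_levels(hierarchy)
print(sorted(top))     # ['total']
print(sorted(bottom))  # ['d1r1', 'd1r2', 'd2r1', 'd2r2']
```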
- classmethod from_dataframe(df, time_col=None, value_cols=None, fill_missing_dates=False, freq=None, fillna_value=None, static_covariates=None, hierarchy=None)
Build a deterministic TimeSeries instance from a selection of columns of a DataFrame. One column (or the DataFrame index) has to represent the time, and a list of columns value_cols has to represent the values for this time series.
- Parameters
df (DataFrame) – The DataFrame
time_col (Optional[str]) – The time column name. If set, the column will be cast to a pandas DatetimeIndex (if it contains timestamps) or a RangeIndex (if it contains integers). If not set, the DataFrame index will be used. In this case the DataFrame must contain an index that is either a pandas DatetimeIndex, a pandas RangeIndex, or a pandas Index that can be converted to a RangeIndex. It is better if the index has no holes; alternatively setting fill_missing_dates can in some cases solve these issues (filling holes with NaN, or with the provided fillna_value numeric value, if any).
value_cols (Union[str, List[str], None]) – A string or list of strings representing the value column(s) to be extracted from the DataFrame. If set to None, the whole DataFrame will be used.
fill_missing_dates (Optional[bool]) – Optionally, a boolean value indicating whether to fill missing dates (or indices in case of integer index) with NaN values. This requires either a provided freq or the possibility to infer the frequency from the provided timestamps. See _fill_missing_dates() for more info.
freq (Union[str, int, None]) – Optionally, a string or integer representing the frequency of the underlying index. This is useful in order to fill in missing values if some dates are missing and fill_missing_dates is set to True. If a string, represents the frequency of the pandas DatetimeIndex (see offset aliases for more info on supported frequencies). If an integer, represents the step size of the pandas Index or pandas RangeIndex.
fillna_value (Optional[float]) – Optionally, a numeric value to fill missing values (NaNs) with.
static_covariates (Union[Series, DataFrame, None]) – Optionally, a set of static covariates to be added to the TimeSeries. Either a pandas Series or a pandas DataFrame. If a Series, the index represents the static variables. The covariates are globally ‘applied’ to all components of the TimeSeries. If a DataFrame, the columns represent the static variables and the rows represent the components of the uni/multivariate TimeSeries. If a single-row DataFrame, the covariates are globally ‘applied’ to all components of the TimeSeries. If a multi-row DataFrame, the number of rows must match the number of components of the TimeSeries (in this case, the number of columns in value_cols). This adds control for component-specific static covariates.
hierarchy (Optional[Dict]) – Optionally, a dictionary describing the grouping(s) of the time series. The keys are component names, and for a given component name c, the value is a list of component names that c “belongs” to. For instance, if there is a total component, split both in two divisions d1 and d2 and in two regions r1 and r2, and four products d1r1 (in division d1 and region r1), d2r1, d1r2 and d2r2, the hierarchy would be encoded as follows.
hierarchy = {
    "d1r1": ["d1", "r1"],
    "d1r2": ["d1", "r2"],
    "d2r1": ["d2", "r1"],
    "d2r2": ["d2", "r2"],
    "d1": ["total"],
    "d2": ["total"],
    "r1": ["total"],
    "r2": ["total"]
}
The hierarchy can be used to reconcile forecasts (so that the sums of the forecasts at different levels are consistent), see hierarchical reconciliation.
- Returns
A univariate or multivariate deterministic TimeSeries constructed from the inputs.
- Return type
TimeSeries
- classmethod from_group_dataframe(df, group_cols, time_col=None, value_cols=None, static_cols=None, fill_missing_dates=False, freq=None, fillna_value=None)
Build a list of TimeSeries instances grouped by a selection of columns from a DataFrame. One column (or the DataFrame index) has to represent the time, a list of columns group_cols must be used for extracting the individual TimeSeries by groups, and a list of columns value_cols has to represent the values for the individual time series. Values from columns group_cols and static_cols are added as static covariates to the resulting TimeSeries objects. These can be viewed with my_series.static_covariates. Unlike group_cols, static_cols only add static values and are not used to extract the TimeSeries groups.
- Parameters
df (DataFrame) – The DataFrame
group_cols (Union[List[str], str]) – A string or list of strings representing the columns from the DataFrame by which to extract the individual TimeSeries groups.
time_col (Optional[str]) – The time column name. If set, the column will be cast to a pandas DatetimeIndex (if it contains timestamps) or a RangeIndex (if it contains integers). If not set, the DataFrame index will be used. In this case the DataFrame must contain an index that is either a pandas DatetimeIndex, a pandas RangeIndex, or a pandas Index that can be converted to a RangeIndex. Be aware that the index must represent the actual index of each individual time series group (can contain non-unique values). It is better if the index has no holes; alternatively setting fill_missing_dates can in some cases solve these issues (filling holes with NaN, or with the provided fillna_value numeric value, if any).
value_cols (Union[str, List[str], None]) – A string or list of strings representing the value column(s) to be extracted from the DataFrame. If set to None, the whole DataFrame will be used.
static_cols (Union[str, List[str], None]) – A string or list of strings representing static variable columns from the DataFrame that should be appended as static covariates to the resulting TimeSeries groups. Unlike group_cols, the DataFrame is not grouped by these columns. Note that for every group, there must be exactly one unique value.
fill_missing_dates (Optional[bool]) – Optionally, a boolean value indicating whether to fill missing dates (or indices in case of integer index) with NaN values. This requires either a provided freq or the possibility to infer the frequency from the provided timestamps. See _fill_missing_dates() for more info.
freq (Union[str, int, None]) – Optionally, a string or integer representing the frequency of the underlying index. This is useful in order to fill in missing values if some dates are missing and fill_missing_dates is set to True. If a string, represents the frequency of the pandas DatetimeIndex (see offset aliases for more info on supported frequencies). If an integer, represents the step size of the pandas Index or pandas RangeIndex.
fillna_value (Optional[float]) – Optionally, a numeric value to fill missing values (NaNs) with.
- Returns
A list containing a univariate or multivariate deterministic TimeSeries per group in the DataFrame.
- Return type
List[TimeSeries]
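The grouping behaviour described for from_group_dataframe can be pictured with plain Python records: rows are partitioned by the group column, each partition becomes one series, and the group value travels along as a static covariate. The column names (`store`, `time`, `sales`) and the helper `group_rows` are invented for this illustration and are not part of darts.

```python
from collections import defaultdict
from typing import Any, Dict, List

rows = [
    {"store": "A", "time": 0, "sales": 10.0},
    {"store": "B", "time": 0, "sales": 7.0},
    {"store": "A", "time": 1, "sales": 12.0},
    {"store": "B", "time": 1, "sales": 8.0},
]


def group_rows(
    records: List[Dict[str, Any]], group_col: str, time_col: str, value_col: str
) -> List[Dict[str, Any]]:
    """Split records into one (times, values) series per distinct group value."""
    buckets: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for rec in records:
        buckets[rec[group_col]].append(rec)
    series = []
    for key in sorted(buckets):
        # Order each group by time so that its index is increasing.
        ordered = sorted(buckets[key], key=lambda r: r[time_col])
        series.append(
            {
                "static_covariates": {group_col: key},
                "times": [r[time_col] for r in ordered],
                "values": [r[value_col] for r in ordered],
            }
        )
    return series


grouped = group_rows(rows, "store", "time", "sales")
for s in grouped:
    print(s)
```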
- classmethod from_json(json_str, static_covariates=None, hierarchy=None)
Build a series from the JSON String representation of a TimeSeries (produced using TimeSeries.to_json()).
At the moment this only supports deterministic time series (i.e., made of 1 sample).
- Parameters
json_str (str) – The JSON String to convert
static_covariates (Union[Series, DataFrame, None]) – Optionally, a set of static covariates to be added to the TimeSeries. Either a pandas Series or a pandas DataFrame. If a Series, the index represents the static variables. The covariates are globally ‘applied’ to all components of the TimeSeries. If a DataFrame, the columns represent the static variables and the rows represent the components of the uni/multivariate TimeSeries. If a single-row DataFrame, the covariates are globally ‘applied’ to all components of the TimeSeries. If a multi-row DataFrame, the number of rows must match the number of components of the TimeSeries (in this case, the number of columns in value_cols). This adds control for component-specific static covariates.
hierarchy (Optional[Dict]) – Optionally, a dictionary describing the grouping(s) of the time series. The keys are component names, and for a given component name c, the value is a list of component names that c “belongs” to. For instance, if there is a total component, split both in two divisions d1 and d2 and in two regions r1 and r2, and four products d1r1 (in division d1 and region r1), d2r1, d1r2 and d2r2, the hierarchy would be encoded as follows.
hierarchy = {
    "d1r1": ["d1", "r1"],
    "d1r2": ["d1", "r2"],
    "d2r1": ["d2", "r1"],
    "d2r2": ["d2", "r2"],
    "d1": ["total"],
    "d2": ["total"],
    "r1": ["total"],
    "r2": ["total"]
}
The hierarchy can be used to reconcile forecasts (so that the sums of the forecasts at different levels are consistent), see hierarchical reconciliation.
- Returns
The time series object converted from the JSON String
- Return type
TimeSeries
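To make the hierarchy encoding concrete, here is a plain-Python sketch (it does not use darts). It takes the example dictionary and derives each aggregate from made-up leaf values. The helpers `ancestors` and `aggregate` and the numbers are illustrative assumptions, not part of the darts API; they only show what "consistent sums" means across levels.

```python
# Child -> parents mapping, copied from the documented example.
hierarchy = {
    "d1r1": ["d1", "r1"], "d1r2": ["d1", "r2"],
    "d2r1": ["d2", "r1"], "d2r2": ["d2", "r2"],
    "d1": ["total"], "d2": ["total"],
    "r1": ["total"], "r2": ["total"],
}

def ancestors(component):
    """All components that `component` belongs to, directly or indirectly."""
    found = set()
    stack = list(hierarchy.get(component, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(hierarchy.get(parent, []))
    return found

# Hypothetical values of the four leaf products at one time step.
leaf_values = {"d1r1": 1.0, "d1r2": 2.0, "d2r1": 3.0, "d2r2": 4.0}

def aggregate(component):
    """Sum the leaves that roll up into `component`."""
    return sum(v for leaf, v in leaf_values.items() if component in ancestors(leaf))

totals = {name: aggregate(name) for name in ["d1", "d2", "r1", "r2", "total"]}
print(totals)
```

Summing over leaves rather than over direct children avoids counting `total` twice, since both the divisions and the regions add up to it.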
- classmethod from_pickle(path)[source]¶
Read a pickled TimeSeries.
- Parameters
path (string) – path pointing to a pickle file that will be loaded
- Returns
timeseries object loaded from file
- Return type
TimeSeries
Notes
Xarray docs [1] suggest not using pickle as a long-term data storage.
References
- classmethod from_series(pd_series, fill_missing_dates=False, freq=None, fillna_value=None, static_covariates=None)[source]¶
Build a univariate deterministic series from a pandas Series.
The series must contain an index that is either a pandas DatetimeIndex, a pandas RangeIndex, or a pandas Index that can be converted into a RangeIndex. It is better if the index has no holes; alternatively setting fill_missing_dates can in some cases solve these issues (filling holes with NaN, or with the provided fillna_value numeric value, if any).
- Parameters
pd_series (Series) – The pandas Series instance.
fill_missing_dates (Optional[bool]) – Optionally, a boolean value indicating whether to fill missing dates (or indices in case of integer index) with NaN values. This requires either a provided freq or the possibility to infer the frequency from the provided timestamps. See _fill_missing_dates() for more info.
freq (Union[str, int, None]) – Optionally, a string or integer representing the frequency of the underlying index. This is useful in order to fill in missing values if some dates are missing and fill_missing_dates is set to True. If a string, represents the frequency of the pandas DatetimeIndex (see offset aliases for more info on supported frequencies). If an integer, represents the step size of the pandas Index or pandas RangeIndex.
fillna_value (Optional[float]) – Optionally, a numeric value to fill missing values (NaNs) with.
static_covariates (Union[Series, DataFrame, None]) – Optionally, a set of static covariates to be added to the TimeSeries. Either a pandas Series or a single-row pandas DataFrame. If a Series, the index represents the static variables. If a DataFrame, the columns represent the static variables and the single row represents the univariate TimeSeries component.
- Returns
A univariate and deterministic TimeSeries constructed from the inputs.
- Return type
TimeSeries
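The effect of fill_missing_dates and fillna_value can be pictured with a small stdlib-only sketch. It is a simplified model, not the darts implementation: the function name, the fixed one-day step and the sample data are assumptions made for illustration.

```python
from datetime import date, timedelta

def fill_missing_dates(observations, step=timedelta(days=1), fillna_value=None):
    """Return (dates, values) on a regular grid; holes become NaN or fillna_value."""
    filler = float("nan") if fillna_value is None else fillna_value
    start, end = min(observations), max(observations)
    dates, values = [], []
    current = start
    while current <= end:
        dates.append(current)
        values.append(observations.get(current, filler))
        current += step
    return dates, values

# 3 January is missing from the input.
obs = {date(2020, 1, 1): 1.0, date(2020, 1, 2): 2.0, date(2020, 1, 4): 4.0}

dates, values_filled = fill_missing_dates(obs, fillna_value=0.0)
_, values_nan = fill_missing_dates(obs)
print(dates)
print(values_filled)
print(values_nan)
```

The hole is only detectable because the step is known, which is why the documented behaviour requires either an explicit freq or an inferable one.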
- classmethod from_times_and_values(times, values, fill_missing_dates=False, freq=None, columns=None, fillna_value=None, static_covariates=None, hierarchy=None)[source]¶
Build a series from a time index and value array.
- Parameters
times (Union[DatetimeIndex, RangeIndex, Index]) – A pandas DatetimeIndex, RangeIndex, or Index that can be converted to a RangeIndex representing the time axis for the time series. It is better if the index has no holes; alternatively setting fill_missing_dates can in some cases solve these issues (filling holes with NaN, or with the provided fillna_value numeric value, if any).
values (ndarray) – A NumPy array of values for the TimeSeries. Both 2-dimensional arrays, for deterministic series, and 3-dimensional arrays, for probabilistic series, are accepted. In the former case the dimensions should be (time, component), and in the latter case (time, component, sample).
fill_missing_dates (Optional[bool]) – Optionally, a boolean value indicating whether to fill missing dates (or indices in case of integer index) with NaN values. This requires either a provided freq or the possibility to infer the frequency from the provided timestamps. See _fill_missing_dates() for more info.
freq (Union[str, int, None]) – Optionally, a string or integer representing the frequency of the underlying index. This is useful in order to fill in missing values if some dates are missing and fill_missing_dates is set to True. If a string, represents the frequency of the pandas DatetimeIndex (see offset aliases for more info on supported frequencies). If an integer, represents the step size of the pandas Index or pandas RangeIndex.
columns (Union[ForwardRef, ndarray, ForwardRef, ForwardRef, List, range, None]) – Columns to be used by the underlying pandas DataFrame.
fillna_value (Optional[float]) – Optionally, a numeric value to fill missing values (NaNs) with.
static_covariates (Union[Series, DataFrame, None]) – Optionally, a set of static covariates to be added to the TimeSeries. Either a pandas Series or a pandas DataFrame. If a Series, the index represents the static variables. The covariates are globally ‘applied’ to all components of the TimeSeries. If a DataFrame, the columns represent the static variables and the rows represent the components of the uni/multivariate TimeSeries. If a single-row DataFrame, the covariates are globally ‘applied’ to all components of the TimeSeries. If a multi-row DataFrame, the number of rows must match the number of components of the TimeSeries (in this case, the number of columns in values). This adds control for component-specific static covariates.
hierarchy (Optional[Dict]) – Optionally, a dictionary describing the grouping(s) of the time series. The keys are component names, and for a given component name c, the value is a list of component names that c “belongs” to. For instance, if there is a total component, split both in two divisions d1 and d2 and in two regions r1 and r2, and four products d1r1 (in division d1 and region r1), d2r1, d1r2 and d2r2, the hierarchy would be encoded as follows.
hierarchy={
    "d1r1": ["d1", "r1"],
    "d1r2": ["d1", "r2"],
    "d2r1": ["d2", "r1"],
    "d2r2": ["d2", "r2"],
    "d1": ["total"],
    "d2": ["total"],
    "r1": ["total"],
    "r2": ["total"]
}
The hierarchy can be used to reconcile forecasts (so that the sums of the forecasts at different levels are consistent), see hierarchical reconciliation.
- Returns
A TimeSeries constructed from the inputs.
- Return type
TimeSeries
- classmethod from_values(values, columns=None, fillna_value=None, static_covariates=None, hierarchy=None)[source]¶
Build an integer-indexed series from an array of values. The series will have an integer index (RangeIndex).
- Parameters
values (ndarray) – A NumPy array of values for the TimeSeries. Both 2-dimensional arrays, for deterministic series, and 3-dimensional arrays, for probabilistic series, are accepted. In the former case the dimensions should be (time, component), and in the latter case (time, component, sample).
columns (Union[ForwardRef, ndarray, ForwardRef, ForwardRef, List, range, None]) – Columns to be used by the underlying pandas DataFrame.
fillna_value (Optional[float]) – Optionally, a numeric value to fill missing values (NaNs) with.
static_covariates (Union[Series, DataFrame, None]) – Optionally, a set of static covariates to be added to the TimeSeries. Either a pandas Series or a pandas DataFrame. If a Series, the index represents the static variables. The covariates are globally ‘applied’ to all components of the TimeSeries. If a DataFrame, the columns represent the static variables and the rows represent the components of the uni/multivariate TimeSeries. If a single-row DataFrame, the covariates are globally ‘applied’ to all components of the TimeSeries. If a multi-row DataFrame, the number of rows must match the number of components of the TimeSeries (in this case, the number of columns in values). This adds control for component-specific static covariates.
hierarchy (Optional[Dict]) – Optionally, a dictionary describing the grouping(s) of the time series. The keys are component names, and for a given component name c, the value is a list of component names that c “belongs” to. For instance, if there is a total component, split both in two divisions d1 and d2 and in two regions r1 and r2, and four products d1r1 (in division d1 and region r1), d2r1, d1r2 and d2r2, the hierarchy would be encoded as follows.
hierarchy={
    "d1r1": ["d1", "r1"],
    "d1r2": ["d1", "r2"],
    "d2r1": ["d2", "r1"],
    "d2r2": ["d2", "r2"],
    "d1": ["total"],
    "d2": ["total"],
    "r1": ["total"],
    "r2": ["total"]
}
The hierarchy can be used to reconcile forecasts (so that the sums of the forecasts at different levels are consistent), see hierarchical reconciliation.
- Returns
A TimeSeries constructed from the inputs.
- Return type
TimeSeries
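The (time, component) and (time, component, sample) layouts accepted for values can be illustrated without NumPy. The sketch below uses nested lists as a stand-in for arrays; `describe_shape` is a hypothetical helper, not a darts function.

```python
def describe_shape(values):
    """Classify nested lists laid out as (time, component[, sample])."""
    n_timesteps = len(values)
    n_components = len(values[0])
    first = values[0][0]
    # A scalar at depth 2 means a 2-D layout, i.e. exactly one sample.
    n_samples = len(first) if isinstance(first, (list, tuple)) else 1
    return {
        "shape": (n_timesteps, n_components, n_samples),
        "univariate": n_components == 1,
        "deterministic": n_samples == 1,
    }

# 3 time steps, 2 components, deterministic (2-D layout).
deterministic_2d = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
# 2 time steps, 1 component, 3 samples (3-D layout).
stochastic_3d = [[[1.0, 1.1, 0.9]], [[2.0, 2.2, 1.8]]]

info_2d = describe_shape(deterministic_2d)
info_3d = describe_shape(stochastic_3d)
print(info_2d)
print(info_3d)
```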
- classmethod from_xarray(xa, fill_missing_dates=False, freq=None, fillna_value=None)[source]¶
Return a TimeSeries instance built from an xarray DataArray. The dimensions of the DataArray have to be (time, component, sample), in this order. The time dimension can have an arbitrary name, but component and sample must be named “component” and “sample”, respectively.
The first dimension (time), and second dimension (component) must be indexed (i.e., have coordinates). The time must be indexed either with a pandas DatetimeIndex, a pandas RangeIndex, or a pandas Index that can be converted to a RangeIndex. It is better if the index has no holes; alternatively setting fill_missing_dates can in some cases solve these issues (filling holes with NaN, or with the provided fillna_value numeric value, if any).
If two components have the same name or are not strings, this method will disambiguate the components names by appending a suffix of the form “<name>_N” to the N-th column with name “name”. The component names in the static covariates and hierarchy (if any) are not disambiguated.
- Parameters
xa (DataArray) – The xarray DataArray.
fill_missing_dates (Optional[bool]) – Optionally, a boolean value indicating whether to fill missing dates (or indices in case of integer index) with NaN values. This requires either a provided freq or the possibility to infer the frequency from the provided timestamps. See _fill_missing_dates() for more info.
freq (Union[str, int, None]) – Optionally, a string or integer representing the frequency of the underlying index. This is useful in order to fill in missing values if some dates are missing and fill_missing_dates is set to True. If a string, represents the frequency of the pandas DatetimeIndex (see offset aliases for more info on supported frequencies). If an integer, represents the step size of the pandas Index or pandas RangeIndex.
fillna_value (Optional[float]) – Optionally, a numeric value to fill missing values (NaNs) with.
- Returns
A univariate or multivariate deterministic TimeSeries constructed from the inputs.
- Return type
TimeSeries
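One possible reading of the component-name disambiguation rule is sketched below in plain Python. It assumes that the first occurrence keeps its bare name and that later duplicates are numbered from 1; the text above only specifies the "<name>_N" suffix form, so the exact numbering is an assumption, and `disambiguate` is a hypothetical helper.

```python
from collections import defaultdict

def disambiguate(names):
    """Make component names unique strings by suffixing repeated names."""
    seen = defaultdict(int)
    result = []
    for name in map(str, names):  # non-string names are converted to strings first
        count = seen[name]
        seen[name] += 1
        result.append(name if count == 0 else f"{name}_{count}")
    return result

unique_names = disambiguate(["a", "b", "a", "a"])
unique_from_ints = disambiguate([0, 0])
print(unique_names)
print(unique_from_ints)
```

As the description notes, names referenced in static covariates or in a hierarchy are not rewritten, so they may need to be updated by hand after such a renaming.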
- gaps(mode='all')[source]¶
A function to compute and return gaps in the TimeSeries. Works only on deterministic time series (1 sample).
- Parameters
mode (Literal[‘all’, ‘any’]) – Only relevant for multivariate time series. The mode defines how gaps are defined. Set to ‘any’ if a NaN value in any column should be considered a gap. ‘all’ will only consider periods where all columns’ values are NaN. Defaults to ‘all’.
- Returns
A pandas.DataFrame containing a row for every gap (rows with all-NaN values in underlying DataFrame) in this time series. The DataFrame contains three columns that include the start and end time stamps of the gap and the integer length of the gap (in self.freq units if the series is indexed by a DatetimeIndex).
- Return type
pd.DataFrame
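The difference between the two modes can be sketched in plain Python. This is an illustrative model only: `find_gaps` is a hypothetical helper that works on positions rather than timestamps, and it returns tuples instead of a DataFrame.

```python
import math

def find_gaps(rows, mode="all"):
    """rows[t] holds the component values at step t.

    Returns a list of (start, end, size) tuples, with inclusive positions.
    """
    test = all if mode == "all" else any
    is_gap = [test(math.isnan(v) for v in row) for row in rows]
    gaps, start = [], None
    for i, flag in enumerate(is_gap):
        if flag and start is None:
            start = i                                # a gap opens here
        elif not flag and start is not None:
            gaps.append((start, i - 1, i - start))   # the gap closed at i - 1
            start = None
    if start is not None:                            # gap running to the end
        gaps.append((start, len(rows) - 1, len(rows) - start))
    return gaps

nan = float("nan")
rows = [[1.0, 1.0], [nan, 2.0], [nan, nan], [nan, nan], [5.0, nan], [6.0, 6.0]]

gaps_all = find_gaps(rows, mode="all")
gaps_any = find_gaps(rows, mode="any")
print(gaps_all)
print(gaps_any)
```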
- get_index_at_point(point, after=True)[source]¶
Converts a point along the time axis index into an integer index ranging in (0, len(series)-1).
- Parameters
point (Union[Timestamp, float, int]) – This parameter supports 3 different data types: pd.Timestamp, float and int.
pd.Timestamp works only on series that are indexed with a pd.DatetimeIndex. In such cases, the returned point will be the index of this timestamp if it is present in the series time index. If it’s not present in the time index, the index of the next timestamp is returned if after=True (if it exists in the series), otherwise the index of the previous timestamp is returned (if it exists in the series).
In case of a float, the parameter will be treated as the proportion of the time series that should lie before the point.
If an int and series is datetime-indexed, the value of point is returned. If an int and series is integer-indexed, the index position of point in the RangeIndex is returned (accounting for steps).
after – If the provided pandas Timestamp is not in the time series index, whether to return the index of the next timestamp or the index of the previous one.
- Return type
int
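For an integer-indexed series, the int and float cases can be modelled with a Python `range` standing in for a pandas RangeIndex. This is a sketch under assumptions: `index_at_point` is hypothetical, and the way a float proportion is truncated onto positions 0..len-1 is a guess at a reasonable rule, not something the text above specifies.

```python
def index_at_point(point, time_index):
    """time_index is a Python range standing in for a pandas RangeIndex."""
    if isinstance(point, float):
        if not 0.0 <= point <= 1.0:
            raise ValueError("a float point must lie in [0, 1]")
        # Assumption: map the proportion onto positions 0..len-1 and truncate.
        return int((len(time_index) - 1) * point)
    # range.index accounts for start and step; it raises ValueError
    # if `point` is not a member of the index.
    return time_index.index(point)

idx = range(10, 50, 5)  # 10, 15, ..., 45 (8 entries)

pos_int = index_at_point(25, idx)
pos_half = index_at_point(0.5, idx)
pos_end = index_at_point(1.0, idx)
print(pos_int, pos_half, pos_end)
```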
- get_timestamp_at_point(point)[source]¶
Converts a point into a pandas.Timestamp (if datetime-indexed) or into an integer (if integer-indexed).
- Parameters
point (Union[Timestamp, float, int]) – This parameter supports 3 different data types: float, int and pandas.Timestamp. In case of a float, the parameter will be treated as the proportion of the time series that should lie before the point. In the case of int, the parameter will be treated as an integer index to the time index of series. Will raise a ValueError if not a valid index in series. In case of a pandas.Timestamp, point will be returned as is provided that the timestamp is present in the series time index, otherwise will raise a ValueError.
- Return type
Timestamp
- property has_datetime_index: bool¶
Whether this series is indexed with a DatetimeIndex (otherwise it is indexed with a RangeIndex).
- Return type
bool
- property has_hierarchy: bool¶
Whether this series is hierarchical or not.
- Return type
bool
- property has_range_index: bool¶
Whether this series is indexed with a RangeIndex (otherwise it is indexed with a DatetimeIndex).
- Return type
bool
- has_same_time_as(other)[source]¶
Checks whether this series has the same time index as other.
- Parameters
other (TimeSeries) – the other series
- Returns
True if both TimeSeries have the same index, False otherwise.
- Return type
bool
- property has_static_covariates: bool¶
Whether this series contains static covariates.
- Return type
bool
- head(size=5, axis=0)[source]¶
Return a TimeSeries containing the first size points.
- Parameters
size (int, default 5) – number of points to retain
axis (str or int, optional, default: 0) – axis along which to slice the series
- Returns
The series made of the first size points along the desired axis.
- Return type
TimeSeries
- property hierarchy: Optional[Dict]¶
The hierarchy of this TimeSeries, if any. If set, the hierarchy is encoded as a dictionary, whose keys are individual components and values are the set of parent(s) of these components in the hierarchy.
- Return type
Optional[Dict]
- property is_deterministic¶
Whether this series is deterministic.
- property is_probabilistic¶
Whether this series is stochastic (= probabilistic).
- property is_stochastic¶
Whether this series is stochastic.
- property is_univariate¶
Whether this series is univariate.
- is_within_range(ts)[source]¶
Check whether a given timestamp or integer is within the time interval of this time series. If a timestamp is provided, it does not need to be an element of the time index of the series.
- Parameters
ts (Union[Timestamp, int]) – The pandas.Timestamp (if indexed with DatetimeIndex) or integer (if indexed with RangeIndex) to check.
- Returns
Whether ts is contained within the interval of this time series.
- Return type
bool
- kurtosis(**kwargs)[source]¶
Return a deterministic TimeSeries containing the kurtosis of each component (over the samples) of this stochastic TimeSeries.
This works only on stochastic series (i.e., with more than 1 sample).
- Parameters
kwargs – Other keyword arguments are passed down to scipy.stats.kurtosis()
- Returns
The TimeSeries containing the kurtosis for each component.
- Return type
TimeSeries
- last_value()[source]¶
Last value of this univariate series.
- Returns
The last value of this univariate deterministic time series
- Return type
float
- last_values()[source]¶
Last values of this potentially multivariate series.
- Returns
The last values of every component of this deterministic time series
- Return type
np.ndarray
- longest_contiguous_slice(max_gap_size=0, mode='all')[source]¶
Return the largest TimeSeries slice of this deterministic series that contains no gaps (contiguous all-NaN values) larger than max_gap_size.
This method is only applicable to deterministic series (i.e., having 1 sample).
- Parameters
max_gap_size (int) – Indicate the maximum gap size that the TimeSeries can contain.
mode (str) – Only relevant for multivariate time series. The mode defines how gaps are defined. Set to ‘any’ if a NaN value in any column should be considered a gap. ‘all’ will only consider periods where all columns’ values are NaN. Defaults to ‘all’.
- Returns
a new series constituting the largest slice of the original with no or bounded gaps
- Return type
TimeSeries
See also
TimeSeries.gaps
return the gaps in the TimeSeries
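The role of max_gap_size can be sketched for a univariate list of values. The helper below is a simplified stand-in written for illustration (it is not the darts implementation, and it returns positions rather than a series); ties are resolved in favour of the earliest slice, which is an arbitrary choice.

```python
import math

def longest_slice(values, max_gap_size=0):
    """Return inclusive (start, end) positions of the longest run whose
    NaN gaps are all at most max_gap_size long, or None if all values are NaN."""
    best = None
    start = None        # first valid position of the current run
    last_valid = None   # last valid position seen in the current run
    gap = 0             # length of the NaN run currently being crossed

    def keep_if_longer(candidate, current_best):
        if current_best is None:
            return candidate
        longer = candidate[1] - candidate[0] > current_best[1] - current_best[0]
        return candidate if longer else current_best

    for i, v in enumerate(values):
        if math.isnan(v):
            gap += 1
            if gap > max_gap_size and start is not None:
                best = keep_if_longer((start, last_valid), best)
                start = None  # the gap is too large: the run ends
        else:
            if start is None:
                start = i
            last_valid = i
            gap = 0
    if start is not None:
        best = keep_if_longer((start, last_valid), best)
    return best

nan = float("nan")
values = [1.0, nan, 2.0, 3.0, nan, nan, 4.0, 5.0, 6.0, 7.0]

no_gaps = longest_slice(values, max_gap_size=0)
small_gaps = longest_slice(values, max_gap_size=1)
any_gaps = longest_slice(values, max_gap_size=2)
print(no_gaps, small_gaps, any_gaps)
```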
- map(fn)[source]¶
Applies the function fn to the underlying NumPy array containing this series’ values.
Return a new TimeSeries instance. If fn takes 1 argument it is simply applied on the backing array of shape (time, n_components, n_samples). If it takes 2 arguments, it is applied repeatedly on the (ts, value[ts]) tuples, where “ts” denotes a timestamp value, and “value[ts]” denote the array of values at this timestamp, of shape (n_components, n_samples).
- Parameters
fn (Union[Callable[[number], number], Callable[[Union[Timestamp, int], number], number]]) – Either a function which takes a NumPy array and returns a NumPy array of same shape; e.g., lambda x: x ** 2, lambda x: x / x.shape[0] or np.log. It can also be a function which takes a timestamp and array, and returns a new array of same shape; e.g., lambda ts, x: x / ts.days_in_month. The type of ts is either pd.Timestamp (if the series is indexed with a DatetimeIndex), or an integer otherwise (if the series is indexed with a RangeIndex).
- Returns
A new TimeSeries instance
- Return type
TimeSeries
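The one-argument versus two-argument dispatch can be modelled in a few lines of plain Python. In this sketch lists stand in for the backing array, and `map_values` is a hypothetical helper; it only mirrors the rule stated above (one argument: applied to all values at once; two arguments: applied per time step).

```python
import inspect

def map_values(fn, time_index, values):
    """Apply fn to all values at once (1 argument) or per (ts, value) pair (2 arguments)."""
    n_params = len(inspect.signature(fn).parameters)
    if n_params == 1:
        return fn(values)
    if n_params == 2:
        return [fn(ts, v) for ts, v in zip(time_index, values)]
    raise ValueError("fn must take 1 or 2 arguments")

time_index = [0, 1, 2, 3]
values = [1.0, 2.0, 3.0, 4.0]

# One argument: fn sees the whole container of values.
squared = map_values(lambda x: [v ** 2 for v in x], time_index, values)
# Two arguments: fn sees one timestamp and the value at that timestamp.
offset = map_values(lambda ts, v: v + ts, time_index, values)
print(squared)
print(offset)
```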
- max(axis=2)[source]¶
Return a TimeSeries containing the max calculated over the specified axis.
If we reduce over time (axis=0), the resulting TimeSeries will have length one and will use the first entry of the original time_index. If we perform the calculation over the components (axis=1), the resulting single component will be renamed to “components_max”. When applied to the samples (axis=2), a deterministic TimeSeries is returned.
If axis=1, the static covariates and the hierarchy are discarded from the series.
- Parameters
axis (int) – The axis to reduce over. The default is to calculate over samples, i.e. axis=2.
- Returns
A new TimeSeries with max applied to the indicated axis.
- Return type
TimeSeries
- mean(axis=2)[source]¶
Return a TimeSeries containing the mean calculated over the specified axis.
If we reduce over time (axis=0), the resulting TimeSeries will have length one and will use the first entry of the original time_index. If we perform the calculation over the components (axis=1), the resulting single component will be renamed to “components_mean”. When applied to the samples (axis=2), a deterministic TimeSeries is returned.
If axis=1, the static covariates and the hierarchy are discarded from the series.
- Parameters
axis (int) – The axis to reduce over. The default is to calculate over samples, i.e. axis=2.
- Returns
A new TimeSeries with mean applied to the indicated axis.
- Return type
TimeSeries
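What each axis value removes can be checked on a tiny (time, component, sample) structure. The sketch below uses nested lists and `statistics.mean` instead of NumPy; the numbers are made up, and the same reasoning applies to max, median and min.

```python
from statistics import mean

# values[t][c][s]: 2 time steps, 2 components, 3 samples (made-up numbers).
values = [
    [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]],
    [[4.0, 5.0, 6.0], [40.0, 50.0, 60.0]],
]
n_t, n_c, n_s = len(values), len(values[0]), len(values[0][0])

# axis=0: reduce over time, so a single time step remains.
over_time = [[[mean(values[t][c][s] for t in range(n_t)) for s in range(n_s)]
              for c in range(n_c)]]

# axis=1: reduce over components, so a single component remains.
over_components = [[[mean(values[t][c][s] for c in range(n_c)) for s in range(n_s)]]
                   for t in range(n_t)]

# axis=2: reduce over samples, so a single sample remains (deterministic).
over_samples = [[[mean(values[t][c])] for c in range(n_c)] for t in range(n_t)]

print(over_time)
print(over_components)
print(over_samples)
```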
- median(axis=2)[source]¶
Return a TimeSeries containing the median calculated over the specified axis.
If we reduce over time (axis=0), the resulting TimeSeries will have length one and will use the first entry of the original time_index. If we perform the calculation over the components (axis=1), the resulting single component will be renamed to “components_median”. When applied to the samples (axis=2), a deterministic TimeSeries is returned.
If axis=1, the static covariates and the hierarchy are discarded from the series.
- Parameters
axis (int) – The axis to reduce over. The default is to calculate over samples, i.e. axis=2.
- Returns
A new TimeSeries with median applied to the indicated axis.
- Return type
TimeSeries
- min(axis=2)[source]¶
Return a TimeSeries containing the min calculated over the specified axis.
If we reduce over time (axis=0), the resulting TimeSeries will have length one and will use the first entry of the original time_index. If we perform the calculation over the components (axis=1), the resulting single component will be renamed to “components_min”. When applied to the samples (axis=2), a deterministic TimeSeries is returned.
If axis=1, the static covariates and the hierarchy are discarded from the series.
- Parameters
axis (int) – The axis to reduce over. The default is to calculate over samples, i.e. axis=2.
- Returns
A new TimeSeries with min applied to the indicated axis.
- Return type
TimeSeries
- property n_components¶
Number of components (dimensions) contained in the series.
- property n_samples¶
Number of samples contained in the series.
- property n_timesteps¶
Number of time steps in the series.
- pd_dataframe(copy=True, suppress_warnings=False)[source]¶
Return a Pandas DataFrame representation of this time series.
Each of the series components will appear as a column in the DataFrame. If the series is stochastic, the samples are returned as columns of the dataframe with column names as ‘component_s#’ (e.g. with two components and two samples: ‘comp0_s0’, ‘comp0_s1’, ‘comp1_s0’, ‘comp1_s1’).
- Parameters
copy – Whether to return a copy of the dataframe. Leave it to True unless you know what you are doing.
- Returns
The Pandas DataFrame representation of this time series
- Return type
pandas.DataFrame
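The ‘component_s#’ naming scheme for stochastic series can be reproduced with a one-line helper. This is only a sketch of the documented pattern; `stochastic_column_names` is a hypothetical name.

```python
def stochastic_column_names(components, n_samples):
    """One column per (component, sample) pair, components varying slowest."""
    return [f"{comp}_s{s}" for comp in components for s in range(n_samples)]

columns = stochastic_column_names(["comp0", "comp1"], n_samples=2)
print(columns)
```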
- pd_series(copy=True)[source]¶
Return a Pandas Series representation of this univariate deterministic time series.
Works only for univariate series that are deterministic (i.e., made of 1 sample).
- Parameters
copy – Whether to return a copy of the series. Leave it to True unless you know what you are doing.
- Returns
A Pandas Series representation of this univariate time series.
- Return type
pandas.Series
- plot(new_plot=False, central_quantile=0.5, low_quantile=0.05, high_quantile=0.95, default_formatting=True, label='', *args, **kwargs)[source]¶
Plot the series.
This is a wrapper method around xarray.DataArray.plot().
- Parameters
new_plot (bool) – whether to spawn a new Figure
central_quantile (Union[float, str]) – The quantile (between 0 and 1) to plot as a “central” value, if the series is stochastic (i.e., if it has multiple samples). This will be applied on each component separately (i.e., to display quantiles of the components’ marginal distributions). For instance, setting central_quantile=0.5 will plot the median of each component. central_quantile can also be set to ‘mean’.
low_quantile (Optional[float]) – The quantile to use for the lower bound of the plotted confidence interval. Similar to central_quantile, this is applied to each component separately (i.e., displaying marginal distributions). No confidence interval is shown if low_quantile is None (default 0.05).
high_quantile (Optional[float]) – The quantile to use for the upper bound of the plotted confidence interval. Similar to central_quantile, this is applied to each component separately (i.e., displaying marginal distributions). No confidence interval is shown if high_quantile is None (default 0.95).
default_formatting (bool) – Whether or not to use the darts default scheme.
label (Union[str, Sequence[str], None]) – A prefix that will appear in front of each component of the TimeSeries, or a list of strings whose length equals the number of components in the plotted TimeSeries (default “”).
args – some positional arguments for the plot() method
kwargs – some keyword arguments for the plot() method
- prepend(other)[source]¶
Prepends (i.e. adds to the beginning) another series to this series along the time axis.
- Parameters
other (TimeSeries) – A second TimeSeries.
- Returns
A new TimeSeries, obtained by prepending the second TimeSeries to the first.
- Return type
TimeSeries
See also
TimeSeries.append
append (i.e. add to the end) another series along the time axis.
TimeSeries.concatenate
concatenate another series along a given axis.
- prepend_values(values)[source]¶
Prepends (i.e. adds to the beginning) new values to current TimeSeries, extending its time index into the past.
- Parameters
values (ndarray) – An array with the values to prepend to the start.
- Returns
A new TimeSeries with the new values prepended.
- Return type
TimeSeries
- quantile(quantile, **kwargs)[source]¶
Return a deterministic TimeSeries containing the single desired quantile of each component (over the samples) of this stochastic TimeSeries.
The components in the new series are named “<component>_X”, where “<component>” is the column name corresponding to this component, and “X” is the quantile value. The quantile columns represent the marginal distributions of the components of this series.
This works only on stochastic series (i.e., with more than 1 sample)
- Parameters
quantile (float) – The desired quantile value. The value must be represented as a fraction (between 0 and 1 inclusive). For instance, 0.5 will return a TimeSeries containing the median of the (marginal) distribution of each component.
kwargs – Other keyword arguments are passed down to numpy.quantile()
- Returns
The TimeSeries containing the desired quantile for each component.
- Return type
TimeSeries
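Taking a quantile "over the samples" means collapsing the sample axis independently at every time step and for every component. The stdlib sketch below uses linear interpolation between order statistics, which is the default rule of numpy.quantile; the component name "sales", the data and the helper are made up for illustration.

```python
def quantile(samples, q):
    """Linear-interpolation quantile of a list of samples, for 0 <= q <= 1."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a fraction between 0 and 1 inclusive")
    ordered = sorted(samples)
    pos = q * (len(ordered) - 1)           # fractional position in the sorted list
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

# samples_per_component[name][t] is the list of samples at time step t.
samples_per_component = {
    "sales": [[1.0, 3.0, 2.0, 4.0], [10.0, 30.0, 20.0, 40.0]],
}

q = 0.5
result = {
    f"{name}_{q}": [quantile(samples, q) for samples in series]
    for name, series in samples_per_component.items()
}
print(result)
```

Because each component is reduced on its own, the result describes marginal distributions only and carries no information about dependence between components.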
- quantile_df(quantile=0.5)[source]¶
Return a Pandas DataFrame containing the single desired quantile of each component (over the samples).
Each of the series components will appear as a column in the DataFrame. The column will be named “<component>_X”, where “<component>” is the column name corresponding to this component, and “X” is the quantile value. The quantile columns represent the marginal distributions of the components of this series.
This works only on stochastic series (i.e., with more than 1 sample)
- Parameters
quantile – The desired quantile value. The value must be represented as a fraction (between 0 and 1 inclusive). For instance, 0.5 will return a DataFrame containing the median of the (marginal) distribution of each component.
- Returns
The Pandas DataFrame containing the desired quantile for each component.
- Return type
pandas.DataFrame
- quantile_timeseries(quantile=0.5, **kwargs)[source]¶
Return a deterministic TimeSeries containing the single desired quantile of each component (over the samples) of this stochastic TimeSeries.
The components in the new series are named “<component>_X”, where “<component>” is the column name corresponding to this component, and “X” is the quantile value. The quantile columns represent the marginal distributions of the components of this series.
This works only on stochastic series (i.e., with more than 1 sample)
- Parameters
quantile – The desired quantile value. The value must be represented as a fraction (between 0 and 1 inclusive). For instance, 0.5 will return a TimeSeries containing the median of the (marginal) distribution of each component.
kwargs – Other keyword arguments are passed down to numpy.quantile()
- Returns
The TimeSeries containing the desired quantile for each component.
- Return type
TimeSeries
- quantiles_df(quantiles=(0.1, 0.5, 0.9))[source]¶
Return a Pandas DataFrame containing the desired quantiles of each component (over the samples).
Each of the series components will appear as a column in the DataFrame. The column will be named “<component>_X”, where “<component>” is the column name corresponding to this component, and “X” is the quantile value. The quantiles represent the marginal distributions of the components of this series.
This works only on stochastic series (i.e., with more than 1 sample)
- Parameters
quantiles (Tuple[float]) – Tuple containing the desired quantiles. The values must be represented as fractions (between 0 and 1 inclusive). For instance, (0.1, 0.5, 0.9) will return a DataFrame containing the 10th-percentile, median and 90th-percentile of the (marginal) distribution of each component.
- Returns
The Pandas DataFrame containing the quantiles for each component.
- Return type
pandas.DataFrame
- random_component_values(copy=True)[source]¶
Return a 2-D array of shape (time, component), containing the values for one sample taken uniformly at random among this series’ samples.
- Parameters
copy (bool) – Whether to return a copy of the values, otherwise returns a view. Leave it to True unless you know what you are doing.
- Returns
The values composing one sample taken at random from the time series.
- Return type
numpy.ndarray
- resample(freq, method='pad', **kwargs)[source]¶
Build a reindexed TimeSeries with a given frequency. Provided method is used to fill holes in reindexed TimeSeries, by default ‘pad’.
- Parameters
freq (str) – The new time difference between two adjacent entries in the returned TimeSeries. A DateOffset alias is expected.
method (str) – Method to fill holes in reindexed TimeSeries (note this does not fill NaNs that already were present):
’pad’: propagate last valid observation forward to next valid
’backfill’: use NEXT valid observation to fill.
kwargs – some keyword arguments for the xarray.resample method, notably loffset or base to indicate where to start the resampling and avoid NaN at the first value of the resampled TimeSeries. For more information, see the xarray resample() documentation.
Examples
>>> import pandas as pd
>>> from darts import TimeSeries
>>> times = pd.date_range(start=pd.Timestamp("20200101233000"), periods=6, freq="15T")
>>> pd_series = pd.Series(range(6), index=times)
>>> ts = TimeSeries.from_series(pd_series)
>>> print(ts.time_index)
DatetimeIndex(['2020-01-01 23:30:00', '2020-01-01 23:45:00',
               '2020-01-02 00:00:00', '2020-01-02 00:15:00',
               '2020-01-02 00:30:00', '2020-01-02 00:45:00'],
              dtype='datetime64[ns]', name='time', freq='15T')
>>> resampled_nokwargs_ts = ts.resample(freq="1h")
>>> print(resampled_nokwargs_ts.time_index)
DatetimeIndex(['2020-01-01 23:00:00', '2020-01-02 00:00:00'],
              dtype='datetime64[ns]', name='time', freq='H')
>>> print(resampled_nokwargs_ts.values())
[[nan]
 [ 2.]]
>>> resampled_ts = ts.resample(freq="1h", loffset="30T")
>>> print(resampled_ts.time_index)
DatetimeIndex(['2020-01-01 23:30:00', '2020-01-02 00:30:00'],
              dtype='datetime64[ns]', name='time', freq='H')
>>> print(resampled_ts.values())
[[0.]
 [4.]]
- Returns
A reindexed TimeSeries with given frequency.
- Return type
TimeSeries
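The two fill methods differ only in which neighbouring observation fills a hole. The following stdlib sketch models upsampling from hourly to half-hourly data; it is a simplified illustration rather than the xarray-based implementation, and the function and data are assumptions.

```python
from datetime import datetime, timedelta

def resample(observations, step, method="pad"):
    """Re-index a {timestamp: value} mapping on a regular grid, filling holes."""
    times = sorted(observations)
    grid, current = [], times[0]
    while current <= times[-1]:
        grid.append(current)
        current += step
    result = {}
    for t in grid:
        if method == "pad":
            source = [s for s in times if s <= t][-1]   # last observation at or before t
        elif method == "backfill":
            source = [s for s in times if s >= t][0]    # next observation at or after t
        else:
            raise ValueError("method must be 'pad' or 'backfill'")
        result[t] = observations[source]
    return result

hourly = {
    datetime(2020, 1, 1, 0): 0.0,
    datetime(2020, 1, 1, 1): 1.0,
    datetime(2020, 1, 1, 2): 2.0,
}

padded = list(resample(hourly, timedelta(minutes=30), "pad").values())
backfilled = list(resample(hourly, timedelta(minutes=30), "backfill").values())
print(padded)
print(backfilled)
```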
- rescale_with_value(value_at_first_step)[source]¶
Return a new TimeSeries, which is a multiple of this series such that the first value is value_at_first_step. (Note: numerical errors can appear with value_at_first_step > 1e+24).
- Parameters
value_at_first_step (float) – The new value for the first entry of the TimeSeries.
- Returns
A new TimeSeries, where the first value is value_at_first_step and other values have been scaled accordingly.
- Return type
TimeSeries
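Rescaling amounts to multiplying every value by a single coefficient. A minimal plain-Python sketch follows; the zero check is an assumption about a sensible guard, not documented behaviour.

```python
def rescale_with_value(values, value_at_first_step):
    """Scale all values so that the first one equals value_at_first_step."""
    if values[0] == 0:
        raise ValueError("cannot rescale a series whose first value is 0")
    coefficient = value_at_first_step / values[0]
    return [v * coefficient for v in values]

rescaled = rescale_with_value([2.0, 4.0, 3.0], 100.0)
print(rescaled)
```

Ratios between values are preserved, so the rescaled series reads as an index relative to its first step.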
- shift(n)[source]¶
Shifts the time axis of this TimeSeries by n time steps.
If \(n > 0\), shifts in the future. If \(n < 0\), shifts in the past.
For example, with \(n=2\) and freq=’M’, March 2013 becomes May 2013. With \(n=-2\), March 2013 becomes Jan 2013.
- Parameters
n (int) – The number of time steps (in self.freq unit) to shift by. Can be negative.
- Returns
A new TimeSeries, with a shifted index.
- Return type
TimeSeries
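The monthly example can be verified with simple calendar arithmetic. The helper below is a stdlib sketch for a monthly frequency only; `shift_month` is a hypothetical name, and real date offsets are handled by pandas.

```python
def shift_month(year, month, n):
    """Shift a (year, month) pair by n monthly steps; n may be negative."""
    zero_based = year * 12 + (month - 1) + n
    return zero_based // 12, zero_based % 12 + 1

forward = shift_month(2013, 3, 2)     # March 2013 shifted by +2
backward = shift_month(2013, 3, -2)   # March 2013 shifted by -2
rollover = shift_month(2013, 11, 3)   # crossing a year boundary
print(forward, backward, rollover)
```

Only the time index moves; the values attached to each step stay the same.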
- skew(**kwargs)[source]¶
Return a deterministic TimeSeries containing the skew of each component (over the samples) of this stochastic TimeSeries.
This works only on stochastic series (i.e., with more than 1 sample).
- Parameters
kwargs – Other keyword arguments are passed down to scipy.stats.skew()
- Returns
The TimeSeries containing the skew for each component.
- Return type
TimeSeries
- slice(start_ts, end_ts)[source]¶
Return a new TimeSeries, starting no earlier than start_ts and ending no later than end_ts. For series having DatetimeIndex, this is inclusive on both ends. For series having a RangeIndex, end_ts is exclusive.
start_ts and end_ts don’t have to be in the series.
- Parameters
start_ts (Union[Timestamp, int]) – The timestamp that indicates the left cut-off.
end_ts (Union[Timestamp, int]) – The timestamp that indicates the right cut-off.
- Returns
A new series, with indices greater than or equal to start_ts and smaller than or equal to end_ts (strictly smaller than end_ts for a RangeIndex).
- Return type
TimeSeries
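The different end-point conventions for the two index types are easy to get wrong, so here is a plain-Python sketch of the stated rule. Lists of dates and integers stand in for the pandas indexes, and both helpers are hypothetical.

```python
from datetime import date

def slice_datetime(times, start_ts, end_ts):
    """Datetime-indexed series: inclusive on both ends."""
    return [t for t in times if start_ts <= t <= end_ts]

def slice_range(index, start_ts, end_ts):
    """Integer-indexed series: start inclusive, end exclusive."""
    return [i for i in index if start_ts <= i < end_ts]

days = [date(2020, 1, d) for d in range(1, 6)]  # 1 to 5 January
kept_days = slice_datetime(days, date(2020, 1, 2), date(2020, 1, 4))

steps = [0, 2, 4, 6, 8]
kept_steps = slice_range(steps, 1, 6)  # neither bound is in the index
kept_exclusive = slice_range(steps, 2, 6)
print(kept_days)
print(kept_steps, kept_exclusive)
```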
- slice_intersect(other)[source]¶
Return a TimeSeries slice of this series, where the time index has been intersected with the one of the other series.
This method is in general not symmetric.
- Parameters
other (TimeSeries) – the other time series
- Returns
a new series, containing the values of this series, over the time-span common to both time series.
- Return type
TimeSeries
- slice_n_points_after(start_ts, n)[source]¶
Return a new TimeSeries, starting at start_ts (inclusive) and having at most n points.
The provided timestamps will be included in the series.
- Parameters
start_ts (Union[Timestamp, int]) – The timestamp or index that indicates the splitting time.
n (int) – The maximal length of the new TimeSeries.
- Returns
A new TimeSeries, with length at most n, starting at start_ts
- Return type
TimeSeries
- slice_n_points_before(end_ts, n)[source]¶
Return a new TimeSeries, ending at end_ts (inclusive) and having at most n points.
The provided timestamps will be included in the series.
- Parameters
end_ts (Union[Timestamp, int]) – The timestamp or index that indicates the splitting time.
n (int) – The maximal length of the new TimeSeries.
- Returns
A new TimeSeries, with length at most n, ending at end_ts
- Return type
TimeSeries
- split_after(split_point)[source]¶
Splits the series in two, after a provided split_point.
- Parameters
split_point (Union[Timestamp, float, int]) – A timestamp, float or integer. If float, represents the proportion of the series to include in the first TimeSeries (must be between 0.0 and 1.0). If integer, represents the index position after which the split is performed. A pd.Timestamp can be provided for TimeSeries that are indexed by a pd.DatetimeIndex. In such cases, the timestamp will be contained in the first TimeSeries, but not in the second one. The timestamp itself does not have to appear in the original TimeSeries index.
- Returns
A tuple of two time series. The first time series contains the entries up to the split_point, and the second contains the remaining ones.
- Return type
Tuple[TimeSeries, TimeSeries]
- split_before(split_point)[source]¶
Splits the series in two, before a provided split_point.
- Parameters
split_point (Union[Timestamp, float, int]) – A timestamp, float or integer. If float, represents the proportion of the series to include in the first TimeSeries (must be between 0.0 and 1.0). If integer, represents the index position before which the split is performed. A pd.Timestamp can be provided for TimeSeries that are indexed by a pd.DatetimeIndex. In such cases, the timestamp will be contained in the second TimeSeries, but not in the first one. The timestamp itself does not have to appear in the original TimeSeries index.
- Returns
A tuple of two time series. The first time series contains the entries up to the split_point, and the second contains the remaining ones.
- Return type
Tuple[TimeSeries, TimeSeries]
- stack(other)[source]¶
Stacks another univariate or multivariate TimeSeries with the same time index on top of the current one (along the component axis).
Return a new TimeSeries that includes all the components of self and of other.
The resulting TimeSeries will have the same name for its time dimension as this TimeSeries, and the same number of samples.
- Parameters
other (TimeSeries) – A TimeSeries instance with the same index and the same number of samples as the current one.
- Returns
A new multivariate TimeSeries instance.
- Return type
TimeSeries
- start_time()[source]¶
Start time of the series.
- Returns
A timestamp containing the first time of the TimeSeries (if indexed by DatetimeIndex), or an integer (if indexed by RangeIndex)
- Return type
Union[pandas.Timestamp, int]
- property static_covariates: Optional[pandas.core.frame.DataFrame]¶
Returns the static covariates contained in the series as a pandas DataFrame. The columns represent the static variables and the rows represent the components of the uni/multivariate series.
- Return type
Optional[DataFrame]
- static_covariates_values(copy=True)[source]¶
Return a 2-D array of dimension (component, static variable), containing the static covariate values of the TimeSeries.
- Parameters
copy (bool) – Whether to return a copy of the values, otherwise returns a view. Can only return a view if all values have the same dtype. Leave it to True unless you know what you are doing.
- Returns
The static covariate values if the series has static covariates, else None.
- Return type
Optional[numpy.ndarray]
- std(ddof=1)[source]¶
Return a deterministic TimeSeries containing the standard deviation of each component (over the samples) of this stochastic TimeSeries.
This works only on stochastic series (i.e., with more than 1 sample).
- Parameters
ddof (int) – “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof where N represents the number of elements. By default, ddof is 1.
- Returns
The TimeSeries containing the standard deviation for each component.
- Return type
TimeSeries
- strip()[source]¶
Return a TimeSeries slice of this deterministic time series, where NaN-only entries at the beginning and the end of the series are removed. No entries after (and including) the first non-NaN entry and before (and including) the last non-NaN entry are removed.
This method is only applicable to deterministic series (i.e., having 1 sample).
- Returns
a new series based on the original where NaN-only entries at start and end have been removed
- Return type
TimeSeries
- sum(axis=2)[source]¶
Return a TimeSeries containing the sum calculated over the specified axis.
If we reduce over time (axis=0), the resulting TimeSeries will have length one and will use the first entry of the original time_index. If we perform the calculation over the components (axis=1), the resulting single component will be renamed to “components_sum”. When applied to the samples (axis=2), a deterministic TimeSeries is returned.
If axis=1, the static covariates and the hierarchy are discarded from the series.
- Parameters
axis (int) – The axis to reduce over. The default is to calculate over samples, i.e. axis=2.
- Returns
A new TimeSeries with sum applied to the indicated axis.
- Return type
TimeSeries
- tail(size=5, axis=0)[source]¶
Return last size points of the series.
- Parameters
size (int, default: 5) – number of points to retain
axis (str or int, optional, default: 0 (time dimension)) – axis along which we intend to display records
- Returns
The series made of the last size points along the desired axis.
- Return type
TimeSeries
- property time_dim: str¶
The name of the time dimension for this time series.
- Return type
str
- property time_index: Union[pandas.core.indexes.datetimes.DatetimeIndex, pandas.core.indexes.range.RangeIndex]¶
The time index of this time series.
- Return type
Union[DatetimeIndex, RangeIndex]
- to_csv(*args, **kwargs)[source]¶
Writes this deterministic series to a CSV file. For a list of parameters, refer to the documentation of pandas.DataFrame.to_csv() [1].
References
- to_json()[source]¶
Return a JSON string representation of this deterministic series.
At the moment this function works only on deterministic time series (i.e., made of 1 sample).
Notes
Static covariates are not returned in the JSON string. When using TimeSeries.from_json(), the static covariates can be added with input argument static_covariates.
- Returns
A JSON String representing the time series
- Return type
str
- to_pickle(path, protocol=5)[source]¶
Save this series in pickle format.
- Parameters
path (string) – path to a file where current object will be pickled
protocol (integer, default highest) – pickling protocol. The default is best in most cases; change it only if you have backward-compatibility issues.
Notes
Xarray docs [1] suggest not using pickle for long-term data storage.
References
- property top_level_component: Optional[str]¶
The top level component name of this series, or None if the series has no hierarchy.
- Return type
Optional[str]
- property top_level_series: Optional[darts.timeseries.TimeSeries]¶
The univariate series containing the single top-level component of this series, or None if the series has no hierarchy.
- Return type
Optional[TimeSeries]
- univariate_component(index)[source]¶
Retrieve one of the components of the series and return it as a new univariate TimeSeries instance.
This drops the hierarchy (if any), and retains only the relevant static covariates column.
- Parameters
index (Union[str, int]) – A zero-indexed integer indicating which component to retrieve. If components have names, this can be a string with the component’s name.
- Returns
A new univariate TimeSeries instance.
- Return type
TimeSeries
- univariate_values(copy=True, sample=0)[source]¶
Return a 1-D Numpy array of shape (time,), containing this univariate series’ values for one sample.
- Parameters
copy (bool) – Whether to return a copy of the values. Leave it to True unless you know what you are doing.
sample (int) – For stochastic series, the sample for which to return values. Default: 0 (first sample).
- Returns
The values composing the time series guaranteed to be univariate.
- Return type
numpy.ndarray
- values(copy=True, sample=0)[source]¶
Return a 2-D array of shape (time, component), containing this series’ values for one sample.
- Parameters
copy (bool) – Whether to return a copy of the values, otherwise returns a view. Leave it to True unless you know what you are doing.
sample (int) – For stochastic series, the sample for which to return values. Default: 0 (first sample).
- Returns
The values composing the time series.
- Return type
numpy.ndarray
- var(ddof=1)[source]¶
Return a deterministic TimeSeries containing the variance of each component (over the samples) of this stochastic TimeSeries.
This works only on stochastic series (i.e., with more than 1 sample).
- Parameters
ddof (int) – “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof where N represents the number of elements. By default, ddof is 1.
- Returns
The TimeSeries containing the variance for each component.
- Return type
TimeSeries
- property width¶
“Width” (= number of components) of the series.
- window_transform(transforms, treat_na=None, forecasting_safe=True, keep_non_transformed=False, include_current=True)[source]¶
Applies a moving/rolling, expanding or exponentially weighted window transformation over this TimeSeries.
- Parameters
transforms (Union[Dict, Sequence[Dict]]) –
A dictionary or a list of dictionaries. Each dictionary specifies a different window transform.
The dictionaries can contain the following keys:
"function"
Mandatory. The name of one of the pandas builtin transformation functions, or a callable function that can be applied to the input series. Pandas’ functions can be found in the documentation.
"mode"
Optional. The name of the pandas windowing mode on which the "function" is going to be applied. The options are “rolling”, “expanding” and “ewm”. If not provided, Darts defaults to “expanding”. User defined functions can use either “rolling” or “expanding” modes. More information on pandas windowing operations can be found in the documentation.
"components"
Optional. A string or list of strings specifying the TimeSeries components on which the transformation should be applied. If not specified, the transformation will be applied on all components.
"function_name"
Optional. A string specifying the function name referenced as part of the transformation output name. For example, given a user-provided function transformation on rolling window size of 5 on the component “comp”, the default transformation output name is “rolling_udf_5_comp” whereby “udf” refers to “user defined function”. If specified, the "function_name" will replace the default name “udf”. Similarly, the "function_name" will replace the name of the pandas builtin transformation function name in the output name.
All other dictionary items provided will be treated as keyword arguments for the windowing mode (i.e., rolling/ewm/expanding) or for the specific function in that mode (i.e., pandas.DataFrame.rolling.mean/std/max/min... or pandas.DataFrame.ewm.mean/std/sum). This allows for more flexibility in configuring the transformation, by providing for example:
"window"
Size of the moving window for the “rolling” mode. If an integer, the fixed number of observations used for each window. If an offset, the time period of each window with data type pandas.Timedelta representing a fixed duration.
"min_periods"
The minimum number of observations in the window required to have a value (otherwise NaN). Darts reuses pandas defaults of 1 for “rolling” and “expanding” modes and of 0 for “ewm” mode.
"win_type"
The type of weighting to apply to the window elements. If provided, it should be one of scipy.signal.windows.
"center"
True/False to set the observation at the current timestep at the center of the window (when forecasting_safe is True, Darts enforces "center" to False).
"closed"
"right"/"left"/"both"/"neither" to specify whether the right, left or both ends of the window are included in the window, or neither of them. Darts defaults to pandas default of "right".
More information on the available functions and their parameters can be found in the Pandas documentation.
For user-provided functions, extra keyword arguments in the transformation dictionary are passed to the user-defined function. By default, Darts expects user-defined functions to receive numpy arrays as input. This can be modified by adding item "raw": False in the transformation dictionary. It is expected that the function returns a single value for each window. Other possible configurations can be found in the pandas.DataFrame.rolling().apply() documentation and pandas.DataFrame.expanding().apply() documentation.
treat_na (Union[str, int, float, None]) –
Specifies how to treat missing values that were added by the window transformations at the beginning of the resulting TimeSeries. By default, Darts will leave NaNs in the resulting TimeSeries. This parameter can be one of the following:
"dropna"
to truncate the TimeSeries and drop rows containing missing values. If multiple columns contain different numbers of missing values, only the minimum number of rows is dropped. This operation might reduce the length of the resulting TimeSeries.
"bfill" or "backfill"
to specify that NaNs should be filled with the last transformed and valid observation. If the original TimeSeries starts with NaNs, those are kept. When forecasting_safe is True, this option raises an exception to avoid future observations contaminating the past.
an integer or float
in which case NaNs will be filled with this value. All columns will be filled with the same provided value.
forecasting_safe (Optional[bool]) – If True, Darts enforces that the resulting TimeSeries is safe to be used in forecasting models as target or as feature. The window transformation will not allow future values to be included in the computations at their corresponding current timestep. Default is True. “ewm” and “expanding” modes are forecasting safe by default. “rolling” mode is forecasting safe if "center": False is guaranteed.
keep_non_transformed (Optional[bool]) – False to return the transformed components only, True to return all original components along the transformed ones. Default is False.
include_current (Optional[bool]) – True to include the current time step in the window, False to exclude it. Default is True.
- Returns
Returns a new TimeSeries instance with the transformed components. If keep_non_transformed is True, the resulting TimeSeries will contain the original non-transformed components along the transformed ones. If the input series is stochastic, all samples are identically transformed. The naming convention for the transformed components is as follows: [window_mode]_[function_name]_[window_size if provided]_[min_periods if not default]_[original_comp_name], e.g., rolling_sum_3_comp_0 (i.e., window_mode=rolling, function_name=sum, window_size=3, original_comp_name=comp_0); ewm_mean_comp_1 (i.e., window_mode=ewm, function_name=mean, original_comp_name=comp_1); expanding_sum_3_comp_2 (i.e., window_mode=expanding, function_name=sum, window_size=3, original_comp_name=comp_2). For user-defined functions, function_name=udf.
- Return type
TimeSeries
- with_columns_renamed(col_names, col_names_new)[source]¶
Return a new TimeSeries instance with new columns/components names. It also adapts the names in the hierarchy, if any.
- Parameters
col_names (Union[List[str], str]) – String or list of strings corresponding to the column names to be changed.
col_names_new (Union[List[str], str]) – String or list of strings corresponding to the new column names. Must be the same length as col_names.
- Returns
A new TimeSeries instance.
- Return type
TimeSeries
- with_hierarchy(hierarchy)[source]¶
Adds a hierarchy to the TimeSeries.
- Parameters
hierarchy (Dict[str, Union[str, List[str]]]) –
A dictionary mapping components to a list of their parent(s) in the hierarchy. Single parents may be specified as string or list containing one string. For example, assume the series contains the components ["total", "a", "b", "x", "y", "ax", "ay", "bx", "by"], the following dictionary would encode the groupings shown on this figure:
hierarchy = {'ax': ['a', 'x'],
             'ay': ['a', 'y'],
             'bx': ['b', 'x'],
             'by': ['b', 'y'],
             'a': ['total'],
             'b': ['total'],
             'x': 'total',  # or use a single string
             'y': 'total'}
- with_static_covariates(covariates)[source]¶
Returns a new TimeSeries object with added static covariates.
Static covariates contain data attached to the time series, but which are not varying with time.
- Parameters
covariates (Union[Series, DataFrame, None]) – Optionally, a set of static covariates to be added to the TimeSeries. Either a pandas Series, a pandas DataFrame, or None. If None, will set the static covariates to None. If a Series, the index represents the static variables. The covariates are then globally ‘applied’ to all components of the TimeSeries. If a DataFrame, the columns represent the static variables and the rows represent the components of the uni/multivariate TimeSeries. If a single-row DataFrame, the covariates are globally ‘applied’ to all components of the TimeSeries. If a multi-row DataFrame, the number of rows must match the number of components of the TimeSeries. This adds component-specific static covariates.
Notes
If there are a large number of static covariates variables (i.e., the static covariates have a very large dimension), there might be a noticeable performance penalty for creating TimeSeries objects, unless the covariates already have the same dtype as the series data.
Examples
>>> import pandas as pd
>>> from darts.utils.timeseries_generation import linear_timeseries
>>> # add global static covariates
>>> static_covs = pd.Series([0., 1.], index=["static_cov_1", "static_cov_2"])
>>> series = linear_timeseries(length=3)
>>> series_new1 = series.with_static_covariates(static_covs)
>>> series_new1.static_covariates
           static_cov_1  static_cov_2
component
linear              0.0           1.0
>>> # add component specific static covariates
>>> static_covs_multi = pd.DataFrame([[0., 1.], [2., 3.]], columns=["static_cov_1", "static_cov_2"])
>>> series_multi = series.stack(series)
>>> series_new2 = series_multi.with_static_covariates(static_covs_multi)
>>> series_new2.static_covariates
           static_cov_1  static_cov_2
component
linear              0.0           1.0
linear_1            2.0           3.0
- with_values(values)[source]¶
Return a new TimeSeries similar to this one but with new specified values.
- Parameters
values (ndarray) – A Numpy array with new values. It must match this series in the time and component dimensions, but may contain a different number of samples.
- Returns
A new TimeSeries with the new values and same index, static covariates and hierarchy
- Return type
TimeSeries
- darts.timeseries.concatenate(series, axis=0, ignore_time_axis=False, ignore_static_covariates=False, drop_hierarchy=True)[source]¶
Concatenates multiple TimeSeries along a given axis.
axis can be an integer in (0, 1, 2) to denote (time, component, sample) or, alternatively, a string denoting the corresponding dimension of the underlying DataArray.
- Parameters
series (Sequence[TimeSeries]) – sequence of TimeSeries to concatenate
axis (Union[str, int]) – axis along which the series will be concatenated.
ignore_time_axis (bool) – Allow concatenation even when some series do not have matching time axes. When done along component or sample dimensions, concatenation will work as long as the series have the same lengths (in this case the resulting series will have the time axis of the first provided series). When done along time dimension, concatenation will work even if the time axes are not contiguous (in this case, the resulting series will have a start time matching the start time of the first provided series). Default: False.
ignore_static_covariates (bool) – whether to ignore all requirements for static covariate concatenation and only transfer the static covariates of the first TimeSeries element in series to the concatenated TimeSeries. Only effective when axis=1.
drop_hierarchy (bool) – When axis=1, whether to drop hierarchy information. True by default. When False, the hierarchies will be “concatenated” as well (by merging the hierarchy dictionaries), which may cause issues if the component names of the resulting series and that of the merged hierarchy do not match. When axis=0 or axis=2, the hierarchy of the first series is always kept.
- Returns
concatenated series
- Return type
TimeSeries