"""
Time Axes Encoders
------------------
Encoders can generate past and/or future covariates series by encoding the index of a TimeSeries `series`.
Each encoder class has methods `encode_train()`, `encode_inference()`, and `encode_train_inference()` to generate the
encodings for training and inference.
The encoders extract the index either from the target series or optional additional past/future covariates.
If additional covariates are supplied to `encode_train()`, `encode_inference()`, or `encode_train_inference()`,
the time index of those covariates is used for the encodings. This means that the input covariates must meet the same
model-specific requirements as without encoders.
There are two main types of encoder classes: `SingleEncoder` and `SequentialEncoder`.
* SingleEncoder
The SingleEncoder classes carry the encoder logic for past and future covariates, and for training and
inference datasets. They can be used as stand-alone encoders.
Each SingleEncoder has a dedicated subclass for generating past or future covariates. The naming convention
is `{X}{SingleEncoder}` where {X} is one of (Past, Future) and {SingleEncoder} is one of the SingleEncoder
classes described in the next section. An example:
.. highlight:: python
.. code-block:: python
encoder = PastDatetimeAttributeEncoder(
input_chunk_length=24,
output_chunk_length=12,
attribute='month',
tz='CET'
)
past_covariates_train = encoder.encode_train(
target=target,
covariates=optional_past_covariates
)
past_covariates_inf = encoder.encode_inference(
n=12,
target=target,
covariates=optional_past_covariates
)
# or generate encodings for train and inference together
past_covariates_train_inf = encoder.encode_train_inference(
n=12,
target=target,
covariates=optional_past_covariates
)
* SequentialEncoder
Stores and controls multiple SingleEncoders for past and/or future covariates, all under one hood.
It provides the same functionality as SingleEncoders (`encode_train()`, `encode_inference()`, and
`encode_train_inference()`).
It can be used either stand-alone or as an all-in-one solution with Darts' forecasting models that support
covariates through optional parameter `add_encoders`:
.. highlight:: python
.. code-block:: python
model = SomeForecastingModel(..., add_encoders={...})
..
If used at model creation, the SequentialEncoder will handle all past and future encoders autonomously.
The requirements for model parameter `add_encoders` are described in the next section or in
:meth:`SequentialEncoder <darts.dataprocessing.encoders.SequentialEncoder>`.
SingleEncoder
-------------
The SingleEncoder classes available as `{X}{SingleEncoder}` are:
* `DatetimeAttributeEncoder`
Adds scalar pd.DatetimeIndex attribute information derived from `series.time_index`.
Requires `series` to have a pd.DatetimeIndex.
attribute
An attribute of `pd.DatetimeIndex`: see all available attributes in
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
tz
Optionally, convert the time zone-naive index to time zone `tz` before applying the encoder.
* `CyclicTemporalEncoder`
Adds cyclic pd.DatetimeIndex attribute information derived from `series.time_index`.
Adds 2 columns, corresponding to sin and cos encodings, to uniquely describe the underlying attribute.
Requires `series` to have a pd.DatetimeIndex.
attribute
An attribute of `pd.DatetimeIndex` that follows a cyclic pattern. One of ('month', 'day', 'weekday',
'dayofweek', 'day_of_week', 'hour', 'minute', 'second', 'microsecond', 'nanosecond', 'quarter',
'dayofyear', 'day_of_year', 'week', 'weekofyear', 'week_of_year').
tz
Optionally, convert the time zone-naive index to time zone `tz` before applying the encoder.
* `IntegerIndexEncoder`
Adds the relative index positions as integer values (positions) derived from the `series` time index.
`series` can either have a pd.DatetimeIndex or an integer index.
attribute
Currently, only 'relative' is supported.
'relative' will generate position values relative to the forecasting/prediction point. Values range
from -inf to inf where 0 is set at the forecasting point.
* `CallableIndexEncoder`
Applies a user-defined callable to encode `series`' index.
`series` can either have a pd.DatetimeIndex or an integer index.
attribute
A callable/function to encode the index.
For `series` with a pd.DatetimeIndex: ``lambda index: (index.year - 1950) / 50``.
For `series` with an integer index: ``lambda index: index / 50``.
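Below, a minimal sketch of stand-alone usage with one of the `Future` subclasses (`target` is assumed
to be a `TimeSeries` with a pd.DatetimeIndex):
.. highlight:: python
.. code-block:: python

    encoder = FutureCyclicEncoder(
        input_chunk_length=24,
        output_chunk_length=12,
        attribute='month',
    )
    # generates two components: sine and cosine encodings of the month
    future_covariates_train = encoder.encode_train(target=target)
..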
SequentialEncoder
-----------------
The SequentialEncoder combines the logic of all SingleEncoders from above and has additional benefits:
* use multiple encoders at once
* generate multiple attribute encodings at once
* generate both past and future at once
* supports transformers (Scaler)
* easy to use with any forecasting model that supports covariates.
The model parameter `add_encoders` must be a dict following this convention:
* outer keys: `SingleEncoder` and Transformer tags:
* 'datetime_attribute' for `DatetimeAttributeEncoder`
* 'cyclic' for `CyclicTemporalEncoder`
* 'position' for `IntegerIndexEncoder`
* 'custom' for `CallableIndexEncoder`
* 'transformer' for a transformer
* 'tz' for applying a time zone conversion
* inner keys: covariates type
* 'past' for past covariates
* 'future' for future covariates
* (do not specify for 'transformer')
* inner key values:
* list of attributes for `SingleEncoder`
* transformer object for 'transformer'
Below is an example that illustrates a valid `add_encoders` dict for hourly data and how it can be used with a
TorchForecastingModel (this is only meant to illustrate many features at once).
.. highlight:: python
.. code-block:: python
add_encoders = {
'cyclic': {'future': ['month']},
'datetime_attribute': {'future': ['hour', 'dayofweek']},
'position': {'past': ['relative'], 'future': ['relative']},
'custom': {'past': [lambda idx: (idx.year - 1950) / 50]},
'transformer': Scaler(),
'tz': 'CET',
}
model = SomeTorchForecastingModel(..., add_encoders=add_encoders)
..
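Once the model is created this way, the encoders are applied autonomously; a sketch of the
subsequent calls (no manually built covariates are required, assuming the encoders cover all the
covariates the model needs):
.. highlight:: python
.. code-block:: python

    model.fit(target)
    pred = model.predict(n=12)
..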
"""
import copy
from collections.abc import Sequence
from typing import Callable, Optional, Union
import numpy as np
import pandas as pd
from darts import TimeSeries, concatenate
from darts.dataprocessing.encoders.encoder_base import (
CovariatesIndexGenerator,
Encoder,
FutureCovariatesIndexGenerator,
PastCovariatesIndexGenerator,
SequentialEncoderTransformer,
SingleEncoder,
SupportedIndex,
_EncoderMethod,
)
from darts.dataprocessing.transformers import FittableDataTransformer
from darts.logging import get_logger, raise_if, raise_if_not
from darts.timeseries import DIMS
from darts.utils.timeseries_generation import datetime_attribute_timeseries
from darts.utils.ts_utils import seq2series, series2seq
from darts.utils.utils import generate_index
SupportedTimeSeries = Union[TimeSeries, Sequence[TimeSeries]]
logger = get_logger(__name__)
ENCODER_KEYS = ["cyclic", "datetime_attribute", "position", "custom"]
FUTURE = "future"
PAST = "past"
VALID_TIME_PARAMS = [FUTURE, PAST]
VALID_ENCODER_DTYPES = (str, Sequence)
TZ_KEYS = ["tz"]
TRANSFORMER_KEYS = ["transformer"]
VALID_TRANSFORMER_DTYPES = FittableDataTransformer
INTEGER_INDEX_ATTRIBUTES = ["relative"]
class CyclicTemporalEncoder(SingleEncoder):
"""`CyclicTemporalEncoder`: Cyclic encoding of time series datetime attributes."""
def __init__(
self,
index_generator: CovariatesIndexGenerator,
attribute: str,
tz: Optional[str] = None,
):
"""
Cyclic index encoding for `TimeSeries` that have a time index of type `pd.DatetimeIndex`.
Parameters
----------
index_generator
An instance of `CovariatesIndexGenerator` with methods `generate_train_idx()` and
`generate_inference_idx()`. Used to generate the index for encoders.
attribute
The attribute of the underlying pd.DatetimeIndex for which to apply cyclic encoding.
Must be an attribute of `pd.DatetimeIndex`, or `week` / `weekofyear` / `week_of_year` - e.g. "month",
"weekday", "day", "hour", "minute", "second". See all available attributes in
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
For more information, check out :meth:`datetime_attribute_timeseries()
<darts.utils.timeseries_generation.datetime_attribute_timeseries>`
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(index_generator)
self.attribute = attribute
self.tz = tz
def _encode(
self, index: SupportedIndex, target_end: pd.Timestamp, dtype: np.dtype
) -> TimeSeries:
"""applies cyclic encoding from `datetime_attribute_timeseries()` to `self.attribute` of `index`."""
super()._encode(index, target_end, dtype)
return datetime_attribute_timeseries(
index,
attribute=self.attribute,
cyclic=True,
dtype=dtype,
with_columns=[
self.base_component_name + self.attribute + "_sin",
self.base_component_name + self.attribute + "_cos",
],
tz=self.tz,
)
@property
def accept_transformer(self) -> list[bool]:
"""`CyclicTemporalEncoder` should not be transformed. Returns two elements for sine and cosine waves."""
return [False, False]
@property
def requires_fit(self) -> bool:
return False
@property
def base_component_name(self) -> str:
return super().base_component_name + "_cyc_"
@property
def encoding_n_components(self) -> int:
return 2
class PastCyclicEncoder(CyclicTemporalEncoder):
"""`CyclicEncoder`: Cyclic encoding of past covariates datetime attributes."""
def __init__(
self,
attribute: str,
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[list[int]] = None,
tz: Optional[str] = None,
):
"""
Parameters
----------
attribute
The attribute of the underlying pd.DatetimeIndex for which to apply cyclic encoding.
Must be an attribute of `pd.DatetimeIndex`, or `week` / `weekofyear` / `week_of_year` - e.g. "month",
"weekday", "day", "hour", "minute", "second". See all available attributes in
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
For more information, check out :meth:`datetime_attribute_timeseries()
<darts.utils.timeseries_generation.datetime_attribute_timeseries>`
input_chunk_length
Optionally, the number of input target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `input_chunk_length` from :class:`TorchForecastingModel`, or to the absolute
minimum target lag value `abs(min(lags))` for :class:`RegressionModel`.
output_chunk_length
Optionally, the number of output target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `output_chunk_length` from both :class:`TorchForecastingModel`, and
:class:`RegressionModel`.
lags_covariates
Optionally, a list of integers representing the past covariate lags. Accepts integer lag values <= -1.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_past_covariates` of :class:`RegressionModel`.
tz
Optionally, a time zone to convert the time index to before computing the attributes.
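A minimal stand-alone sketch, assuming `target` is a `TimeSeries` with a pd.DatetimeIndex:
.. highlight:: python
.. code-block:: python

    encoder = PastCyclicEncoder(
        attribute='month',
        input_chunk_length=24,
        output_chunk_length=12,
    )
    # two components per attribute: a sine and a cosine wave
    past_covariates_train = encoder.encode_train(target=target)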
"""
super().__init__(
index_generator=PastCovariatesIndexGenerator(
input_chunk_length,
output_chunk_length,
lags_covariates=lags_covariates,
),
attribute=attribute,
tz=tz,
)
class FutureCyclicEncoder(CyclicTemporalEncoder):
"""`CyclicEncoder`: Cyclic encoding of future covariates datetime attributes."""
def __init__(
self,
attribute: str,
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[list[int]] = None,
tz: Optional[str] = None,
):
"""
Parameters
----------
attribute
The attribute of the underlying pd.DatetimeIndex for which to apply cyclic encoding.
Must be an attribute of `pd.DatetimeIndex`, or `week` / `weekofyear` / `week_of_year` - e.g. "month",
"weekday", "day", "hour", "minute", "second". See all available attributes in
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
For more information, check out :meth:`datetime_attribute_timeseries()
<darts.utils.timeseries_generation.datetime_attribute_timeseries>`
input_chunk_length
Optionally, the number of input target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `input_chunk_length` from :class:`TorchForecastingModel`, or to the absolute
minimum target lag value `abs(min(lags))` for :class:`RegressionModel`.
output_chunk_length
Optionally, the number of output target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `output_chunk_length` from both :class:`TorchForecastingModel`, and
:class:`RegressionModel`.
lags_covariates
Optionally, a list of integers representing the future covariate lags. Accepts all integer values.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_future_covariates` of :class:`RegressionModel`.
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(
index_generator=FutureCovariatesIndexGenerator(
input_chunk_length,
output_chunk_length,
lags_covariates=lags_covariates,
),
attribute=attribute,
tz=tz,
)
class DatetimeAttributeEncoder(SingleEncoder):
"""`DatetimeAttributeEncoder`: Adds pd.DatatimeIndex attribute information derived from the index as scalars.
Requires the underlying TimeSeries to have a pd.DatetimeIndex
"""
def __init__(
self,
index_generator: CovariatesIndexGenerator,
attribute: str,
tz: Optional[str] = None,
):
"""
Parameters
----------
index_generator
An instance of `CovariatesIndexGenerator` with methods `generate_train_idx()` and
`generate_inference_idx()`. Used to generate the index for encoders.
attribute
The attribute of the underlying pd.DatetimeIndex for which to add scalar information.
Must be an attribute of `pd.DatetimeIndex`, or `week` / `weekofyear` / `week_of_year` - e.g. "month",
"weekday", "day", "hour", "minute", "second". See all available attributes in
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
For more information, check out :meth:`datetime_attribute_timeseries()
<darts.utils.timeseries_generation.datetime_attribute_timeseries>`
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(index_generator)
self.attribute = attribute
self.tz = tz
def _encode(
self, index: SupportedIndex, target_end: pd.Timestamp, dtype: np.dtype
) -> TimeSeries:
"""Encode `index` as a scalar."""
super()._encode(index, target_end, dtype)
return datetime_attribute_timeseries(
index,
attribute=self.attribute,
dtype=dtype,
with_columns=self.base_component_name + self.attribute,
tz=self.tz,
)
@property
def accept_transformer(self) -> list[bool]:
"""`DatetimeAttributeEncoder` accepts transformations"""
return [True]
@property
def requires_fit(self) -> bool:
return False
@property
def base_component_name(self) -> str:
return super().base_component_name + "_dta_"
@property
def encoding_n_components(self) -> int:
return 1
class PastDatetimeAttributeEncoder(DatetimeAttributeEncoder):
"""Datetime attribute encoder for past covariates."""
def __init__(
self,
attribute: str,
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[list[int]] = None,
tz: Optional[str] = None,
):
"""
Parameters
----------
attribute
The attribute of the underlying pd.DatetimeIndex for which to add scalar information.
Must be an attribute of `pd.DatetimeIndex`, or `week` / `weekofyear` / `week_of_year` - e.g. "month",
"weekday", "day", "hour", "minute", "second". See all available attributes in
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
For more information, check out :meth:`datetime_attribute_timeseries()
<darts.utils.timeseries_generation.datetime_attribute_timeseries>`
input_chunk_length
Optionally, the number of input target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `input_chunk_length` from :class:`TorchForecastingModel`, or to the absolute
minimum target lag value `abs(min(lags))` for :class:`RegressionModel`.
output_chunk_length
Optionally, the number of output target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `output_chunk_length` from both :class:`TorchForecastingModel`, and
:class:`RegressionModel`.
lags_covariates
Optionally, a list of integers representing the past covariate lags. Accepts integer lag values <= -1.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_past_covariates` of :class:`RegressionModel`.
tz
Optionally, a time zone to convert the time index to before computing the attributes.
"""
super().__init__(
index_generator=PastCovariatesIndexGenerator(
input_chunk_length,
output_chunk_length,
lags_covariates=lags_covariates,
),
attribute=attribute,
tz=tz,
)
class FutureDatetimeAttributeEncoder(DatetimeAttributeEncoder):
"""Datetime attribute encoder for future covariates."""
def __init__(
self,
attribute: str,
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[list[int]] = None,
tz: Optional[str] = None,
):
"""
Parameters
----------
attribute
The attribute of the underlying pd.DatetimeIndex for which to add scalar information.
Must be an attribute of `pd.DatetimeIndex`, or `week` / `weekofyear` / `week_of_year` - e.g. "month",
"weekday", "day", "hour", "minute", "second". See all available attributes in
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex.
For more information, check out :meth:`datetime_attribute_timeseries()
<darts.utils.timeseries_generation.datetime_attribute_timeseries>`
input_chunk_length
Optionally, the number of input target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `input_chunk_length` from :class:`TorchForecastingModel`, or to the absolute
minimum target lag value `abs(min(lags))` for :class:`RegressionModel`.
output_chunk_length
Optionally, the number of output target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `output_chunk_length` from both :class:`TorchForecastingModel`, and
:class:`RegressionModel`.
lags_covariates
Optionally, a list of integers representing the future covariate lags. Accepts all integer values.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_future_covariates` of :class:`RegressionModel`.
tz
Optionally, a time zone to convert the time index to before computing the attributes.
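A minimal inference sketch, assuming `target` is a `TimeSeries` with a pd.DatetimeIndex:
.. highlight:: python
.. code-block:: python

    encoder = FutureDatetimeAttributeEncoder(
        attribute='dayofweek',
        input_chunk_length=24,
        output_chunk_length=12,
    )
    # no prior `encode_train()` call is needed, as this encoder does not require fitting
    future_covariates_inf = encoder.encode_inference(n=6, target=target)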
"""
super().__init__(
index_generator=FutureCovariatesIndexGenerator(
input_chunk_length,
output_chunk_length,
lags_covariates=lags_covariates,
),
attribute=attribute,
tz=tz,
)
class IntegerIndexEncoder(SingleEncoder):
"""IntegerIndexEncoder: Adds integer index value (position) derived from the underlying TimeSeries' time index
for past and future covariates.
"""
def __init__(self, index_generator: CovariatesIndexGenerator, attribute: str):
"""
Parameters
----------
index_generator
An instance of `CovariatesIndexGenerator` with methods `generate_train_idx()` and
`generate_inference_idx()`. Used to generate the index for encoders.
attribute
Currently, only 'relative' is supported. The generated encoded values lie in (-inf, inf); the
target series end time is used as the reference to evaluate the relative index positions.
"""
raise_if_not(
isinstance(attribute, str) and attribute in INTEGER_INDEX_ATTRIBUTES,
f"Encountered invalid encoder argument `{attribute}` for encoder `position`. "
f'Attribute must be `"relative"`.',
logger,
)
super().__init__(index_generator)
self.attribute = attribute
def _encode(
self, index: SupportedIndex, target_end: pd.Timestamp, dtype: np.dtype
) -> TimeSeries:
"""Adds integer index value (position) to the provided `index`.
For attribute=='relative', the reference point/index is the prediction/forecast index of the target series.
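A sketch of the generated positions for an index spanning the target (ending at t3) and two
prediction steps:
.. highlight:: python
.. code-block:: python

    # index:    t0  t1  t2  t3 | t4  t5
    # position: -3  -2  -1   0 |  1   2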
"""
super()._encode(index, target_end, dtype)
idx_larger_end = (index <= target_end).sum()
freq = index.freq if isinstance(index, pd.DatetimeIndex) else index.step
if idx_larger_end:
idx_larger_end -= 1
if index[0] > target_end:
idx_diff = (
len(generate_index(start=target_end, end=index[0], freq=freq)) - 1
)
elif index[-1] < target_end:
idx_diff = (
-len(generate_index(start=index[-1], end=target_end, freq=freq)) + 1
)
else:
idx_diff = 0
return TimeSeries.from_times_and_values(
times=index,
values=np.arange(
start=idx_diff - idx_larger_end,
stop=idx_diff - idx_larger_end + len(index),
),
columns=[self.base_component_name + self.attribute],
).astype(np.dtype(dtype))
@property
def accept_transformer(self) -> list[bool]:
"""`IntegerIndexEncoder` accepts transformations. Note that transforming 'relative' `IntegerIndexEncoder`
will return the absolute position (in the transformed space)."""
return [True]
@property
def requires_fit(self) -> bool:
# requires fitting to get the reference index from `IntegerIndexEncoder.index_generator` for inference
return True
@property
def base_component_name(self) -> str:
return super().base_component_name + "_pos_"
@property
def encoding_n_components(self) -> int:
return 1
class PastIntegerIndexEncoder(IntegerIndexEncoder):
"""`IntegerIndexEncoder`: Adds integer index value (position) for past covariates derived from the underlying
TimeSeries' time index.
"""
def __init__(
self,
attribute: str,
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[list[int]] = None,
**kwargs,
):
"""
Parameters
----------
attribute
Currently, only 'relative' is supported. The generated encoded values lie in (-inf, inf); the
target series end time is used as the reference to evaluate the relative index positions.
input_chunk_length
Optionally, the number of input target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `input_chunk_length` from :class:`TorchForecastingModel`, or to the absolute
minimum target lag value `abs(min(lags))` for :class:`RegressionModel`.
output_chunk_length
Optionally, the number of output target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `output_chunk_length` from both :class:`TorchForecastingModel`, and
:class:`RegressionModel`.
lags_covariates
Optionally, a list of integers representing the past covariate lags. Accepts integer lag values <= -1.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_past_covariates` of :class:`RegressionModel`.
"""
super().__init__(
index_generator=PastCovariatesIndexGenerator(
input_chunk_length,
output_chunk_length,
lags_covariates=lags_covariates,
),
attribute=attribute,
)
class FutureIntegerIndexEncoder(IntegerIndexEncoder):
"""`IntegerIndexEncoder`: Adds integer index value (position) for future covariates derived from the underlying
TimeSeries' time index.
"""
def __init__(
self,
attribute: str,
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[list[int]] = None,
**kwargs,
):
"""
Parameters
----------
attribute
Currently, only 'relative' is supported. The generated encoded values lie in (-inf, inf); the
target series end time is used as the reference to evaluate the relative index positions.
input_chunk_length
Optionally, the number of input target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `input_chunk_length` from :class:`TorchForecastingModel`, or to the absolute
minimum target lag value `abs(min(lags))` for :class:`RegressionModel`.
output_chunk_length
Optionally, the number of output target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `output_chunk_length` from both :class:`TorchForecastingModel`, and
:class:`RegressionModel`.
lags_covariates
Optionally, a list of integers representing the future covariate lags. Accepts all integer values.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_future_covariates` of :class:`RegressionModel`.
"""
super().__init__(
index_generator=FutureCovariatesIndexGenerator(
input_chunk_length,
output_chunk_length,
lags_covariates=lags_covariates,
),
attribute=attribute,
)
class CallableIndexEncoder(SingleEncoder):
"""`CallableIndexEncoder`: Applies a user-defined callable to encode the underlying index for past and future
covariates.
"""
def __init__(self, index_generator: CovariatesIndexGenerator, attribute: Callable):
"""
Parameters
----------
index_generator
An instance of `CovariatesIndexGenerator` with methods `generate_train_idx()` and
`generate_inference_idx()`. Used to generate the index for encoders.
attribute
A callable that takes an index `index` of type `(pd.DatetimeIndex, pd.RangeIndex)` as input
and returns a np.ndarray of shape `(len(index),)`.
An example for a correct `attribute` for `index` of type pd.DatetimeIndex:
``attribute = lambda index: (index.year - 1950) / 50``. And for pd.RangeIndex:
``attribute = lambda index: (index - 1950) / 50``
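A minimal stand-alone sketch; a named function is used instead of a lambda so the encoder remains
picklable:
.. highlight:: python
.. code-block:: python

    def encode_year(index):
        return (index.year - 1950) / 50

    encoder = CallableIndexEncoder(
        index_generator=FutureCovariatesIndexGenerator(24, 12),  # (input_chunk_length, output_chunk_length)
        attribute=encode_year,
    )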
"""
raise_if_not(
callable(attribute),
f"Encountered invalid encoder argument `{attribute}` for encoder `callable`. "
f"Attribute must be a callable that returns a `np.ndarray`.",
logger,
)
super().__init__(index_generator)
self.attribute = attribute
def _encode(
self, index: SupportedIndex, target_end: pd.Timestamp, dtype: np.dtype
) -> TimeSeries:
"""Apply the user-defined callable to encode the index"""
super()._encode(index, target_end, dtype)
return TimeSeries.from_times_and_values(
times=index,
values=self.attribute(index),
columns=[self.base_component_name + "custom"],
).astype(np.dtype(dtype))
@property
def accept_transformer(self) -> list[bool]:
"""`CallableIndexEncoder` accepts transformations."""
return [True]
@property
def requires_fit(self) -> bool:
return False
@property
def base_component_name(self) -> str:
return super().base_component_name + "_cus_"
@property
def encoding_n_components(self) -> int:
return 1
class PastCallableIndexEncoder(CallableIndexEncoder):
"""`IntegerIndexEncoder`: Adds integer index value (position) for past covariates derived from the underlying
TimeSeries' time index.
"""
def __init__(
self,
attribute: Callable,
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[list[int]] = None,
**kwargs,
):
"""
Parameters
----------
attribute
A callable that takes an index `index` of type `(pd.DatetimeIndex, pd.RangeIndex)` as input
and returns a np.ndarray of shape `(len(index),)`.
An example for a correct `attribute` for `index` of type pd.DatetimeIndex:
``attribute = lambda index: (index.year - 1950) / 50``. And for pd.RangeIndex:
``attribute = lambda index: (index - 1950) / 50``
input_chunk_length
Optionally, the number of input target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `input_chunk_length` from :class:`TorchForecastingModel`, or to the absolute
minimum target lag value `abs(min(lags))` for :class:`RegressionModel`.
output_chunk_length
Optionally, the number of output target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `output_chunk_length` from both :class:`TorchForecastingModel`, and
:class:`RegressionModel`.
lags_covariates
Optionally, a list of integers representing the past covariate lags. Accepts integer lag values <= -1.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_past_covariates` of :class:`RegressionModel`.
"""
super().__init__(
index_generator=PastCovariatesIndexGenerator(
input_chunk_length,
output_chunk_length,
lags_covariates=lags_covariates,
),
attribute=attribute,
)
class FutureCallableIndexEncoder(CallableIndexEncoder):
"""`IntegerIndexEncoder`: Adds integer index value (position) for future covariates derived from the underlying
TimeSeries' time index.
"""
def __init__(
self,
attribute: Callable,
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_covariates: Optional[list[int]] = None,
**kwargs,
):
"""
Parameters
----------
attribute
A callable that takes an index `index` of type `(pd.DatetimeIndex, pd.RangeIndex)` as input
and returns a np.ndarray of shape `(len(index),)`.
An example for a correct `attribute` for `index` of type pd.DatetimeIndex:
``attribute = lambda index: (index.year - 1950) / 50``. And for pd.RangeIndex:
``attribute = lambda index: (index - 1950) / 50``
input_chunk_length
Optionally, the number of input target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `input_chunk_length` from :class:`TorchForecastingModel`, or to the absolute
minimum target lag value `abs(min(lags))` for :class:`RegressionModel`.
output_chunk_length
Optionally, the number of output target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `output_chunk_length` from both :class:`TorchForecastingModel`, and
:class:`RegressionModel`.
lags_covariates
Optionally, a list of integers representing the future covariate lags. Accepts all integer values.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_future_covariates` of :class:`RegressionModel`.
"""
super().__init__(
index_generator=FutureCovariatesIndexGenerator(
input_chunk_length,
output_chunk_length,
lags_covariates=lags_covariates,
),
attribute=attribute,
)
class SequentialEncoder(Encoder):
"""A `SequentialEncoder` object can store and control multiple past and future covariates encoders at once.
It provides the same functionality as single encoders (`encode_train()`, `encode_inference()`,
`encode_train_inference()`).
"""
def __init__(
self,
add_encoders: dict,
input_chunk_length: Optional[int] = None,
output_chunk_length: Optional[int] = None,
lags_past_covariates: Optional[list[int]] = None,
lags_future_covariates: Optional[list[int]] = None,
takes_past_covariates: bool = False,
takes_future_covariates: bool = False,
) -> None:
"""
SequentialEncoder automatically creates encoder objects from parameter `add_encoders`. `add_encoders` can also
be set directly in all of Darts' `ForecastingModels`. This will automatically set up a
:class:`SequentialEncoder` tailored to the settings of the underlying forecasting model.
The `add_encoders` dict must follow this convention:
`{encoder keyword: {temporal keyword: List[attributes]}, ..., transformer keyword: transformer object}`
Supported encoder keywords:
`'cyclic'` for cyclic temporal encoder. See the docs
:meth:`CyclicTemporalEncoder <darts.dataprocessing.encoders.CyclicTemporalEncoder>`;
`'datetime_attribute'` for adding scalar information of pd.DatetimeIndex attribute. See the docs
:meth:`DatetimeAttributeEncoder <darts.dataprocessing.encoders.DatetimeAttributeEncoder>`
`'position'` for integer index position encoder. See the docs
:meth:`IntegerIndexEncoder <darts.dataprocessing.encoders.IntegerIndexEncoder>`;
`'custom'` for encoding index with custom callables (functions). See the docs
:meth:`CallableIndexEncoder <darts.dataprocessing.encoders.CallableIndexEncoder>`;
Supported temporal keywords:
'past' for adding encoding as past covariates
'future' for adding encoding as future covariates
Supported attributes:
for attributes read the referred docs for the corresponding encoder from above
Supported transformers:
a transformer can be added with transformer keyword 'transformer'. The transformer object must be an
instance of Darts' :meth:`FittableDataTransformer
<darts.dataprocessing.transformers.fittable_data_transformer.FittableDataTransformer>` such as Scaler() or
BoxCox(). The transformers will be fitted on the training dataset when calling `model.fit()`.
The training, validation and inference datasets are then transformed equally.
Supported time zone:
Optionally, apply a time zone conversion with keyword 'tz'. This converts the time zone-naive index to a
time zone `'tz'` before applying the `'cyclic'` or `'datetime_attribute'` temporal encoders.
An example of a valid `add_encoders` dict for hourly data:
.. highlight:: python
.. code-block:: python
from darts.dataprocessing.transformers import Scaler
add_encoders={
'cyclic': {'future': ['month']},
'datetime_attribute': {'past': ['hour'], 'future': ['year', 'dayofweek']},
'position': {'past': ['relative'], 'future': ['relative']},
'custom': {'past': [lambda idx: (idx.year - 1950) / 50]},
'transformer': Scaler(),
'tz': 'CET',
}
Tuples of `(encoder_id, attribute)` are extracted from `add_encoders` to instantiate the `SingleEncoder`
objects:
* The `encoder_id` is extracted as follows:
str(encoder_kw) + str(temporal_kw) -> 'cyclic' + 'past' -> `encoder_id` = 'cyclic_past'
The `encoder_id` is used to map the parameters with the corresponding `SingleEncoder` objects.
* The `attribute` is extracted from the values given under `temporal_kw`
`attribute` = 'month'
...
The `attribute` tells the `SingleEncoder` which attribute of the index to encode
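For example, a sketch of the extraction:
.. highlight:: python
.. code-block:: python

    add_encoders = {'cyclic': {'past': ['month', 'hour']}}
    # -> past encoder tuples: [('cyclic_past', 'month'), ('cyclic_past', 'hour')]
..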
New encoders can be added by appending them to the mapping property `SequentialEncoder.encoder_map`.
Parameters
----------
add_encoders
A dictionary with the encoder settings.
input_chunk_length
Optionally, the number of input target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `input_chunk_length` from :class:`TorchForecastingModel`, or to the absolute
minimum target lag value `abs(min(lags))` for :class:`RegressionModel`.
output_chunk_length
Optionally, the number of output target time steps per chunk. Only required for
:class:`TorchForecastingModel`, and :class:`RegressionModel`.
Corresponds to parameter `output_chunk_length` from both :class:`TorchForecastingModel`, and
:class:`RegressionModel`.
lags_past_covariates
Optionally, a list of integers representing the past covariate lags. Accepts integer lag values <= -1.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_past_covariates` of :class:`RegressionModel`.
lags_future_covariates
Optionally, a list of integers representing the future covariate lags. Accepts all integer values.
Only required for :class:`RegressionModel`.
Corresponds to the lag values from parameter `lags_future_covariates` of :class:`RegressionModel`.
takes_past_covariates
Whether to encode/generate past covariates.
takes_future_covariates
Whether to encode/generate future covariates.
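A minimal stand-alone sketch, assuming `target` is a `TimeSeries` with a pd.DatetimeIndex:
.. highlight:: python
.. code-block:: python

    encoders = SequentialEncoder(
        add_encoders={'datetime_attribute': {'future': ['hour']}},
        input_chunk_length=24,
        output_chunk_length=12,
        takes_future_covariates=True,
    )
    # `past_covs` is None here, since neither past encoders nor past covariates were given
    past_covs, future_covs = encoders.encode_train(target=target)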
"""
super().__init__()
self.params = add_encoders
self.input_chunk_length = input_chunk_length
self.output_chunk_length = output_chunk_length
self.encoding_available = False
self.takes_past_covariates = takes_past_covariates
self.takes_future_covariates = takes_future_covariates
self.lags_past_covariates = lags_past_covariates
self.lags_future_covariates = lags_future_covariates
# encoders
self._past_encoders: list[SingleEncoder] = []
self._past_components: pd.Index = pd.Index([])
self._future_encoders: list[SingleEncoder] = []
self._future_components: pd.Index = pd.Index([])
# transformer
self._past_transformer: Optional[SequentialEncoderTransformer] = None
self._future_transformer: Optional[SequentialEncoderTransformer] = None
# setup encoders and transformer
self._setup_encoders(self.params)
self._setup_transformer(self.params)
def encode_train(
self,
target: SupportedTimeSeries,
past_covariates: Optional[SupportedTimeSeries] = None,
future_covariates: Optional[SupportedTimeSeries] = None,
encode_past: bool = True,
encode_future: bool = True,
) -> tuple[
Union[TimeSeries, Sequence[TimeSeries]], Union[TimeSeries, Sequence[TimeSeries]]
]:
"""Returns encoded index for all past and/or future covariates for training.
Which covariates are generated depends on the parameters used at model creation.
Parameters
----------
target
The target TimeSeries used during training or passed to prediction as `series`.
past_covariates
Optionally, the past covariates used for training.
future_covariates
Optionally, the future covariates used for training.
encode_past
Whether to apply encoders for past covariates.
encode_future
Whether to apply encoders for future covariates.
Returns
-------
Tuple[past_covariates, future_covariates]
The past_covariates and/or future_covariates for training including the encodings.
If input {x}_covariates is None and no {x}_encoders are given, will return `None`
for the {x}_covariates.
Raises
------
Warning
If model was created with `add_encoders` and there is suspicion of lazy loading.
The encodings/covariates are generated eagerly before starting training for all individual targets and
loaded into memory. Depending on the size of target data, this can create memory
issues. In case this applies, consider setting `add_encoders=None` at model
creation and build your encoded covariates manually for lazy loading.
"""
if not self.fit_called:
if not isinstance(target, (TimeSeries, list)):
logger.warning(
"Fitting was called with `add_encoders` and suspicion of lazy loading. "
"The encodings/covariates are generated pre-train for all individual targets and "
"loaded into memory. Depending on the size of your data, this can create memory issues. "
"In case this applies, consider setting `add_encoders=None` at model creation."
)
past_covariates, future_covariates = self._launch_encoder(
target=target,
past_covariates=past_covariates,
future_covariates=future_covariates,
encoder_method=_EncoderMethod("train"),
n=None,
encode_past=encode_past,
encode_future=encode_future,
)
self._fit_called = True
return past_covariates, future_covariates
def encode_inference(
self,
n: int,
target: SupportedTimeSeries,
past_covariates: Optional[SupportedTimeSeries] = None,
future_covariates: Optional[SupportedTimeSeries] = None,
encode_past: bool = True,
encode_future: bool = True,
) -> tuple[
Union[TimeSeries, Sequence[TimeSeries]], Union[TimeSeries, Sequence[TimeSeries]]
]:
"""Returns encoded index for all past and/or future covariates for inference/prediction.
Which covariates are generated depends on the parameters used at model creation.
Parameters
----------
n
The forecast horizon
target
The target TimeSeries used during training or passed to prediction as `series`.
past_covariates
Optionally, the past covariates used for training.
future_covariates
Optionally, the future covariates used for training.
encode_past
Whether to apply encoders for past covariates.
encode_future
Whether to apply encoders for future covariates.
Returns
-------
Tuple[past_covariates, future_covariates]
The past_covariates and/or future_covariates for prediction/inference including the encodings.
If input {x}_covariates is None and no {x}_encoders are given, will return `None`
for the {x}_covariates.
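A sketch of a call following an earlier `encode_train()` on the same object:
.. highlight:: python
.. code-block:: python

    past_covs, future_covs = encoders.encode_inference(n=6, target=target)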
"""
raise_if(
not self.fit_called and self.requires_fit,
f"`{self.__class__.__name__}` contains encoders or transformers which must be trained before inference. "
"Call method `encode_train()` before `encode_inference()`.",
logger=logger,
)
return self._launch_encoder(
target=target,
past_covariates=past_covariates,
future_covariates=future_covariates,
encoder_method=_EncoderMethod("inference"),
n=n,
encode_past=encode_past,
encode_future=encode_future,
)
def encode_train_inference(
self,
n: int,
target: SupportedTimeSeries,
past_covariates: Optional[SupportedTimeSeries] = None,
future_covariates: Optional[SupportedTimeSeries] = None,
encode_past: bool = True,
encode_future: bool = True,
) -> tuple[
Union[TimeSeries, Sequence[TimeSeries]], Union[TimeSeries, Sequence[TimeSeries]]
]:
"""Returns encoded index for all past and/or future covariates for training and inference/prediction.
Which covariates are generated depends on the parameters used at model creation.
Parameters
----------
n
The forecast horizon
target
The target TimeSeries used for training and prediction.
past_covariates
Optionally, the past covariates used for training and prediction.
future_covariates
Optionally, the future covariates used for training and prediction.
encode_past
Whether to apply encoders for past covariates.
encode_future
Whether to apply encoders for future covariates.
Returns
-------
Tuple[past_covariates, future_covariates]
The past_covariates and/or future_covariates for prediction/inference including the encodings.
If input {x}_covariates is None and no {x}_encoders are given, will return `None`
for the {x}_covariates.
"""
if not self.fit_called:
if not isinstance(target, (TimeSeries, list)):
logger.warning(
"Fitting was called with `add_encoders` and suspicion of lazy loading. "
"The encodings/covariates are generated pre-train for all individual targets and "
"loaded into memory. Depending on the size of your data, this can create memory issues. "
"In case this applies, consider setting `add_encoders=None` at model creation."
)
past_covariates, future_covariates = self._launch_encoder(
target=target,
past_covariates=past_covariates,
future_covariates=future_covariates,
encoder_method=_EncoderMethod("train_inference"),
n=n,
encode_past=encode_past,
encode_future=encode_future,
)
self._fit_called = True
return past_covariates, future_covariates
def _launch_encoder(
self,
target: Sequence[TimeSeries],
past_covariates: SupportedTimeSeries,
future_covariates: SupportedTimeSeries,
encoder_method: _EncoderMethod,
n: Optional[int] = None,
encode_past: bool = True,
encode_future: bool = True,
) -> tuple[Sequence[TimeSeries], Sequence[TimeSeries]]:
"""Launches the encode sequence for past covariates and future covariates for either training,
inference/prediction or training and inference/prediction depending on `encoder_method`.
"""
if not self.encoding_available:
return past_covariates, future_covariates
# guarantee that all inputs are either a sequence of TimeSeries or None
single_series = isinstance(target, TimeSeries)
target = series2seq(target)
past_covariates = series2seq(past_covariates)
future_covariates = series2seq(future_covariates)
# generate past covariates encodings
if self.past_encoders and encode_past:
past_covariates = self._encode_sequence(
encoders=self.past_encoders,
transformer=self.past_transformer,
target=target,
covariates=past_covariates,
covariates_type=PAST,
encoder_method=encoder_method,
n=n,
)
# generate future covariates encodings
if self.future_encoders and encode_future:
future_covariates = self._encode_sequence(
encoders=self.future_encoders,
transformer=self.future_transformer,
target=target,
covariates=future_covariates,
covariates_type=FUTURE,
encoder_method=encoder_method,
n=n,
)
# convert covariates back to single series if single target was used as input
if single_series:
past_covariates = seq2series(past_covariates)
future_covariates = seq2series(future_covariates)
return past_covariates, future_covariates
def _encode_sequence(
self,
encoders: Sequence[SingleEncoder],
transformer: Optional[SequentialEncoderTransformer],
target: Sequence[TimeSeries],
covariates: Optional[SupportedTimeSeries],
covariates_type: str,
encoder_method: _EncoderMethod,
n: Optional[int] = None,
) -> list[TimeSeries]:
"""Sequentially encodes the index of all input target/covariates TimeSeries with the corresponding
`encoder_method`.
"""
encode_method = encoder_method.method
encoded_sequence = []
if covariates is None:
covariates = [None] * len(target)
else:
covariates = (
[covariates] if isinstance(covariates, TimeSeries) else covariates
)
for ts, covs in zip(target, covariates):
# drop encoder components if they are in input covariates
covs = self._drop_encoded_components(
covariates=covs,
components=getattr(self, f"{covariates_type}_components"),
)
encoded = concatenate(
[
getattr(enc, encode_method)(
target=ts, covariates=covs, merge_covariates=False, n=n
)
for enc in encoders
],
axis=DIMS[1],
)
encoded_sequence.append(
self._merge_covariates(encoded=encoded, covariates=covs)
)
if transformer is not None:
encoded_sequence = transformer.transform(encoded_sequence)
# store encoded past/future component names if they were not saved before
if getattr(self, f"{covariates_type}_components").empty:
components = encoded_sequence[0].components
if covariates is not None and covariates[0] is not None:
components = components[~components.isin(covariates[0].components)]
setattr(self, f"_{covariates_type}_components", components)
return encoded_sequence
@property
def past_encoders(self) -> list[SingleEncoder]:
"""Returns the past covariates encoders"""
return self._past_encoders
@property
def future_encoders(self) -> list[SingleEncoder]:
"""Returns the future covariates encoders"""
return self._future_encoders
@property
def encoders(self) -> tuple[list[SingleEncoder], list[SingleEncoder]]:
"""Returns a tuple of (past covariates encoders, future covariates encoders)"""
return self.past_encoders, self.future_encoders
@property
def past_components(self) -> pd.Index:
"""Returns the past covariates component names generated by `SequentialEncoder.past_encoders`.
Only available after calling `SequentialEncoder.encode_train()`
"""
return self._past_components
@property
def future_components(self) -> pd.Index:
"""Returns the future covariates component names generated by `SequentialEncoder.future_encoders`.
Only available after calling `SequentialEncoder.encode_train()`
"""
return self._future_components
@property
def components(self) -> tuple[pd.Index, pd.Index]:
"""Returns the covariates component names generated by `SequentialEncoder.past_encoders` and
`SequentialEncoder.future_encoders`. A tuple of (past encoded components, future encoded components).
Only available after calling `SequentialEncoder.encode_train()`
"""
return self.past_components, self.future_components
@property
def encoding_n_components(self) -> tuple[int, int]:
"""Returns the number of components generated by `SequentialEncoder.past_encoders` and
`SequentialEncoder.future_encoders`.
"""
# by default, _[past/future]_encoders are empty lists
past_enc_n_components = sum(
past_enc.encoding_n_components for past_enc in self.past_encoders
)
future_enc_n_components = sum(
future_enc.encoding_n_components for future_enc in self.future_encoders
)
return past_enc_n_components, future_enc_n_components
@property
def past_transformer(self) -> SequentialEncoderTransformer:
"""Returns the past transformer object"""
return self._past_transformer
@property
def future_transformer(self) -> SequentialEncoderTransformer:
"""Returns the future transformer object"""
return self._future_transformer
@property
def encoder_map(self) -> dict:
"""Mapping between encoder identifier string (from parameters at model creations) and the corresponding
future or past covariates encoder"""
mapper = {
"cyclic_past": PastCyclicEncoder,
"cyclic_future": FutureCyclicEncoder,
"datetime_attribute_past": PastDatetimeAttributeEncoder,
"datetime_attribute_future": FutureDatetimeAttributeEncoder,
"position_past": PastIntegerIndexEncoder,
"position_future": FutureIntegerIndexEncoder,
"custom_past": PastCallableIndexEncoder,
"custom_future": FutureCallableIndexEncoder,
}
return mapper
def _setup_encoders(self, params: dict) -> None:
"""Sets up/Initializes all past and future encoders and an optional transformer from `add_encoder` parameter
used at model creation.
Parameters
----------
params
Dict from parameter `add_encoders` (kwargs) used at model creation. Relevant parameters are:
* params={'cyclic': {'past': ['month', 'dayofweek', ...], 'future': [same as for 'past']}}
"""
past_encoders, future_encoders = self._process_input_encoders(params)
tz = self._process_timezone(params)
if not past_encoders and not future_encoders:
return
self._past_encoders = [
self.encoder_map[enc_id](
attribute=attr,
input_chunk_length=self.input_chunk_length,
output_chunk_length=self.output_chunk_length,
lags_covariates=self.lags_past_covariates,
tz=tz,
)
for enc_id, attr in past_encoders
]
self._future_encoders = [
self.encoder_map[enc_id](
attribute=attr,
input_chunk_length=self.input_chunk_length,
output_chunk_length=self.output_chunk_length,
lags_covariates=self.lags_future_covariates,
tz=tz,
)
for enc_id, attr in future_encoders
]
self.encoding_available = True
def _setup_transformer(self, params: dict) -> None:
"""Sets up/Initializes an optional transformer from `add_encoder` parameter used at model creation.
Parameters
----------
params
Dict from parameter `add_encoders` (kwargs) used at model creation. Relevant parameters are:
* params={..., 'transformer': Scaler()}
"""
(
transformer,
transform_past_mask,
transform_future_mask,
) = self._process_input_transformer(params)
if transform_past_mask:
self._past_transformer = SequentialEncoderTransformer(
copy.deepcopy(transformer), transform_past_mask
)
if transform_future_mask:
self._future_transformer = SequentialEncoderTransformer(
copy.deepcopy(transformer), transform_future_mask
)
def _process_input_encoders(self, params: dict) -> tuple[list, list]:
"""Processes input and returns two lists of tuples `(encoder_id, attribute)` from relevant encoder
parameters at model creation.
Parameters
----------
params
The `add_encoders` dict used at model creation. Must follow this convention:
`{encoder keyword: {temporal keyword: List[attributes]}}`
Tuples of `(encoder_id, attribute)` are extracted from `add_encoders` to instantiate the `SingleEncoder`
objects:
* The `encoder_id` is extracted as follows:
str(encoder_kw) + str(temporal_kw) -> 'cyclic' + 'past' -> `encoder_id` = 'cyclic_past'
The `encoder_id` is used to map the parameters with the corresponding `SingleEncoder` objects.
* The `attribute` is extracted from the values given under `temporal_kw`
`attribute` = 'month'
...
The `attribute` tells the `SingleEncoder` which attribute of the index to encode
Raises
------
ValueError
1) if a temporal (inner) key is other than (`past`, `future`)
2) if the innermost values are of a type other than `str` or `Sequence`
3) if any entry in the innermost values is a lambda function
"""
if not params:
return [], []
# check input for invalid encoder types
invalid_encoders = [
enc
for enc in params
if enc not in ENCODER_KEYS + TZ_KEYS + TRANSFORMER_KEYS
]
raise_if(
len(invalid_encoders) > 0,
f"Encountered invalid encoder types `{invalid_encoders}` in `add_encoders` parameter at model "
f"creation. Supported encoder types are: `{ENCODER_KEYS + TRANSFORMER_KEYS}`.",
logger,
)
encoders = {enc: params[enc] for enc in ENCODER_KEYS if params.get(enc, None)}
# check input for invalid temporal types
invalid_time_params = list()
for encoder, t_types in encoders.items():
invalid_time_params += [
t_type for t_type in t_types.keys() if t_type not in VALID_TIME_PARAMS
]
raise_if(
len(invalid_time_params) > 0,
f"Encountered invalid temporal types `{invalid_time_params}` in `add_encoders` parameter at model "
f"creation. Supported temporal types are: `{VALID_TIME_PARAMS}`.",
logger,
)
# check that encoders are not lambda functions (not pickable)
lambda_func_encoders = set()
# convert into tuples of (encoder string identifier, encoder attribute)
past_encoders, future_encoders = list(), list()
for enc, enc_params in encoders.items():
for enc_time, enc_attr in enc_params.items():
raise_if_not(
isinstance(enc_attr, VALID_ENCODER_DTYPES),
f"Encountered value `{enc_attr}` of invalid type `{type(enc_attr)}` for encoder "
f"`{enc}` in `add_encoders` at model creation. Supported data types are: "
f"`{VALID_ENCODER_DTYPES}`.",
logger,
)
attrs = [enc_attr] if isinstance(enc_attr, str) else enc_attr
for attr in attrs:
encoder_id = "_".join([enc, enc_time])
if enc_time == PAST:
past_encoders.append((encoder_id, attr))
else:
future_encoders.append((encoder_id, attr))
if isinstance(attr, Callable) and attr.__name__ == "<lambda>":
lambda_func_encoders.add(enc)
raise_if(
len(lambda_func_encoders) > 0,
f"Encountered lambda function in the following `add_encoders` entries : {lambda_func_encoders} "
f"at model creation. "
f"In order to prevent issues when saving the model, these encoders must be converted to "
f"named functions.",
logger,
)
for temp_enc, takes_temp, temp in [
(past_encoders, self.takes_past_covariates, "past"),
(future_encoders, self.takes_future_covariates, "future"),
]:
if temp_enc and not takes_temp:
logger.warning(
f"Specified {temp} encoders in `add_encoders` at model creation but model does not "
f"accept {temp} covariates. {temp} encoders will be ignored."
)
past_encoders = past_encoders if self.takes_past_covariates else []
future_encoders = future_encoders if self.takes_future_covariates else []
return past_encoders, future_encoders
def _process_input_transformer(
self, params: dict
) -> tuple[Optional[FittableDataTransformer], list, list]:
"""Processes input params used at model creation and returns tuple of one transformer object and two masks
that specify which past / future encoders accept being transformed.
Parameters
----------
params
Dict from parameter `add_encoders` (kwargs) used at model creation. Relevant parameters are:
* params={'transformer': Scaler()}
"""
if not params:
return None, [], []
transformer = params.get(TRANSFORMER_KEYS[0], None)
if transformer is None:
return None, [], []
raise_if_not(
isinstance(transformer, VALID_TRANSFORMER_DTYPES),
f"Encountered `{TRANSFORMER_KEYS[0]}` of invalid type `{type(transformer)}` "
f"in `add_encoders` at model creation. Transformer must be an instance of "
f"`{VALID_TRANSFORMER_DTYPES}`.",
logger,
)
transform_past_mask = [
transform
for enc in self.past_encoders
for transform in enc.accept_transformer
]
transform_future_mask = [
transform
for enc in self.future_encoders
for transform in enc.accept_transformer
]
return transformer, transform_past_mask, transform_future_mask
@staticmethod
def _process_timezone(params: dict) -> Optional[str]:
"""Processes input params used at model creation for time zone specification, and returns the time zone.
Parameters
----------
params
Dict from parameter `add_encoders` (kwargs) used at model creation. Relevant parameters are:
* params={'tz': 'CET'}
"""
if not params:
return None
return params.get(TZ_KEYS[0], None)
@property
def requires_fit(self) -> bool:
return any([
enc.requires_fit for cov_enc in self.encoders for enc in cov_enc
]) or any([
tf is not None for tf in (self.past_transformer, self.future_transformer)
])