Scaler

class darts.dataprocessing.transformers.scaler.Scaler(scaler=None, name='Scaler', n_jobs=1, verbose=False)

Bases: darts.dataprocessing.transformers.invertible_data_transformer.InvertibleDataTransformer, darts.dataprocessing.transformers.fittable_data_transformer.FittableDataTransformer

Generic wrapper class for using scalers on time series.

The underlying scaler has to implement the fit(), transform() and inverse_transform() methods (typically from scikit-learn).

When the scaler is applied to a multivariate series, each component (dimension) is scaled independently. When the series is stochastic, all samples of a given component are merged to compute the transform, so the scaling is done across all samples of that component.

Notes

The scaler will not scale the series’ static covariates. This has to be done either before constructing the series, or afterwards by extracting the covariates, transforming their values and reapplying them to the series. For this, see the TimeSeries property TimeSeries.static_covariates and the method TimeSeries.with_static_covariates().

Parameters
  • scaler – The scaler to transform the data with. It must provide fit(), transform() and inverse_transform() methods. Default: sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1)); this will scale all the values of a time series between 0 and 1.

  • name – A specific name for the scaler

  • n_jobs (int) – The number of jobs to run in parallel. Parallel jobs are created only when a Sequence[TimeSeries] is passed as input to a method, parallelising operations regarding different TimeSeries. Defaults to 1 (sequential). Setting the parameter to -1 means using all the available processors. Note: for a small amount of data, the parallelisation overhead could end up increasing the total required amount of time.

  • verbose (bool) – Optionally, whether to print operations progress

Notes

When the Scaler is applied to multiple TimeSeries objects, a deep copy of the chosen scaler is created, fitted, and stored for each TimeSeries.

Examples

>>> from darts.datasets import AirPassengersDataset
>>> from sklearn.preprocessing import MinMaxScaler
>>> from darts.dataprocessing.transformers import Scaler
>>> series = AirPassengersDataset().load()
>>> scaler = MinMaxScaler(feature_range=(-1, 1))
>>> transformer = Scaler(scaler)
>>> series_transformed = transformer.fit_transform(series)
>>> print(min(series_transformed.values()))
[-1.]
>>> print(max(series_transformed.values()))
[1.]

Attributes

name

Name of the data transformer.

Methods

fit(series, *args, **kwargs)

Fit the transformer to the provided series or sequence of series.

fit_transform(series, *args, **kwargs)

Fit the transformer to the (sequence of) series and return the transformed input.

inverse_transform(series, *args, **kwargs)

Inverse-transform a (sequence of) series.

set_n_jobs(value)

Set the number of processors to be used by the transformer while processing multiple TimeSeries.

set_verbose(value)

Set the verbosity status.

transform(series, *args, **kwargs)

Transform a (sequence of) series.

ts_fit(series, transformer, *args, **kwargs)

The function that will be applied to each series when fit() is called.

ts_inverse_transform(series, transformer, ...)

The function that will be applied to each series when inverse_transform() is called.

ts_transform(series, transformer, **kwargs)

The function that will be applied to each series when transform() is called.

fit(series, *args, **kwargs)

Fit the transformer to the provided series or sequence of series.

Fit the transformer to the data and store the fitted parameters in self._fitted_params. If a sequence is passed as input data, this function takes care of parallelising the fitting of the series in the sequence (in this case self._fitted_params will contain an array of fitted parameters, one for each series).

Parameters
  • series (Union[TimeSeries, Sequence[TimeSeries]]) – (sequence of) series to fit the transformer on.

  • args – Additional positional arguments for the ts_fit() method

  • kwargs

    Additional keyword arguments for the ts_fit() method

    component_mask : Optional[np.ndarray] = None

    Optionally, a 1-D boolean np.ndarray of length series.n_components that specifies which components of the underlying series the Scaler should consider.

Returns

Fitted transformer.

Return type

FittableDataTransformer

fit_transform(series, *args, **kwargs)

Fit the transformer to the (sequence of) series and return the transformed input.

Parameters
  • series (Union[TimeSeries, Sequence[TimeSeries]]) – the (sequence of) series to transform.

  • args – Additional positional arguments for the ts_transform() method

  • kwargs

    Additional keyword arguments for the ts_transform() method:

    component_mask : Optional[np.ndarray] = None

    Optionally, a 1-D boolean np.ndarray of length series.n_components that specifies which components of the underlying series the Scaler should consider.

Returns

Transformed data.

Return type

Union[TimeSeries, Sequence[TimeSeries]]

inverse_transform(series, *args, **kwargs)

Inverse-transform a (sequence of) series.

In case a sequence is passed as input data, this function takes care of parallelising the transformation of multiple series in the sequence at the same time.

Parameters
  • series (Union[TimeSeries, Sequence[TimeSeries]]) – the (sequence of) series to be inverse-transformed.

  • args – Additional positional arguments for the ts_inverse_transform() method

  • kwargs

    Additional keyword arguments for the ts_inverse_transform() method

    component_mask : Optional[np.ndarray] = None

    Optionally, a 1-D boolean np.ndarray of length series.n_components that specifies which components of the underlying series the Scaler should consider.

Returns

Inverse transformed data.

Return type

Union[TimeSeries, List[TimeSeries]]

property name

Name of the data transformer.

set_n_jobs(value)

Set the number of processors to be used by the transformer while processing multiple TimeSeries.

Parameters

value (int) – New n_jobs value. Set to -1 for using all the available cores.

set_verbose(value)

Set the verbosity status.

Set to True to enable a detailed report on the progress of the scaler’s operations, or to False for no additional information.

Parameters

value (bool) – New verbosity status

transform(series, *args, **kwargs)

Transform a (sequence of) series.

In case a Sequence is passed as input data, this function takes care of parallelising the transformation of multiple series in the sequence at the same time.

Parameters
  • series (Union[TimeSeries, Sequence[TimeSeries]]) – (sequence of) series to be transformed.

  • args – Additional positional arguments for each ts_transform() method call

  • kwargs – Additional keyword arguments for each ts_transform() method call

Returns

Transformed data.

Return type

Union[TimeSeries, List[TimeSeries]]

static ts_fit(series, transformer, *args, **kwargs)

The function that will be applied to each series when fit() is called.

The function must take as first argument a TimeSeries object, and return an object containing information regarding the fitting phase (e.g., parameters, or external transformers objects). All these parameters will be stored in self._fitted_params, which can be later used during the transformation step.

This method is not implemented in the base class and must be implemented in the deriving classes.

If more parameters are added as input in the derived classes, _fit_iterator() should be redefined accordingly, to yield the necessary arguments to this function (see _fit_iterator() for further details).

Parameters

series (TimeSeries) – TimeSeries against which the scaler will be fit.

Notes

This method is designed to be a static method rather than an instance method to allow efficient parallelisation even when the scaler instance is storing a non-negligible amount of data. Using instance methods would imply copying the instance’s data across multiple processes, which can easily introduce a bottleneck and nullify the benefits of parallelisation.

Return type

Any
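The division of labour between fit() and the static per-series function can be illustrated with a small pure-Python sketch. The class MiniMinMax and everything in it are hypothetical stand-ins written for this illustration; they are not the Darts implementation and run sequentially rather than in parallel.

```python
from typing import List, Sequence, Tuple


class MiniMinMax:
    """Hypothetical stand-in mimicking the fit / ts_fit split described above."""

    def __init__(self) -> None:
        self._fitted_params: List[Tuple[float, float]] = []

    @staticmethod
    def ts_fit(series: Sequence[float]) -> Tuple[float, float]:
        # Static: receives only the series it needs, never the whole instance,
        # and returns what should be remembered from the fitting phase.
        return (min(series), max(series))

    @staticmethod
    def ts_transform(series: Sequence[float], params: Tuple[float, float]) -> List[float]:
        low, high = params
        span = (high - low) or 1.0  # avoid dividing by zero for constant series
        return [(x - low) / span for x in series]

    def fit(self, many_series: Sequence[Sequence[float]]) -> "MiniMinMax":
        # One set of fitted parameters per series, stored on the instance.
        self._fitted_params = [self.ts_fit(s) for s in many_series]
        return self

    def transform(self, many_series: Sequence[Sequence[float]]) -> List[List[float]]:
        return [
            self.ts_transform(s, p)
            for s, p in zip(many_series, self._fitted_params)
        ]


data = [[0.0, 5.0, 10.0], [100.0, 150.0, 200.0]]
mini = MiniMinMax().fit(data)
result = mini.transform(data)
print(result)
```

Since the static functions depend only on their arguments, a parallel backend would only need to ship each series and its parameters to a worker, not the whole transformer object.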

static ts_inverse_transform(series, transformer, *args, **kwargs)

The function that will be applied to each series when inverse_transform() is called.

The function must take a TimeSeries object as its first argument and return the transformed TimeSeries object. Additional parameters can be added if necessary, but in this case _inverse_transform_iterator() should be redefined accordingly, to yield the necessary arguments to this function (see _inverse_transform_iterator() for further details).

This method is not implemented in the base class and must be implemented in the deriving classes.

Parameters

series (TimeSeries) – TimeSeries which will be transformed.

Notes

This method is designed to be a static method rather than an instance method to allow efficient parallelisation even when the scaler instance is storing a non-negligible amount of data. Using instance methods would imply copying the instance’s data across multiple processes, which can easily introduce a bottleneck and nullify the benefits of parallelisation.

Return type

TimeSeries

static ts_transform(series, transformer, **kwargs)

The function that will be applied to each series when transform() is called.

The function must take as first argument a TimeSeries object, and return the transformed TimeSeries object. If more parameters are added as input in the derived classes, the _transform_iterator() should be redefined accordingly, to yield the necessary arguments to this function (See _transform_iterator() for further details).

This method is not implemented in the base class and must be implemented in the deriving classes.

Parameters

series (TimeSeries) – series to be transformed.

Notes

This method is designed to be a static method rather than an instance method to allow efficient parallelisation even when the scaler instance is storing a non-negligible amount of data. Using instance methods would imply copying the instance’s data across multiple processes, which can easily introduce a bottleneck and nullify the benefits of parallelisation.

Return type

TimeSeries