Wasserstein Scorer¶

Wasserstein Scorer (distance function defined between probability distributions) [1]. The implementations is wrapped around scipy.stats.

References

1: https://en.wikipedia.org/wiki/Wasserstein_metric

class darts.ad.scorers.wasserstein_scorer.WassersteinScorer(window=10, component_wise=False, window_agg=True, diff_fn=<function ae>)[source]¶

Bases: WindowedAnomalyScorer

Wasserstein Scorer

When calling fit(series), a moving window is applied, which results in a set of vectors of size W, where W is the window size. These vectors are kept in memory, representing the training distribution. The score(series) function will apply the same moving window. The Wasserstein distance is computed between the training distribution and each vector, resulting in an anomaly score.

Alternatively, the scorer has the functions fit_from_prediction() and score_from_prediction(). Both require two series (actual and prediction), and compute a “difference” series by applying the function diff_fn (default: absolute difference). The resulting series is then passed to the functions fit() and score(), respectively.

component_wise is a boolean parameter indicating how the model should behave with multivariate inputs series. If set to True, the model will treat each series dimension independently. If set to False, the model concatenates the dimensions in each windows of length W and computes a single score for all dimensions.

Training with fit():

The input can be a series (univariate or multivariate) or multiple series. The series will be partitioned into equal size subsequences. Each subsequence has size W * D (features), where:

W is the size of the window given as a parameter window
D is the dimension of the series (D = 1 if univariate or if component_wise is set to True)

For a series of length N, (N - W + 1) subsequences will be generated. The final array X passed to the underlying scorer has shape (N - W + 1, W * D); or in other terms (number of samples, number of features). If a list of series is given of length L, each series i is partitioned, and all X_i are concatenated along the sample axis.

The arrays will be kept in memory, representing the training data distribution. In practice, the series or list of series can for instance represent residuals than can be considered independent and identically distributed (iid).

If component_wise is set to True, the algorithm will be applied to each dimension independently. For each dimension, a PyOD model will be trained.

Computing score with score():

The input can be a series (univariate or multivariate) or a sequence of series. The given series must have the same dimension D as the data used to train the PyOD model.

For each series, if the series is multivariate of dimension D:

if component_wise is set to False: it returns a univariate series (dimension=1). It represents the anomaly score of the entire series in the considered window at each timestamp.
if component_wise is set to True: it returns a multivariate series of dimension D. Each dimension represents the anomaly score of the corresponding component of the input.

If the series is univariate, it returns a univariate series regardless of the parameter component_wise.

A window of size W is rolled on the series with a stride equal to 1. It is the same size window W used during the training phase. Each value in the score series thus represents how anomalous the sample of the W previous values is.

Parameters

window (int) – Size of the sliding window that represents the number of samples in the testing distribution to compare with the training distribution in the Wasserstein function
component_wise (bool) – Boolean value indicating if the score needs to be computed for each component independently (True) or by concatenating the component in the considered window to compute one score (False). Default: False.
window_agg (bool) – Boolean indicating whether the anomaly score for each time step is computed by averaging the anomaly scores for all windows this point is included in. If False, the anomaly score for each point is the anomaly score of its trailing window. Default: True.
diff_fn (Callable[…, Union[float, list[float], ndarray, list[ndarray]]]) – The differencing function to use to transform the predicted and actual series into one series. The scorer is then applied to this series. Must be one of Darts per-time-step metrics (e.g., ae() for the absolute difference, err() for the difference, se() for the squared difference, …). By default, uses the absolute difference (ae()).

Attributes

`is_probabilistic`	Whether the scorer expects a probabilistic prediction as the first input.
`is_trainable`	Whether the Scorer is trainable.
`is_univariate`	Whether the Scorer is a univariate scorer.

Methods

`eval_metric`(anomalies, series[, metric])	Computes the anomaly score of the given time series, and returns the score of an agnostic threshold metric.
`eval_metric_from_prediction`(anomalies, ...)	Computes the anomaly score between series and pred_series, and returns the score of an agnostic threshold metric.
`fit`(series)	Fits the scorer on the given time series.
`fit_from_prediction`(series, pred_series)	Fits the scorer on the two (sequences of) series.
`score`(series)	Computes the anomaly score on the given series.
`score_from_prediction`(series, pred_series)	Computes the anomaly score on the two (sequence of) series.
`show_anomalies`(series[, anomalies, ...])	Plot the results of the scorer.
`show_anomalies_from_prediction`(series, ...)	Plot the results of the scorer.

eval_metric(anomalies, series, metric='AUC_ROC')¶

Computes the anomaly score of the given time series, and returns the score of an agnostic threshold metric.

Parameters

anomalies (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) ground truth binary anomaly series (1 if it is an anomaly and 0 if not).
series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) series to detect anomalies from.
metric (Literal[‘AUC_ROC’, ‘AUC_PR’]) – The name of the metric function to use. Must be one of “AUC_ROC” (Area Under the Receiver Operating Characteristic Curve) and “AUC_PR” (Average Precision from scores). Default: “AUC_ROC”.

Return type

Union[float, Sequence[float], Sequence[Sequence[float]]]

Returns

float – A single score/metric for univariate series series (with only one component/column).
Sequence[float] – A sequence (list) of scores for:
- multivariate series series (multiple components). Gives a score for each component.
- a sequence (list) of univariate series series. Gives a score for each series.
Sequence[Sequence[float]] – A sequence of sequences of scores for a sequence of multivariate series series. Gives a score for each series (outer sequence) and component (inner sequence).

eval_metric_from_prediction(anomalies, series, pred_series, metric='AUC_ROC')¶

Computes the anomaly score between series and pred_series, and returns the score of an agnostic threshold metric.

Parameters

anomalies (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) ground truth binary anomaly series (1 if it is an anomaly and 0 if not).
series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) actual series.
pred_series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) predicted series.
metric (Literal[‘AUC_ROC’, ‘AUC_PR’]) – The name of the metric function to use. Must be one of “AUC_ROC” (Area Under the Receiver Operating Characteristic Curve) and “AUC_PR” (Average Precision from scores). Default: “AUC_ROC”.

Return type

Union[float, Sequence[float], Sequence[Sequence[float]]]

Returns

float – A single metric value for a single univariate series.
Sequence[float] – A sequence of metric values for:
- a single multivariate series.
- a sequence of univariate series.
Sequence[Sequence[float]] – A sequence of sequences of metric values for a sequence of multivariate series. The outer sequence is over the series, and inner sequence is over the series’ components/columns.

fit(series)¶

Fits the scorer on the given time series.

If a sequence of series, the scorer is fitted on the concatenation of the sequence.

The assumption is that series is generally anomaly-free.

Parameters: series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) series with no anomalies.
Returns: Fitted Scorer.
Return type: self

fit_from_prediction(series, pred_series)¶

Fits the scorer on the two (sequences of) series.

The function diff_fn passed as a parameter to the scorer, will transform pred_series and series into one series. By default, diff_fn will compute the absolute difference (Default: ae()). If pred_series and series are sequences, diff_fn will be applied to all pairwise elements of the sequences.

The scorer will then be fitted on this (sequence of) series. If a sequence of series is given, the scorer will be fitted on the concatenation of the sequence.

The scorer assumes that the (sequence of) series is anomaly-free.

If any of the series is stochastic (with n_samples>1), diff_fn is computed on quantile 0.5.

Parameters

series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) actual series.
pred_series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) predicted series.

Returns

Fitted Scorer.

Return type

self

property is_probabilistic: bool¶

Whether the scorer expects a probabilistic prediction as the first input.

Return type: bool

property is_trainable: bool¶

Whether the Scorer is trainable.

Return type: bool

property is_univariate: bool¶

Whether the Scorer is a univariate scorer.

Return type: bool

score(series)¶

Computes the anomaly score on the given series.

If a sequence of series is given, the scorer will score each series independently and return an anomaly score for each series in the sequence.

Parameters: series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) series to detect anomalies from.
Returns: (Sequence of) anomaly score time series
Return type: Union[TimeSeries, Sequence[TimeSeries]]

score_from_prediction(series, pred_series)¶

Computes the anomaly score on the two (sequence of) series.

The function diff_fn passed as a parameter to the scorer, will transform pred_series and series into one “difference” series. By default, diff_fn will compute the absolute difference (Default: ae()). If series and pred_series are sequences, diff_fn will be applied to all pairwise elements of the sequences.

The scorer will then transform this series into an anomaly score. If a sequence of series is given, the scorer will score each series independently and return an anomaly score for each series in the sequence.

Parameters

series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) actual series.
pred_series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) predicted series.

Returns

(Sequence of) anomaly score time series

Return type

Union[TimeSeries, Sequence[TimeSeries]]

show_anomalies(series, anomalies=None, scorer_name=None, title=None, metric=None, component_wise=False)¶

Plot the results of the scorer.

Computes the score on the given series input. And plots the results.

The plot will be composed of the following:

the series itself.
the anomaly score of the score.
the actual anomalies, if given.

It is possible to:

add a title to the figure with the parameter title
give personalized name to the scorer with scorer_name
show the results of a metric for the anomaly score (AUC_ROC or AUC_PR),

if the actual anomalies is provided.

Parameters

series (TimeSeries) – The series to visualize anomalies from.
anomalies (Optional[TimeSeries, None]) – The (sequence of) ground truth binary anomaly series (1 if it is an anomaly and 0 if not).
scorer_name (Optional[str, None]) – Name of the scorer.
title (Optional[str, None]) – Title of the figure
metric (Optional[Literal[‘AUC_ROC’, ‘AUC_PR’], None]) – Optionally, the name of the metric function to use. Must be one of “AUC_ROC” (Area Under the Receiver Operating Characteristic Curve) and “AUC_PR” (Average Precision from scores). Default: “AUC_ROC”.
component_wise (bool) – If True, will separately plot each component in case of multivariate anomaly detection.

show_anomalies_from_prediction(series, pred_series, scorer_name=None, anomalies=None, title=None, metric=None, component_wise=False)¶

Plot the results of the scorer.

Computes the anomaly score on the two series. And plots the results.

The plot will be composed of the following:

the series and the pred_series.
the anomaly score of the scorer.
the actual anomalies, if given.

It is possible to:

add a title to the figure with the parameter title
give personalized name to the scorer with scorer_name
show the results of a metric for the anomaly score (AUC_ROC or AUC_PR), if the actual anomalies is provided.

Parameters

series (TimeSeries) – The actual series to visualize anomalies from.
pred_series (TimeSeries) – The predicted series of series.
anomalies (Optional[TimeSeries, None]) – The ground truth of the anomalies (1 if it is an anomaly and 0 if not)
scorer_name (Optional[str, None]) – Name of the scorer.
title (Optional[str, None]) – Title of the figure
metric (Optional[Literal[‘AUC_ROC’, ‘AUC_PR’], None]) – Optionally, the name of the metric function to use. Must be one of “AUC_ROC” (Area Under the Receiver Operating Characteristic Curve) and “AUC_PR” (Average Precision from scores). Default: “AUC_ROC”.
component_wise (bool) – If True, will separately plot each component in case of multivariate anomaly detection.

Utils for Anomaly Detection