k-means Scorer
k-means Scorer implementing k-means clustering [1].
References
- class darts.ad.scorers.kmeans_scorer.KMeansScorer(window=1, k=8, component_wise=False, diff_fn='abs_diff', **kwargs)
Bases: darts.ad.scorers.scorers.FittableAnomalyScorer
When calling fit(series), a moving window is applied, which results in a set of vectors of size W, where W is the window size. The k-means model is trained on these vectors. The score(series) function applies the same moving window and returns the distance to the closest of the k centroids for each vector of size W.
Alternatively, the scorer has the functions fit_from_prediction() and score_from_prediction(). Both require two series (actual and prediction), and compute a “difference” series by applying the function diff_fn (default: absolute difference). The resulting series is then passed to the functions fit() and score(), respectively.
component_wise is a boolean parameter indicating how the model should behave with multivariate input series. If set to True, the model treats each component independently by fitting a different k-means model for each dimension. If set to False, the model concatenates the dimensions in each window of length W and computes the score using only one underlying k-means model.
Training with fit():
The input can be a series (univariate or multivariate) or multiple series. The series will be sliced into equal-size subsequences. Each subsequence will be of size W * D, with:
W being the size of the window given as a parameter window
D being the dimension of the series (D = 1 if univariate or if component_wise is set to True)
For a series of length N, (N - W + 1)/W subsequences will be generated. If a list of series is given of length L, each series will be partitioned into subsequences, and the results will be concatenated into an array of length L * number of subsequences of each series.
The k-means model will be fitted on the generated subsequences. The model will find k clusters in the vector space of dimension equal to the length of the subsequences (D * W).
If component_wise is set to True, the algorithm will be applied to each dimension independently. For each dimension, a k-means model will be trained.
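The construction of the training vectors can be illustrated with a minimal plain-Python sketch. It is not the library's implementation: the helper names `sliding_windows` and `training_vectors` are hypothetical, the stride of 1 is an assumption borrowed from the scoring phase, and the ordering of values inside each vector is a choice of this sketch.

```python
from typing import List


def sliding_windows(values: List[List[float]], window: int) -> List[List[float]]:
    """Return one flat vector of length window * D per window position.

    `values` holds N time steps, each a list of D component values.
    A stride of 1 is assumed (an assumption of this sketch).
    """
    vectors = []
    for start in range(len(values) - window + 1):
        flat: List[float] = []
        for step in values[start:start + window]:
            flat.extend(step)  # concatenate the D components of each time step
        vectors.append(flat)
    return vectors


def training_vectors(values, window, component_wise=False):
    """Build the sets of vectors that k-means models would be fitted on."""
    if not component_wise:
        # A single set of vectors of size W * D, for one model.
        return [sliding_windows(values, window)]
    # One set of vectors of size W per component, for one model each.
    n_components = len(values[0])
    return [
        sliding_windows([[step[c]] for step in values], window)
        for c in range(n_components)
    ]


# Bivariate series (D = 2) with N = 5 time steps.
series = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
joint = training_vectors(series, window=3)
per_component = training_vectors(series, window=3, component_wise=True)
print(len(joint), len(joint[0][0]))                  # 1 set, vectors of size 6
print(len(per_component), len(per_component[0][0]))  # 2 sets, vectors of size 3
```

With component_wise=False the vectors live in a space of dimension D * W. With component_wise=True each component yields its own set of W-dimensional vectors.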
Computing score with score():
The input can be a series (univariate or multivariate) or a sequence of series. The given series must have the same dimension D as the data used to train the k-means model.
For each series, if the series is multivariate of dimension D:
if component_wise is set to False: it returns a univariate series (dimension=1). It represents the anomaly score of the entire series in the considered window at each timestamp.
if component_wise is set to True: it returns a multivariate series of dimension D. Each dimension represents the anomaly score of the corresponding component of the input.
If the series is univariate, it returns a univariate series regardless of the parameter component_wise.
A window of size W is rolled on the series with a stride equal to 1. It is the same size window W used during the training phase. Each value in the score series thus represents how anomalous the sample of the W previous values is.
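The scoring step for a univariate series can be sketched as follows. This is a minimal illustration that assumes Euclidean distance. The centroids are hypothetical stand-ins for a fitted k-means model, and `distance_to_closest` and `score_univariate` are names invented for this example.

```python
import math
from typing import List, Sequence


def distance_to_closest(vector: Sequence[float],
                        centroids: Sequence[Sequence[float]]) -> float:
    """Euclidean distance from `vector` to the nearest centroid."""
    return min(math.dist(vector, c) for c in centroids)


def score_univariate(values: List[float],
                     centroids: Sequence[Sequence[float]],
                     window: int) -> List[float]:
    """Roll a window of size `window` with stride 1 and score each position.

    The i-th score covers values[i : i + window], so it describes the
    window that ends at time step i + window - 1.
    """
    return [
        distance_to_closest(values[end - window:end], centroids)
        for end in range(window, len(values) + 1)
    ]


# Hypothetical centroids standing in for a fitted model with k = 2 and W = 2.
centroids = [[0.0, 0.0], [1.0, 1.0]]
values = [0.0, 0.0, 1.0, 1.0, 5.0]
scores = score_univariate(values, centroids, window=2)
print(scores)  # [0.0, 1.0, 0.0, 4.0]
```

The last window, [1.0, 5.0], lies far from both centroids and therefore receives the largest score. A series of length N produces N - W + 1 scores.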
- Parameters
window (int) – Size of the window used to create the subsequences of the series.
k (int) – The number of clusters to form as well as the number of centroids to generate by the KMeans model.
diff_fn – Optionally, reduction function to use if two series are given. It will transform the two series into one. This allows the KMeansScorer to apply KMeans on the original series or on its residuals (difference between the prediction and the original series). Must be one of “abs_diff” and “diff” (defined in _diff_series()). Default: “abs_diff”
component_wise (bool) – Boolean value indicating if the score needs to be computed for each component independently (True) or by concatenating the components in the considered window to compute one score (False). Default: False
kwargs – Additional keyword arguments passed to the internal scikit-learn KMeans model(s).
Attributes
is_probabilistic – Whether the scorer expects a probabilistic prediction for its first input.
Methods
check_if_fit_called() – Checks if the scorer has been fitted before calling its score() function.
eval_accuracy(actual_anomalies, series[, metric]) – Computes the anomaly score of the given time series, and returns the score of a threshold-agnostic metric.
eval_accuracy_from_prediction(...[, metric]) – Computes the anomaly score between actual_series and pred_series, and returns the score of a threshold-agnostic metric.
fit(series) – Fits the scorer on the given time series input.
fit_from_prediction(actual_series, pred_series) – Fits the scorer on the two (sequence of) series.
score(series) – Computes the anomaly score on the given series.
score_from_prediction(actual_series, pred_series) – Computes the anomaly score on the two (sequence of) series.
show_anomalies(series[, actual_anomalies, ...]) – Plot the results of the scorer.
show_anomalies_from_prediction(...[, ...]) – Plot the results of the scorer.
- check_if_fit_called()
Checks if the scorer has been fitted before calling its score() function.
- eval_accuracy(actual_anomalies, series, metric='AUC_ROC')
Computes the anomaly score of the given time series, and returns the score of a threshold-agnostic metric.
- Parameters
actual_anomalies (Union[TimeSeries, Sequence[TimeSeries]]) – The ground truth of the anomalies (1 if it is an anomaly and 0 if not).
series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) series to detect anomalies from.
metric (str) – Optionally, metric function to use. Must be one of “AUC_ROC” and “AUC_PR”. Default: “AUC_ROC”
- Returns
Score of a threshold-agnostic metric for the computed anomaly score:
float if series is a univariate series (dimension=1).
Sequence[float] if series is a multivariate series (dimension>1), in which case one value is returned per dimension, or if series is a sequence of univariate series, in which case one value is returned per series.
Sequence[Sequence[float]] if series is a sequence of multivariate series. The outer Sequence is over the sequence input and the inner Sequence is over the dimensions of each element in the sequence input.
- Return type
Union[float, Sequence[float], Sequence[Sequence[float]]]
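To illustrate why “AUC_ROC” does not depend on a threshold, here is a minimal plain-Python sketch of the metric. It uses the pairwise-comparison definition of the area under the ROC curve, so it is not necessarily how the library computes it. The function name `auc_roc` and the sample data are hypothetical.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random anomaly outscores a random normal point.

    `labels` uses 1 for an anomaly and 0 otherwise; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth and anomaly scores for five time steps.
labels = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.9]
result = auc_roc(labels, scores)
print(round(result, 4))  # 0.8333
```

Because only the ordering of the scores matters, no cut-off between “normal” and “anomalous” has to be chosen before evaluating the scorer.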
- eval_accuracy_from_prediction(actual_anomalies, actual_series, pred_series, metric='AUC_ROC')
Computes the anomaly score between actual_series and pred_series, and returns the score of a threshold-agnostic metric.
- Parameters
actual_anomalies (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) ground truth of the anomalies (1 if it is an anomaly and 0 if not).
actual_series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) actual series.
pred_series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) predicted series.
metric (str) – Optionally, metric function to use. Must be one of “AUC_ROC” and “AUC_PR”. Default: “AUC_ROC”
- Returns
Score of a threshold-agnostic metric for the computed anomaly score:
float if actual_series and pred_series are univariate series (dimension=1).
Sequence[float] if actual_series and pred_series are multivariate series (dimension>1), in which case one value is returned per dimension, or if actual_series and pred_series are sequences of univariate series, in which case one value is returned per series.
Sequence[Sequence[float]] if actual_series and pred_series are sequences of multivariate series. The outer Sequence is over the sequence input and the inner Sequence is over the dimensions of each element in the sequence input.
- Return type
Union[float, Sequence[float], Sequence[Sequence[float]]]
- fit(series)
Fits the scorer on the given time series input.
If a sequence of series is given, the scorer will be fitted on the concatenation of the sequence.
The assumption is that the series used for training are generally anomaly-free.
- Parameters
series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) series with no anomalies.
- Returns
Fitted Scorer.
- Return type
self
- fit_from_prediction(actual_series, pred_series)
Fits the scorer on the two (sequence of) series.
The function diff_fn, passed as a parameter to the scorer, will transform pred_series and actual_series into one series. By default, diff_fn will compute the absolute difference (Default: “abs_diff”). If pred_series and actual_series are sequences, diff_fn will be applied to all pairwise elements of the sequences.
The scorer will then be fitted on this (sequence of) series. If a sequence of series is given, the scorer will be fitted on the concatenation of the sequence.
The scorer assumes that the (sequence of) actual_series is anomaly-free.
- Parameters
actual_series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) actual series.
pred_series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) predicted series.
- Returns
Fitted Scorer.
- Return type
self
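The “difference” series built from actual_series and pred_series can be sketched in plain Python as below. This is an illustration only. The helper `difference_series` is hypothetical, series are represented as plain lists of floats, and the sign convention for “diff” (actual minus prediction) is an assumption of this sketch.

```python
from typing import Callable, List

Series = List[float]


def abs_diff(actual: Series, pred: Series) -> Series:
    """Element-wise absolute difference between two series."""
    return [abs(a - p) for a, p in zip(actual, pred)]


def diff(actual: Series, pred: Series) -> Series:
    """Element-wise signed difference (assumed here: actual minus prediction)."""
    return [a - p for a, p in zip(actual, pred)]


def difference_series(actual_series, pred_series,
                      diff_fn: Callable[[Series, Series], Series] = abs_diff):
    """Apply diff_fn to one pair of series, or pairwise to two sequences."""
    is_sequence = isinstance(actual_series[0], (list, tuple))
    if not is_sequence:
        return diff_fn(actual_series, pred_series)
    if len(actual_series) != len(pred_series):
        raise ValueError("both sequences must contain the same number of series")
    return [diff_fn(a, p) for a, p in zip(actual_series, pred_series)]


# Two actual series and their (hypothetical) predictions.
actual = [[1.0, 2.0, 3.0], [4.0, 5.0]]
pred = [[1.5, 2.0, 2.0], [3.0, 7.0]]
print(difference_series(actual, pred))        # [[0.5, 0.0, 1.0], [1.0, 2.0]]
print(difference_series(actual, pred, diff))  # [[-0.5, 0.0, 1.0], [1.0, -2.0]]
```

The resulting (sequence of) difference series is what would then be handed to the fitting or scoring step.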
- property is_probabilistic: bool
Whether the scorer expects a probabilistic prediction for its first input.
- Return type
bool
- score(series)
Computes the anomaly score on the given series.
If a sequence of series is given, the scorer will score each series independently and return an anomaly score for each series in the sequence.
- Parameters
series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) series to detect anomalies from.
- Returns
(Sequence of) anomaly score time series
- Return type
Union[TimeSeries, Sequence[TimeSeries]]
- score_from_prediction(actual_series, pred_series)
Computes the anomaly score on the two (sequence of) series.
The function diff_fn, passed as a parameter to the scorer, will transform pred_series and actual_series into one “difference” series. By default, diff_fn will compute the absolute difference (Default: “abs_diff”). If actual_series and pred_series are sequences, diff_fn will be applied to all pairwise elements of the sequences.
The scorer will then transform this series into an anomaly score. If a sequence of series is given, the scorer will score each series independently and return an anomaly score for each series in the sequence.
- Parameters
actual_series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) actual series.
pred_series (Union[TimeSeries, Sequence[TimeSeries]]) – The (sequence of) predicted series.
- Returns
(Sequence of) anomaly score time series
- Return type
Union[TimeSeries, Sequence[TimeSeries]]
- show_anomalies(series, actual_anomalies=None, scorer_name=None, title=None, metric=None)
Plot the results of the scorer.
Computes the score on the given series input and plots the results.
- The plot will be composed of the following:
the series itself.
the anomaly score of the scorer.
the actual anomalies, if given.
- It is possible to:
add a title to the figure with the parameter title
give a personalized name to the scorer with scorer_name
show the results of a metric for the anomaly score (AUC_ROC or AUC_PR), if the actual anomalies are provided.
- Parameters
series (TimeSeries) – The series to visualize anomalies from.
actual_anomalies (Optional[TimeSeries]) – The ground truth of the anomalies (1 if it is an anomaly and 0 if not).
scorer_name (Optional[str]) – Name of the scorer.
title (Optional[str]) – Title of the figure.
metric (Optional[str]) – Optionally, scoring function to use. Must be one of “AUC_ROC” and “AUC_PR”. Default: “AUC_ROC”
- show_anomalies_from_prediction(actual_series, pred_series, scorer_name=None, actual_anomalies=None, title=None, metric=None)
Plot the results of the scorer.
Computes the anomaly score on the two series and plots the results.
- The plot will be composed of the following:
the actual_series and the pred_series.
the anomaly score of the scorer.
the actual anomalies, if given.
- It is possible to:
add a title to the figure with the parameter title
give a personalized name to the scorer with scorer_name
show the results of a metric for the anomaly score (AUC_ROC or AUC_PR), if the actual anomalies are provided.
- Parameters
actual_series (TimeSeries) – The actual series to visualize anomalies from.
pred_series (TimeSeries) – The predicted series of actual_series.
actual_anomalies (Optional[TimeSeries]) – The ground truth of the anomalies (1 if it is an anomaly and 0 if not).
scorer_name (Optional[str]) – Name of the scorer.
title (Optional[str]) – Title of the figure.
metric (Optional[str]) – Optionally, scoring function to use. Must be one of “AUC_ROC” and “AUC_PR”. Default: “AUC_ROC”