Benchmarking¶

We provide a benchmarking framework to enable users to compare multiple pipelines against each other. The evaluation metrics are documented within pipeline evaluation, please visit Evaluation to read more about it.

Releases¶

In every release, we run Orion benchmark. We maintain an up-to-date leaderboard with the current scoring of the verified pipelines according to the benchmarking procedure.

Results obtained during benchmarking as well as previous releases can be found within benchmark/results folder as CSV files. Summarized results can also be browsed in the following summary Google Sheets document as well as the details Google Sheets document.

Leaderboard¶

We run the benchmark on 12 datasets with their known grounth truth. We record the score of the pipelines on each datasets. To compute the leaderboard table, we showcase the number of wins each pipeline has over the ARIMA pipeline.

Pipeline	Outperforms ARIMA
AER	12
TadGAN	7
LSTM Dynamic Thresholding	9
LSTM Autoencoder	7
Dense Autoencoder	7
VAE	6
AnomalyTransformer	2
LNN	7
Matrix Profile	5
UniTS	6
TimesFM	7
GANF	5
Azure	0

To view a list of all available pipelines, visit Pipelines page.

Process¶

We evaluate the performance of pipelines by following a series of executions. From a high level, we can view the process as:

Use each pipeline to detect anomalies on all datasets and their signals.
Retrieve the list of known anomalies for each of these signals.
Compute the scores for each signal using multiple metrics (e.g. accuracy and f1).
Average the score obtained for each metric and pipeline across all the signals.
Finally, we rank our pipelines sorting them by one of the computed scores.

Benchmark function¶

The complete evaluation process is directly available using the orion.benchmark.benchmark function.

from orion.benchmark import benchmark

pipelines = [
    'arima',
    'lstm_dynamic_threshold'
]

metrics = ['f1', 'accuracy', 'recall', 'precision']

signals = ['S-1', 'P-1']

scores = benchmark(pipelines=pipelines, datasets=datasets, metrics=metrics, rank='f1')

For further details about all the arguments and possibilities that the benchmark function offers please refer to the Orion benchmark documentation

Evaluation Building a System