DAI-Lab An open source project from Data to AI Lab at MIT.



Orion

Date: Nov 13, 2024 Version: 0.6.2.dev0

Overview

Orion is a machine learning library built for unsupervised time series anomaly detection. Time series signals are generated by a wide variety of systems; a few examples include telemetry data from satellites, signals from wind turbines, and even stock market price tickers. We built this library to:

  • provide one place where users can find the latest machine learning and deep learning methods, including our own innovations.

  • abstract away the nitty-gritty of preprocessing, finding the best pipeline, and postprocessing.

  • provide a systematic way to evaluate the latest and greatest machine learning methods via our benchmarking effort.

  • let users build time series anomaly detection platforms customized to their workflows through our backend database and REST API.

  • provide a way for machine learning researchers to contribute in a scaffolded way so their innovations are immediately available to the end users.

The library makes use of a number of automated machine learning tools developed at the Data to AI Lab at MIT.

Unsupervised time series anomaly detection (UTSAD)

A time series anomaly is defined as a time point or period where a system behaves unusually. Broadly speaking, there are two types of anomalies:

  • point anomaly: a single data point that has reached an unusual value.

  • collective anomaly: a continuous sequence of data points that are considered anomalous as a whole, even if the individual data points may not be unusual.

Time series anomaly detection aims to isolate anomalous subsequences of varied lengths within time series. One of the simplest detection techniques is thresholding, which detects data points that fall outside a normal range. However, many anomalies do not exceed any boundaries. For example, they may have values that appear normal but are unusual at the specific time they occur (i.e., contextual anomalies). These anomalies are harder to identify because the context of a signal is often unclear.
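To make the limitation of thresholding concrete, here is a minimal sketch in plain Python. It is not part of Orion; the function name `out_of_limit`, the sample readings, and the limits are all hypothetical and chosen only for illustration.

```python
def out_of_limit(values, lower, upper):
    """Return the indices of points that fall outside the range [lower, upper]."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical readings: a repeating low/high cycle with two injected oddities.
#   index 9  -> value 24 appears during a "low" phase (contextual anomaly)
#   index 11 -> value 60 exceeds the normal range (point anomaly)
signal = [12, 11, 10, 18, 24, 25, 19, 13, 12, 24, 11, 60]

flagged = out_of_limit(signal, lower=0, upper=40)
```

Under these assumptions only index 11 is flagged. The value at index 9 lies well inside the allowed range, so a fixed threshold cannot see it even though it is unusual for that position in the cycle.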

Machine learning for UTSAD

The rich variety of anomaly types, data types, and application scenarios has spurred a range of detection approaches over the past several years. The simplest are out-of-limit methods, which flag regions where values exceed a certain threshold. While these methods are intuitive, they are inflexible and incapable of detecting contextual anomalies. To overcome this, more advanced machine learning (ML) based techniques have been proposed, namely proximity-based, prediction-based, and reconstruction-based methods.

  • Proximity based: these methods first use a distance measure to quantify the similarity between objects. Objects that are distant from the others are considered anomalies.

  • Prediction based: these methods learn a predictive model to fit the given time series data, and then use that model to predict future values. A data point is identified as an anomaly if the difference between its predicted value and its observed value exceeds a certain threshold.

  • Reconstruction based: these methods learn a model that creates a synthetic signal by mapping the original signal to a lower-dimensional space and then back to the original dimension. They assume that anomalies lose information when mapped to a lower-dimensional space and therefore cannot be effectively reconstructed; thus, a high reconstruction error suggests a high chance of an anomaly.
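As an illustration of the prediction-based idea, the sketch below fits a very small forecasting model and flags points whose prediction error exceeds a threshold. It is a toy stand-in, not one of Orion's pipelines: the first-order autoregressive model, the helper names `fit_ar1` and `prediction_errors`, the synthetic sine data, and the "mean plus three standard deviations" threshold are all assumptions made for this example.

```python
import math
from statistics import mean, pstdev


def fit_ar1(series):
    """Least-squares fit of x[t] ~ a * x[t-1] + b."""
    prev, curr = series[:-1], series[1:]
    mean_prev, mean_curr = mean(prev), mean(curr)
    var = sum((p - mean_prev) ** 2 for p in prev)
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    a = cov / var
    b = mean_curr - a * mean_prev
    return a, b


def prediction_errors(series, a, b):
    """Absolute one-step-ahead errors; errors[i] belongs to series[i + 1]."""
    return [abs(c - (a * p + b)) for p, c in zip(series[:-1], series[1:])]


# Hypothetical data: a clean sine wave for training, and a copy with one spike.
train = [math.sin(0.3 * t) for t in range(200)]
test = list(train)
test[120] += 3.0

# Learn the predictive model from the clean signal.
a, b = fit_ar1(train)

# Calibrate the threshold on the training errors (mean + 3 standard deviations).
train_errors = prediction_errors(train, a, b)
threshold = mean(train_errors) + 3 * pstdev(train_errors)

# Shift indices by one because the first point has no prediction.
test_errors = prediction_errors(test, a, b)
anomalies = [i + 1 for i, e in enumerate(test_errors) if e > threshold]
```

With this data the injected spike at index 120 is flagged. The point immediately after it is flagged as well, because its prediction is computed from the corrupted value. Real pipelines typically replace the toy model with a learned forecaster and the fixed rule with a more adaptive threshold.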

You can read more about the differences between these approaches in Time Series Anomaly Detection using GAN.

Explore Orion