Anomaly Detection
This project is an anomaly detection pipeline for refrigerator temperature time series, built with machine learning methods from the Scikit-Learn library. It first defines four transformer classes, `Wavelet`, `CompressionDistortion`, `Passthrough`, and `MSE`, that preprocess the data:
- `Wavelet` computes the wavelet coefficients of the input time series.
- `CompressionDistortion` applies Principal Component Analysis (PCA) to the wavelet coefficients, compresses them into fewer dimensions and then reconstructs the wavelet coefficients from the compressed data.
- `Passthrough` is a simple transformer that returns the input as it is.
- `MSE` computes the mean squared error between the original and the reconstructed wavelet coefficients, giving a measure of the reconstruction error.
The script then chains these transformers into an anomaly detection `Pipeline` that applies them sequentially to the input data. As its final step, the pipeline uses an `SGDClassifier` to label each data point as normal or anomalous, based on whether its mean squared error exceeds a certain threshold.
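The description above does not say exactly how the four transformers are wired together, so the following is a minimal sketch under stated assumptions, not the project's code. A three-level orthonormal Haar transform written in NumPy stands in for the wavelet step; a `FeatureUnion` places the `Passthrough` output (original coefficients) next to the `CompressionDistortion` output (reconstructed coefficients) so that `MSE` can compare the two halves; and a `StandardScaler`, which the description does not mention, rescales the error before the `SGDClassifier`. The `simulate` and `make_detector` helpers, the 64-sample windows, and the +3 °C spike anomalies are hypothetical.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler


class Wavelet(BaseEstimator, TransformerMixin):
    """Multi-level orthonormal Haar transform of each row (one window per row)."""

    def __init__(self, levels=3):
        self.levels = levels

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        approx = np.asarray(X, dtype=float)
        details = []
        for _ in range(self.levels):  # row length must be divisible by 2**levels
            even, odd = approx[:, ::2], approx[:, 1::2]
            details.append((even - odd) / np.sqrt(2))
            approx = (even + odd) / np.sqrt(2)
        # Coarsest approximation first, then detail bands from coarse to fine.
        return np.hstack([approx] + details[::-1])


class CompressionDistortion(BaseEstimator, TransformerMixin):
    """Compress the coefficients with PCA, then reconstruct them."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X, y=None):
        self.pca_ = PCA(n_components=self.n_components).fit(X)
        return self

    def transform(self, X):
        return self.pca_.inverse_transform(self.pca_.transform(X))


class Passthrough(BaseEstimator, TransformerMixin):
    """Return the input unchanged."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float)


class MSE(BaseEstimator, TransformerMixin):
    """Per-row MSE between the left half (original) and right half (reconstruction)."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        half = X.shape[1] // 2
        return ((X[:, :half] - X[:, half:]) ** 2).mean(axis=1, keepdims=True)


def make_detector():
    """Hypothetical wiring of the four transformers into one pipeline."""
    return Pipeline([
        ("wavelet", Wavelet(levels=3)),
        # FeatureUnion concatenates [original coefficients | reconstructed coefficients].
        ("pair", FeatureUnion([
            ("original", Passthrough()),
            ("reconstructed", CompressionDistortion(n_components=2)),
        ])),
        ("mse", MSE()),
        ("scale", StandardScaler()),  # assumption: SGD is sensitive to feature scale
        ("clf", SGDClassifier(random_state=0)),
    ])


def simulate(n, rng, anomaly_rate=0.2, length=64):
    """Hypothetical data: 4 °C cycle plus noise; anomalies get one +3 °C spike."""
    t = np.arange(length)
    phase = rng.uniform(0, 2 * np.pi, size=(n, 1))
    X = 4.0 + np.sin(2 * np.pi * t / length + phase) + rng.normal(0, 0.1, (n, length))
    y = (rng.random(n) < anomaly_rate).astype(int)
    rows = np.flatnonzero(y)
    X[rows, rng.integers(0, length, size=rows.size)] += 3.0
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = simulate(400, rng)
X_test, y_test = simulate(200, rng)
detector = make_detector().fit(X_train, y_train)
pred = detector.predict(X_test)
print("test accuracy:", (pred == y_test).mean())
```

Because this Haar transform is orthonormal and PCA is invariant to rotations, the sketch would produce the same reconstruction error without the wavelet step; a truncated or rescaled wavelet representation would change that. `n_components=2` fits the single sinusoidal cycle in the simulated data and would need tuning on real temperatures.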
The utility marketing team is considering a rebate for customers willing to upgrade their gas furnaces to more fuel-efficient models. To target the rebate, they first need to identify which customers have gas furnaces. Gas furnaces, typically the largest gas consumers in Pacific Northwest homes, run mainly during cold weather. Hence, analyzing gas consumption relative to outdoor temperature can help distinguish customers with gas furnaces from those without.
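One way to carry out that analysis, offered as a minimal sketch rather than the project's method, is to regress each customer's daily gas use on heating degree days (HDD), the number of degrees by which the day's mean temperature falls below a base temperature. The 65 °F base is the usual US convention; the `min_slope` cutoff, the function names, and the simulated customers are hypothetical. A customer whose consumption climbs with HDD is likely heating with gas, while a flat profile suggests gas is used only for appliances such as a water heater or range.

```python
import numpy as np

BASE_TEMP_F = 65.0  # conventional US heating-degree-day base temperature


def heating_degree_days(mean_temp_f):
    """Degrees by which each day's mean temperature falls below the base (never negative)."""
    return np.maximum(0.0, BASE_TEMP_F - np.asarray(mean_temp_f, dtype=float))


def heating_slope(daily_therms, mean_temp_f):
    """Least-squares slope of daily gas use (therms) against heating degree days."""
    hdd = heating_degree_days(mean_temp_f)
    slope, _intercept = np.polyfit(hdd, np.asarray(daily_therms, dtype=float), deg=1)
    return slope


def likely_has_gas_furnace(daily_therms, mean_temp_f, min_slope=0.05):
    """Hypothetical rule: flag customers whose gas use rises noticeably in cold weather."""
    return bool(heating_slope(daily_therms, mean_temp_f) > min_slope)


# Hypothetical year of daily data for two customers.
rng = np.random.default_rng(1)
temps = rng.uniform(25, 85, size=365)  # daily mean temperatures, °F
hdd = heating_degree_days(temps)
furnace_home = 1.0 + 0.15 * hdd + rng.normal(0, 0.3, 365)  # base load + furnace
no_furnace = 1.0 + rng.normal(0, 0.3, 365)  # base load only

print(likely_has_gas_furnace(furnace_home, temps))  # True
print(likely_has_gas_furnace(no_furnace, temps))  # False
```

A single linear fit ignores effects such as thermostat setbacks or a furnace that only runs below some temperature, so the cutoff would need checking against customers whose equipment is known.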