Paper link: [2106.06947] Graph Neural Network-Based Anomaly Detection in Multivariate Time Series (arxiv.org)


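The English below was typed entirely by hand! It summarizes and paraphrases the original paper. Some spelling and grammar mistakes may be unavoidable; if you spot any, corrections in the comments are welcome! This post is closer to personal notes, so read with caution.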


1. TL;DR

1.1. Takeaways


2. Section-by-Section Close Reading of the Paper

2.1. Abstract

        ①Existing work: deep learning has improved anomaly detection on high-dimensional data

        ②Challenge: existing methods do not explicitly learn the structure of existing relationships between variables

2.2. Introduction

        ①Real-world systems are usually monitored by a large number of sensors, whose data can be used for anomaly detection

        ②Anomaly detection is usually treated as an unsupervised problem because anomalies are unlabeled and highly variable

        ③They proposed Graph Deviation Network (GDN)

2.3. Related Work

(1)Anomaly Detection

        ①Methods include autoencoders (AE), VAE, and so on

(2)Multivariate Time Series Modelling

        ①This line of work models the behavior of a multivariate time series based on its past behavior, using auto-regressive models, auto-regressive integrated moving average (ARIMA) models, CNN, LSTM, and GAN. However, these methods struggle with complex and highly non-stationary time series.

(3)Graph Neural Networks

        ①Limitations: existing GNNs struggle to represent many sensors with different behaviors, and they need the graph structure as input, which is unknown here

2.4. Proposed Framework

2.4.1. Problem Statement

        ①Number of sensors: N

        ②Pattern of data: time series \mathbf{s}_{\mathrm{train}}=\left[\mathbf{s}_{\mathrm{train}}^{(1)},\cdots,\mathbf{s}_{\mathrm{train}}^{(T_{\mathrm{train}})}\right],\mathbf{s}_{\mathrm{train}}^{(t)} \in \mathbb{R}^{N}, i.e., each time tick holds one value from each of the N sensors

        ③Number of training time ticks: T_{train}

        ④Training data: only normal data

        ⑤Testing data: it comes from the same N sensors, but over a separate set of T_{test} time ticks \mathbf{s}_{\mathrm{test}}=\begin{bmatrix}\mathbf{s}_{\mathrm{test}}^{(1)},\cdots,\mathbf{s}_{\mathrm{test}}^{(T_{\mathrm{test}})}\end{bmatrix}

        ⑥Output: a list/array of length T_{test} with binary labels, \mathsf{a}(t)\in\{0,1\}, where 1 indicates that time t is anomalous

2.4.2. Overview

        ①The framework has four parts, covered in Sections 2.4.3 to 2.4.6 below

        ②Overall framework:

2.4.3. Sensor Embedding

        ①Representation of each sensor: \mathbf{v_{i}}\in\mathbb{R}^{d},\mathrm{~for~}i\in\{1,2,\cdots,N\}

2.4.4. Graph Structure Learning

        ①Graph construction: a directed graph is used

        ②Nodes: sensors

        ③Edges: dependency relationships (An edge from one sensor to another indicates that the first sensor is used for modelling the behavior of the second sensor)

        ④Adjacency matrix: A

        ⑤Prior information represented by candidate relations:

\mathcal{C}_{i}\subseteq\{1,2,\cdots,N\}\setminus\{i\}

If there is no prior information, the candidate relations of sensor i are all sensors except i itself

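        ⑥Sensor similarity:

\begin{aligned} e_{ji}& =\frac{\mathbf{v}_{i}^{\top}\mathbf{v}_{j}}{\left\|\mathbf{v}_{i}\right\|\cdot\left\|\mathbf{v}_{j}\right\|}\mathrm{~for~}j\in\mathcal{C}_{i} \\ A_{ji}& =\mathbb{1}\left\{j\in\mathrm{TopK}\left(\left\{e_{ki}:k\in\mathcal{C}_{i}\right\}\right)\right\} \end{aligned}

The edge score e_{ji} is the cosine similarity between the embeddings of sensors i and j, and for each sensor i only the top k most similar candidates get an edge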
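The graph construction in ⑤ and ⑥ can be sketched in plain Python as below. This is only a minimal illustration and not the authors' code: the function names `cosine` and `learn_graph`, the optional `candidates` argument, and the toy embeddings are all assumptions made for this note.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def learn_graph(embeddings, k, candidates=None):
    """Return A where A[j][i] = 1 if j is among the top-k candidates of sensor i."""
    n = len(embeddings)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        # Without prior information, every other sensor is a candidate.
        cand = candidates[i] if candidates is not None else [j for j in range(n) if j != i]
        ranked = sorted(cand, key=lambda j: cosine(embeddings[i], embeddings[j]), reverse=True)
        for j in ranked[:k]:
            A[j][i] = 1  # edge from j to i: j is used to model i
    return A


# Toy embeddings for 4 sensors (made-up values).
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
A = learn_graph(emb, k=1)
```

With k=1, every column of `A` contains exactly one 1, and the diagonal stays 0 because a sensor is never its own candidate.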

2.4.5. Graph Attention-Based Forecasting

        ①Predicted/expected action at time t based on the input:

\mathbf{x^{(t)}}:=\begin{bmatrix}\mathbf{s^{(t-w)}},\mathbf{s^{(t-w+1)}},\cdots,\mathbf{s^{(t-1)}}\end{bmatrix}, \mathbf{x^{(t)}}\in\mathbb{R}^{N\times w}

where w denotes the size of the sliding window, and the target to predict is the sensor data at the current time tick, \mathbf{s^{(t)}}
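A small sketch of how the sliding-window inputs could be built from a raw series. The helper name `sliding_windows` and the toy values are assumptions for illustration; only the shapes (N × w input, length-N target) follow the definition above.

```python
def sliding_windows(series, w):
    """Build (x_t, s_t) pairs from T time ticks, each a list of N sensor values.

    x_t is an N x w matrix with the w ticks before t; s_t is the tick to predict.
    """
    pairs = []
    for t in range(w, len(series)):
        window = series[t - w:t]                    # w ticks, each of length N
        x_t = [list(col) for col in zip(*window)]   # transpose to N rows of length w
        pairs.append((x_t, series[t]))
    return pairs


# Toy data: T = 5 ticks, N = 2 sensors (made-up values).
series = [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0], [3.0, 13.0], [4.0, 14.0]]
pairs = sliding_windows(series, w=3)
```

A series of T ticks yields T - w training pairs, since the first w ticks have no full window before them.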

(1)Feature Extractor

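        ①Aggregation methods:

\mathbf{z}_{i}^{(t)}=\mathrm{ReLU}\left(\alpha_{i,i}\mathbf{W}\mathbf{x}_{i}^{(t)}+\sum_{j\in\mathcal{N}(i)}\alpha_{i,j}\mathbf{W}\mathbf{x}_{j}^{(t)}\right)

where \mathbf{x}_{i}^{(t)}\in\mathbb{R}^{w} is the input feature of node i, \mathcal{N}(i)=\{j\mid A_{ji}>0\} is the set of neighbors of node i, \mathbf{W}\in\mathbb{R}^{d\times w} is a trainable weight matrix shared by all nodes, and \alpha_{i,j} are attention coefficients calculated by: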

\begin{aligned} \mathbf{g}_{i}^{(t)}& =\mathbf{v}_{i}\oplus\mathbf{Wx}_{i}^{(t)} \\ \pi\left(i,j\right)& =\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left(\mathbf{g}_{i}^{(t)}\oplus\mathbf{g}_{j}^{(t)}\right)\right) \\ \alpha_{i,j}& =\frac{\exp\left(\pi\left(i,j\right)\right)}{\sum_{k\in\mathcal{N}(i)\cup\{i\}}\exp\left(\pi\left(i,k\right)\right)}, \end{aligned}

where \oplus denotes concatenation and \mathbf{a} denotes the vector of learned coefficients for the attention mechanism
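To make the attention step concrete, here is a minimal pure-Python sketch for one node. It assumes the products \mathbf{Wx}_j^{(t)} are already computed (passed in as `Wx`), uses a LeakyReLU slope of 0.2 (an assumption, the notes do not state it), and all names and toy values are made up for illustration.

```python
import math


def leaky_relu(x, slope=0.2):  # slope value is an assumption
    return x if x > 0 else slope * x


def attention_weights(i, neighbors, g, a):
    """alpha_{i,j} for j in N(i) plus i itself: softmax of LeakyReLU(a^T (g_i concat g_j))."""
    nodes = [i] + list(neighbors)
    scores = [leaky_relu(sum(w * x for w, x in zip(a, g[i] + g[j]))) for j in nodes]
    m = max(scores)                                 # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(nodes, exps)}


def aggregate(i, neighbors, alpha, Wx):
    """z_i = ReLU(alpha_{i,i} W x_i + sum over neighbors of alpha_{i,j} W x_j)."""
    z = [0.0] * len(Wx[i])
    for j in [i] + list(neighbors):
        for c, value in enumerate(Wx[j]):
            z[c] += alpha[j] * value
    return [max(0.0, val) for val in z]


# Toy setup: 3 sensors, embedding size 2, transformed features of size 2.
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Wx = [[0.5, -1.0], [1.0, 2.0], [-0.5, 0.0]]
g = [v_n + wx_n for v_n, wx_n in zip(v, Wx)]        # g_n = v_n concat W x_n
a = [0.1] * 8                                       # attention vector, length 2 * len(g_n)

alpha = attention_weights(0, [1, 2], g, a)
z0 = aggregate(0, [1, 2], alpha, Wx)
```

The weights in `alpha` sum to 1 over node 0 and its two neighbors, so `z0` is a ReLU of a convex combination of the transformed features.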

(2)Output Layer

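        ①New representations obtained for all N nodes: \{\mathbf{z}_{1}^{(t)},\cdots,\mathbf{z}_{N}^{(t)}\}

        ②Predicted value at time t:

\mathbf{\hat{s}^{(t)}}=f_{\theta}\left(\left[\mathbf{v}_{1}\circ\mathbf{z}_{1}^{(t)},\cdots,\mathbf{v}_{N}\circ\mathbf{z}_{N}^{(t)}\right]\right)

where \circ denotes element-wise multiplication and f_{\theta} is the stack of fully connected layers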
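A rough sketch of the prediction step in ②. Here f is simplified to a single linear layer shared across sensors that maps each product \mathbf{v}_i\circ\mathbf{z}_i to one scalar; that simplification, the function name `predict`, and the toy numbers are all assumptions made for illustration.

```python
def predict(v, z, weights, bias):
    """s_hat_i = f(v_i * z_i), with f taken as one shared linear layer (assumption)."""
    s_hat = []
    for v_i, z_i in zip(v, z):
        h = [a * b for a, b in zip(v_i, z_i)]       # element-wise product
        s_hat.append(sum(w * x for w, x in zip(weights, h)) + bias)
    return s_hat


# Toy values for N = 2 sensors with d = 2.
v = [[1.0, 2.0], [0.0, 1.0]]
z = [[0.5, 0.5], [2.0, 3.0]]
s_hat = predict(v, z, weights=[1.0, 1.0], bias=0.0)
```

Multiplying by the sensor embedding lets the same layer produce sensor-specific outputs even though its weights are shared.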

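        ③Loss function (MSE):

L_{\mathrm{MSE}}=\frac{1}{T_{\mathrm{train}}-w}\sum_{t=w+1}^{T_{\mathrm{train}}}\left\|\mathbf{\hat{s}^{(t)}}-\mathbf{s^{(t)}}\right\|_{2}^{2}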
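A tiny worked example of this loss: the squared L2 distance between the predicted and observed sensor vectors, averaged over the predictable time ticks. The function name `mse_loss` and the numbers are made up for illustration.

```python
def mse_loss(preds, targets):
    """Average over time ticks of the squared L2 distance between prediction and observation."""
    total = 0.0
    for s_hat, s in zip(preds, targets):
        total += sum((p - q) ** 2 for p, q in zip(s_hat, s))
    return total / len(preds)


# Two time ticks, two sensors.
preds = [[1.0, 2.0], [0.0, 0.0]]
targets = [[1.0, 4.0], [3.0, 4.0]]
loss = mse_loss(preds, targets)   # (0 + 4 + 9 + 16) / 2 = 14.5
```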


2.4.6. Graph Deviation Scoring

        ①The error value of sensor i at time t:

\mathrm{Err}_{i}\left(t\right)=\left|\mathbf{s}_{i}^{(t)}-\mathbf{\hat{s}}_{i}^{(t)}\right|


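        ②To keep any single sensor from dominating the others, they normalize the error:

a_{i}\left(t\right)=\frac{\mathrm{Err}_{i}\left(t\right)-\widetilde{\mu}_{i}}{\widetilde{\sigma}_{i}}

where \widetilde{\mu}_i is the median and \widetilde{\sigma}_i is the inter-quartile range (IQR) of the error values \mathrm{Err}_{i}(t) of sensor i across time ticks. ⭐The authors argue that the median and IQR are more robust than the mean and standard deviation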

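        ③Overall anomalousness at time t, taken as the maximum normalized error a_{i}(t) over all sensors:

A\left(t\right)=\max_{i}a_{i}\left(t\right)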



        ④Generate the smoothed score A_{s}\left(t\right) with a simple moving average (SMA)

        ⑤Threshold of anomaly: the maximum of A_{s}\left(t\right) over the validation data; a time tick is labelled anomalous if A_{s}\left(t\right) exceeds it
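The scoring pipeline (absolute error, robust normalization, max over sensors, SMA smoothing, validation-based threshold) can be sketched with the standard library as follows. Several details are assumptions of this sketch rather than facts from the notes: the median and IQR are computed separately on each split, the SMA is a trailing window, a tiny constant guards against a zero IQR, and all names and data are made up.

```python
import statistics


def robust_normalize(errors):
    """(Err - median) / IQR for one sensor's error series."""
    med = statistics.median(errors)
    q1, _, q3 = statistics.quantiles(errors, n=4)
    iqr = (q3 - q1) or 1e-8                 # guard against a zero IQR (assumption)
    return [(e - med) / iqr for e in errors]


def anomaly_scores(observed, predicted):
    """A(t) = max over sensors of the normalized absolute error, for T x N inputs."""
    n = len(observed[0])
    per_sensor = []
    for i in range(n):
        err = [abs(s[i] - p[i]) for s, p in zip(observed, predicted)]
        per_sensor.append(robust_normalize(err))
    return [max(col) for col in zip(*per_sensor)]


def moving_average(scores, window):
    """Trailing simple moving average; early ticks average over what is available."""
    out = []
    for t in range(len(scores)):
        chunk = scores[max(0, t - window + 1):t + 1]
        out.append(sum(chunk) / len(chunk))
    return out


zeros = [[0.0, 0.0]] * 5                    # pretend the model always predicts 0
val_obs = [[1.0, 1.0], [2.0, 1.5], [3.0, 2.0], [4.0, 2.5], [5.0, 3.0]]
test_obs = [[1.0, 1.0], [2.0, 1.5], [3.0, 2.0], [4.0, 2.5], [30.0, 3.0]]

threshold = max(moving_average(anomaly_scores(val_obs, zeros), window=2))
test_scores = moving_average(anomaly_scores(test_obs, zeros), window=2)
labels = [1 if s > threshold else 0 for s in test_scores]
```

In this toy run only the last test tick, where sensor 0 jumps to 30, ends up above the validation threshold.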

2.5. Experiments

2.5.1. Datasets

        ①Datasets: The Secure Water Treatment (SWaT) and Water Distribution (WADI)

        ②Statistics of 2 datasets:

        ③Sampling interval: 10 s

        ④Filtering: the first 2160 samples are removed from both datasets because the systems took 5-6 hours to reach stabilization when first turned on (6 hours at one sample per 10 s is 2160 samples)

2.5.2. Baselines

        ①The paper briefly introduces the baselines: PCA, KNN, FB, AE, DAGMM, LSTM-VAE, MAD-GAN

2.5.3. Evaluation Metrics

        ①Metrics: precision (Prec), recall (Rec) and F1-Score (F1)
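For reference, the three metrics can be computed from binary labels as below. This is the standard definition, not code from the paper; the function name and the toy labels are made up.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels where 1 means anomalous."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]
prec, rec, f1 = precision_recall_f1(y_true, y_pred)   # tp=2, fp=1, fn=1
```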

2.5.4. Experimental Setup

        ①Optimizer: Adam

        ②Learning rate: 1e-3

        ③Adam hyper-parameters: \beta _1=0.9,\beta _2=0.99

        ④Epochs: up to 50, with early stopping at a patience of 10

        ⑤k=30 in WADI and k=15 in SWaT

        ⑥Sliding window size w=5
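The settings above can be collected in a small config object, together with a sketch of early stopping. Interpreting "early stop in 10" as a patience of 10 epochs without validation improvement is an assumption of this note; the class name `GDNConfig`, the helper `epochs_run`, and the loss values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GDNConfig:
    lr: float = 1e-3
    betas: tuple = (0.9, 0.99)   # Adam beta_1, beta_2
    max_epochs: int = 50
    patience: int = 10           # early-stopping patience (assumed meaning)
    top_k: int = 30              # 30 for WADI, 15 for SWaT
    window: int = 5              # sliding window size w


def epochs_run(val_losses, cfg):
    """Return how many epochs run before early stopping (or the epoch cap) ends training."""
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:cfg.max_epochs], start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= cfg.patience:
                return epoch
    return min(len(val_losses), cfg.max_epochs)


swat_cfg = GDNConfig(top_k=15)
# Made-up validation losses; a patience of 2 keeps the example short.
stopped_at = epochs_run([1.0, 0.9, 0.95, 0.97, 0.5], GDNConfig(patience=2))
```

In the short example the loss stops improving after epoch 2, so training ends at epoch 4 and never reaches the better fifth value.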

2.5.5. RQ1. Accuracy

        ①Comparison table:

2.5.6. RQ2. Ablation

        ①Ablation studies with static complete graph applied, w/o sensor embedding and w/o attention:

2.5.7. RQ3. Interpretability of Model

(1)Interpretability via Sensor Embeddings

        ①Embedding similarity between sensors via t-SNE:

where the 7 colors correspond to 7 classes of sensors

(2)Interpretability via Graph Edges and Attention Weights

        ①Correlated sensors in WADI:

The graph is drawn with a force-directed layout, where the red lines mark the largest deviation

2.5.8. RQ4. Localizing Anomalies

        ①Visualization of prediction and reality:

        ②Their contributions: localize anomalies, find the most related sensors, and visualize how they deviate from expectations

2.6. Conclusion


3. Reference

Deng, A. & Hooi, B. (2021) 'Graph Neural Network-Based Anomaly Detection in Multivariate Time Series', AAAI.


