
[论文精读]Graph Neural Network-Based Anomaly Detection in Multivariate Time Series

Source: https://blog.csdn.net/Sherlily/article/details/142180374

论文网址:[2106.06947] Graph Neural Network-Based Anomaly Detection in Multivariate Time Series (arxiv.org)

论文代码:https://github.com/d-ailin/GDN

The English is typed entirely by hand! It is my summarizing and paraphrasing of the original paper. Unavoidable spelling and grammar mistakes may appear; if you spot any, corrections in the comments are welcome. This article leans toward personal notes, so read with caution

目录

1. TL;DR

1.1. Thoughts

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.4. Proposed Framework

2.4.1. Problem Statement

2.4.2. Overview

2.4.3. Sensor Embedding

2.4.4. Graph Structure Learning

2.4.5. Graph Attention-Based Forecasting

2.4.6. Graph Deviation Scoring

2.5. Experiments

2.5.1. Datasets

2.5.2. Baselines

2.5.3. Evaluation Metrics

2.5.4. Experimental Setup

2.5.5. RQ1. Accuracy

2.5.6. RQ2. Ablation

2.5.7. RQ3. Interpretability of Model

2.5.8. RQ4. Localizing Anomalies

2.6. Conclusion

3. Reference


1. TL;DR

1.1. Thoughts

(1) Not bad, not bad. It is actually a fairly simple model, the presentation is very clear, and code is available, so it costs nothing to take a look

2. Section-by-Section Close Reading

2.1. Abstract

        ①Existing work: anomaly detection for high-dimensional data has been improved by deep learning

        ②Challenge: existing methods do not explicitly learn the structure of existing relationships between variables

2.2. Introduction

        ①Real-world systems are monitored by plenty of sensors, whose readings provide the data for anomaly detection

        ②Anomaly detection is usually framed as an unsupervised problem, because anomalies are unlabeled and highly variable

        ③They proposed Graph Deviation Network (GDN)

2.3. Related Work

(1)Anomaly Detection

        ①Methods include autoencoders (AE), variational autoencoders (VAE), and so on

(2)Multivariate Time Series Modelling

        ①This line of work models the behavior of a multivariate time series based on its past behavior, including auto-regressive models, auto-regressive integrated moving average (ARIMA) models, CNNs, LSTMs, and GANs. However, these methods have difficulty handling complex and highly non-stationary time series.

(3)Graph Neural Networks

        ①Existing GNN approaches are limited here: they do not give each sensor its own representation, and the original graph structure is usually unknown

2.4. Proposed Framework

2.4.1. Problem Statement

        ①Number of sensors: N

        ②Data format: time series \mathbf{s}_{\mathrm{train}}=\left[\mathbf{s}_{\mathrm{train}}^{(1)},\cdots,\mathbf{s}_{\mathrm{train}}^{(T_{\mathrm{train}})}\right],\mathbf{s}_{\mathrm{train}}^{(t)} \in \mathbb{R}^{N}, i.e., each time tick provides one reading from each of the N sensors

        ③Training duration: T_{\mathrm{train}} time ticks

        ④Training data: only normal data

        ⑤Testing data: readings from the same N sensors, over a separate set of T_{\mathrm{test}} time ticks: \mathbf{s}_{\mathrm{test}}=\left[\mathbf{s}_{\mathrm{test}}^{(1)},\cdots,\mathbf{s}_{\mathrm{test}}^{(T_{\mathrm{test}})}\right]

        ⑥Output: a binary label for each of the T_{\mathrm{test}} time ticks, \mathsf{a}(t)\in\{0,1\}, where \mathsf{a}(t)=1 indicates that time tick t is anomalous

2.4.2. Overview

        ①The framework consists of four components: sensor embedding, graph structure learning, graph attention-based forecasting, and graph deviation scoring

        ②Overall framework: (figure from the paper, not reproduced here)

2.4.3. Sensor Embedding

        ①Representation of each sensor: \mathbf{v_{i}}\in\mathbb{R}^{d},\mathrm{~for~}i\in\{1,2,\cdots,N\}
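In code, this embedding is naturally a learnable lookup table trained jointly with the rest of the model. A minimal PyTorch sketch (the values of N and d here are illustrative; the authors' repo may organize this differently):

```python
import torch
import torch.nn as nn

# Each of the N sensors gets a learnable d-dimensional embedding v_i,
# optimized together with the rest of the model. N = 127 matches WADI's
# sensor count; d is a hyper-parameter.
N, d = 127, 128
sensor_embedding = nn.Embedding(N, d)

v = sensor_embedding.weight   # row i is v_i in R^d, shape (N, d)
```

The embeddings play two roles: they drive graph structure learning (next subsection) and also enter the attention mechanism as node features.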

2.4.4. Graph Structure Learning

        ①Graph construction: directed graph applied

        ②Nodes: sensors

        ③Edges: dependency relationships (An edge from one sensor to another indicates that the first sensor is used for modelling the behavior of the second sensor)

        ④Adjacency matrix: A

        ⑤Prior information represented by candidate relations:

\mathcal{C}_i\subseteq\{1,2,\cdots,N\}\setminus\{i\}

if there is no prior information, the candidate relations of sensor i are all the sensors except i itself

        ⑥Sensor similarity:

\begin{aligned}&e_{ji}=\frac{\mathbf{v_{i}}^{\top}\mathbf{v_{j}}}{\|\mathbf{v_{i}}\|\cdot\|\mathbf{v_{j}}\|}\mathrm{~for~}j\in\mathcal{C}_{i}\\&A_{ji}=1\{j\in\mathsf{TopK}(\{e_{ki}:k\in\mathcal{C}_{i}\})\}\end{aligned}

the edge weights e_{ji} are computed with the cosine similarity formula, and for each sensor i the top-k most similar candidates are kept as incoming edges
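A compact way to realize this step, assuming no prior information (so \mathcal{C}_i contains all sensors except i); this is a sketch, not the repo's exact implementation:

```python
import torch
import torch.nn.functional as F

def learn_graph(v: torch.Tensor, k: int) -> torch.Tensor:
    """v: sensor embeddings, shape (N, d). Returns a binary adjacency
    matrix A with A[j, i] = 1 iff e_ji is among the top-k similarities
    for sensor i."""
    v_norm = F.normalize(v, dim=1)        # v_i / ||v_i||
    e = v_norm @ v_norm.t()               # e[j, i] = cosine similarity
    e.fill_diagonal_(float("-inf"))       # exclude i from its own candidates
    topk_idx = e.topk(k, dim=0).indices   # top-k source nodes j per column i
    A = torch.zeros_like(e)
    A.scatter_(0, topk_idx, 1.0)          # A[j, i] = 1 for selected edges
    return A
```

With prior information, one would simply restrict each column's candidates to \mathcal{C}_i before taking the top k.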

2.4.5. Graph Attention-Based Forecasting

        ①Predicted/expected behavior at time t is forecast from the input:

\mathbf{x^{(t)}}:=\begin{bmatrix}\mathbf{s^{(t-w)}},\mathbf{s^{(t-w+1)}},\cdots,\mathbf{s^{(t-1)}}\end{bmatrix}, \mathbf{x^{(t)}}\in\mathbb{R}^{N\times w}

where w denotes the size of the sliding window; the model then predicts the current readings \mathbf{s^{(t)}}
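Constructing these inputs is a plain sliding-window transform over the series; a sketch with shapes matching the notation above:

```python
import torch

def sliding_windows(s: torch.Tensor, w: int):
    """s: full series of shape (T, N), one row per time tick. Yields the
    model input x^(t) of shape (N, w) together with the forecasting
    target s^(t) in R^N, for t = w, ..., T-1."""
    T = s.shape[0]
    for t in range(w, T):
        yield s[t - w:t].t(), s[t]   # x^(t) = [s^(t-w), ..., s^(t-1)]
```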

(1)Feature Extractor

        ①Aggregation of each node's features with those of its neighbors:

\mathbf{z}_{i}^{(t)}=\mathrm{ReLU}\left(\alpha_{i,i}\mathbf{W}\mathbf{x}_{i}^{(t)}+\sum_{j\in\mathcal{N}(i)}\alpha_{i,j}\mathbf{W}\mathbf{x}_{j}^{(t)}\right)

where \alpha_{i,j} are attention coefficients, calculated by:

\begin{aligned} \mathbf{g}_{i}^{(t)}& =\mathbf{v}_{i}\oplus\mathbf{Wx}_{i}^{(t)} \\ \pi\left(i,j\right)& =\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left(\mathbf{g}_{i}^{(t)}\oplus\mathbf{g}_{j}^{(t)}\right)\right) \\ \alpha_{i,j}& =\frac{\exp\left(\pi\left(i,j\right)\right)}{\sum_{k\in\mathcal{N}(i)\cup\{i\}}\exp\left(\pi\left(i,k\right)\right)}, \end{aligned}

where \oplus denotes concatenation and \mathbf{a} denotes the vector of learned coefficients for the attention mechanism
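For a single sensor i, the attention computation above can be sketched as a naive loop (the repo vectorizes this; dimensions follow the definitions, with g_i in R^{2d} and a in R^{4d}):

```python
import torch
import torch.nn.functional as F

def attention_weights(g: torch.Tensor, a: torch.Tensor, nbrs: list, i: int):
    """g: shape (N, 2d), where row i is g_i = v_i ⊕ W x_i;
    a: learned coefficient vector of length 4d, scoring g_i ⊕ g_j;
    nbrs: the neighbors N(i) of sensor i in the learned graph."""
    idx = nbrs + [i]                       # normalize over N(i) ∪ {i}
    pi = torch.stack([F.leaky_relu(torch.dot(a, torch.cat([g[i], g[j]])))
                      for j in idx])
    alpha = torch.softmax(pi, dim=0)       # alpha_{i,j} for j in idx
    return idx, alpha
```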

(2)Output Layer

        ①New features obtained for each node: \{\mathbf{z}_{1}^{(t)},\cdots,\mathbf{z}_{N}^{(t)}\}

        ②Predicted value at time t:

\mathbf{\hat{s}}^{(\mathbf{t})}=f_\theta\left(\begin{bmatrix}\mathbf{v}_1\circ\mathbf{z}_1^{(t)},\cdots,\mathbf{v}_N\circ\mathbf{z}_N^{(t)}\end{bmatrix}\right)

where \circ denotes element-wise multiplication and f_{\theta} denotes stacked fully connected layers

        ③Loss function (MSE):

L_{\mathrm{MSE}}=\frac{1}{T_{\mathrm{train}}-w}\sum_{t=w+1}^{T_{\mathrm{train}}}\left\|\mathbf{\hat{s}}^{(\mathbf{t})}-\mathbf{s}^{(\mathbf{t})}\right\|_{2}^{2}
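Combining the output layer and the loss, a minimal unbatched sketch (using a single linear layer for f_\theta is our simplification; the paper only specifies fully connected layers):

```python
import torch
import torch.nn as nn

class OutputLayer(nn.Module):
    """Element-wise multiply each node representation z_i by its
    embedding v_i, stack the results, and map them through f_theta
    to predict all N sensor values at time t."""
    def __init__(self, N: int, d: int):
        super().__init__()
        self.f_theta = nn.Linear(N * d, N)   # simplest stand-in for f_theta

    def forward(self, z: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        h = (v * z).reshape(-1)   # [v_1 ∘ z_1, ..., v_N ∘ z_N]; z, v: (N, d)
        return self.f_theta(h)    # s_hat^(t) in R^N

loss_fn = nn.MSELoss()   # MSE between s_hat^(t) and s^(t), averaged over ticks
```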

2.4.6. Graph Deviation Scoring

        ①The error value of sensor i at time t:

\mathsf{Err}_i(t)=|\mathbf{s_i^{(t)}}-\mathbf{\hat{s}_i^{(t)}}|

        ②To prevent the error scale of any single sensor from dominating the others, they normalize each sensor's error:

a_i\left(t\right)=\frac{\mathsf{Err}_i\left(t\right)-\widetilde{\mu}_i}{\widetilde{\sigma}_i}

where \widetilde{\mu}_i is the median and \widetilde{\sigma}_i the inter-quartile range (IQR) of the \mathsf{Err}_i(t) values across time ticks. ⭐The authors argue that the median and IQR are more robust against anomalies than the mean and standard deviation
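In NumPy this robust normalization is only a few lines (a sketch; the epsilon guard against zero IQR is ours):

```python
import numpy as np

def normalize_errors(err: np.ndarray) -> np.ndarray:
    """err: shape (T, N), with column i holding Err_i(t). Median and IQR
    are computed per sensor, across time ticks."""
    med = np.median(err, axis=0)                                  # mu_tilde_i
    iqr = np.percentile(err, 75, axis=0) - np.percentile(err, 25, axis=0)
    return (err - med) / (iqr + 1e-8)                             # a_i(t)
```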

        ③Overall anomalousness at time t:

A\left(t\right)=\max_{i}a_{i}\left(t\right)

The authors explain that max is used because anomalies may affect only a small subset of sensors, or even a single one... I am not sure this always holds??? It seems like it should depend on the situation

        ④A smoothed score A_{s}\left(t\right) is generated with a simple moving average (SMA)

        ⑤Anomaly threshold: the maximum value of A_{s}\left(t\right) over the validation data; test time ticks exceeding it are flagged as anomalous
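Steps ③ through ⑤ combine into a short scoring routine; a sketch (the SMA window size here is an illustrative choice, not a value from the paper):

```python
import numpy as np

def detect_anomalies(a_test: np.ndarray, a_val: np.ndarray,
                     window: int = 10) -> np.ndarray:
    """a_test, a_val: normalized scores a_i(t), shape (T, N).
    A(t) = max_i a_i(t), smoothed by a simple moving average; the
    threshold is the max smoothed score over the validation data."""
    def sma(x: np.ndarray) -> np.ndarray:
        return np.convolve(x, np.ones(window) / window, mode="same")
    A_s = sma(a_test.max(axis=1))
    threshold = sma(a_val.max(axis=1)).max()
    return (A_s > threshold).astype(int)   # a(t) in {0, 1}, 1 = anomalous
```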

2.5. Experiments

2.5.1. Datasets

        ①Datasets: The Secure Water Treatment (SWaT) and Water Distribution (WADI)

        ②Statistics of the 2 datasets: (table from the paper, not reproduced here)

        ③Sampling interval: 10 s

        ④Filtering: the first 2106 samples of each dataset are eliminated, because the systems took 5-6 hours to reach stabilization when first turned on

2.5.2. Baselines

        ①Briefly introduces the baselines: PCA, KNN, FB, AE, DAGMM, LSTM-VAE, and MAD-GAN

2.5.3. Evaluation Metrics

        ①Metrics: precision (Prec), recall (Rec) and F1-Score (F1)

2.5.4. Experimental Setup

        ①Optimizer: Adam

        ②Learning rate: 1e-3

        ③Adam hyper-parameters: \beta _1=0.9,\beta _2=0.99

        ④Epochs: up to 50, with early stopping (patience 10)

        ⑤Top-k: k=30 for WADI and k=15 for SWaT

        ⑥Sliding window size w=5
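These settings map one-to-one onto PyTorch; a minimal sketch (the placeholder model merely stands in for GDN):

```python
import torch
import torch.nn as nn

model = nn.Linear(5, 1)   # placeholder; the GDN model itself would go here

# Adam with the reported learning rate and betas:
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.99))
# Training runs for up to 50 epochs, with early stopping (patience 10).
```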

2.5.5. RQ1. Accuracy

        ①Comparison table: (table from the paper, not reproduced here)

2.5.6. RQ2. Ablation

        ①Ablation studies: replacing the learned graph with a static complete graph, removing the sensor embedding, and removing the attention mechanism: (table from the paper, not reproduced here)

2.5.7. RQ3. Interpretability of Model

(1)Interpretability via Sensor Embeddings

        ①Embedding similarity between sensors, visualized via t-SNE (figure from the paper, not reproduced here), where the 7 colors correspond to 7 classes of sensors

(2)Interpretability via Graph Edges and Attention Weights

        ①Correlated sensors in WADI, drawn with a force-directed layout (figure from the paper, not reproduced here); the red edges mark the sensors with the largest deviations

2.5.8. RQ4. Localizing Anomalies

        ①Visualization of predicted vs. observed sensor values: (figure from the paper, not reproduced here)

        ②Their contributions: localizing anomalies, identifying the most related sensors, and visualizing how sensor readings deviate from expectations

2.6. Conclusion

        ~

3. Reference

Deng, A. & Hooi, B. (2021) 'Graph Neural Network-Based Anomaly Detection in Multivariate Time Series', AAAI.
