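Paper: [2106.06947] Graph Neural Network-Based Anomaly Detection in Multivariate Time Series (arxiv.org)
Code: https://github.com/d-ailin/GDN
The English here is typed entirely by hand! It summarizes and paraphrases the original paper. Some spelling and grammar mistakes may be unavoidable; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.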
Contents
1. TL;DR
1.1. Takeaways
2. Section-by-Section Close Reading
2.1. Abstract
2.2. Introduction
2.3. Related Work
2.4. Proposed Framework
2.4.1. Problem Statement
2.4.2. Overview
2.4.3. Sensor Embedding
2.4.4. Graph Structure Learning
2.4.5. Graph Attention-Based Forecasting
2.4.6. Graph Deviation Scoring
2.5. Experiments
2.5.1. Datasets
2.5.2. Baselines
2.5.3. Evaluation Metrics
2.5.4. Experimental Setup
2.5.5. RQ1. Accuracy
2.5.6. RQ2. Ablation
2.5.7. RQ3. Interpretability of Model
2.5.8. RQ4. Localizing Anomalies
2.6. Conclusion
3. Reference
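1. TL;DR
1.1. Takeaways
(1)Pretty decent. It is actually a fairly simple model, it is explained very clearly, and code is available, so it is worth a read
2. Section-by-Section Close Reading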
2.1. Abstract
①Existing work: deep learning has improved anomaly detection on high-dimensional data
②Challenge: existing methods do not explicitly learn the structure of existing relationships between variables
2.2. Introduction
①Real systems usually contain a large number of sensors, and their readings can be used to detect anomalies
②Anomaly detection is usually framed as an unsupervised problem because anomalies are unlabeled and highly variable
③They proposed Graph Deviation Network (GDN)
2.3. Related Work
(1)Anomaly Detection
①Methods include autoencoders (AE), variational autoencoders (VAE), and so on
(2)Multivariate Time Series Modelling
①These methods model the behavior of a multivariate time series based on its past behavior. They include auto-regressive models, auto-regressive integrated moving average (ARIMA) models, CNNs, LSTMs, and GANs. However, they struggle with complex and highly non-stationary time series.
(3)Graph Neural Networks
①Limitations: existing GNNs do not model the different behaviors of different sensors, and they assume the graph structure is known in advance
2.4. Proposed Framework
2.4.1. Problem Statement
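①Number of sensors: $N$
②Pattern of data: time series $\mathbf{s}_{\text{train}}=[\mathbf{s}^{(1)},\dots,\mathbf{s}^{(T_{\text{train}})}]$ with $\mathbf{s}^{(t)}\in\mathbb{R}^{N}$, meaning that each entry of $\mathbf{s}^{(t)}$ comes from one sensor
③Number of time ticks: $T_{\text{train}}$
④Training data: only normal data
⑤Test data: readings from the same $N$ sensors, observed over a separate set of $T_{\text{test}}$ time ticks
⑥Output: a binary array of length $T_{\text{test}}$ with $a(t)\in\{0,1\}$, where $a(t)=1$ indicates that time tick $t$ is anomalous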
2.4.2. Overview
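①The framework has four parts, covered in the following subsections: sensor embedding, graph structure learning, graph attention-based forecasting, and graph deviation scoring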
②Overall framework:
2.4.3. Sensor Embedding
①Representation of each sensor: an embedding vector $\mathbf{v}_i\in\mathbb{R}^{d}$ for $i\in\{1,2,\dots,N\}$
2.4.4. Graph Structure Learning
①Graph construction: a directed graph is used
②Nodes: sensors
③Edges: dependency relationships (An edge from one sensor to another indicates that the first sensor is used for modelling the behavior of the second sensor)
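④Adjacency matrix: $A$, where $A_{ij}$ denotes the presence of a directed edge from node $i$ to node $j$
⑤Prior information represented by candidate relations: $\mathcal{C}_i\subseteq\{1,2,\dots,N\}\setminus\{i\}$
if there is no prior information, the candidate relations of sensor $i$ are all sensors except itself
⑥Sensor similarity:
$$e_{ji}=\frac{\mathbf{v}_i^{\top}\mathbf{v}_j}{\lVert\mathbf{v}_i\rVert\cdot\lVert\mathbf{v}_j\rVert}\ \text{for}\ j\in\mathcal{C}_i,\qquad A_{ji}=\mathbb{1}\{j\in\mathrm{TopK}(\{e_{ki}:k\in\mathcal{C}_i\})\}$$
the edge score is the cosine similarity between embeddings, and for each sensor they keep the top $k$ most similar candidates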
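The graph construction above can be sketched in a few lines. This is only a minimal illustration in plain Python, not the authors' PyTorch code: the function names `cosine` and `learn_graph`, the argument `k` (number of neighbors kept per sensor), and the toy embeddings are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def learn_graph(embeddings, k, candidates=None):
    """Return adj where adj[j][i] == 1 means a directed edge from sensor j to sensor i."""
    n = len(embeddings)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Without prior knowledge, every other sensor is a candidate for sensor i.
        cand = candidates[i] if candidates is not None else [j for j in range(n) if j != i]
        ranked = sorted(cand, key=lambda j: cosine(embeddings[i], embeddings[j]), reverse=True)
        for j in ranked[:k]:          # keep the k most similar candidates
            adj[j][i] = 1
    return adj

# Toy embeddings (hypothetical): sensors 0/1 point one way, sensors 2/3 another.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
adj = learn_graph(emb, k=1)
```

With these toy embeddings, each sensor ends up linked to the one other sensor whose embedding points in nearly the same direction. In the real model the embeddings are trained jointly with the forecaster, so the graph changes during training.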
2.4.5. Graph Attention-Based Forecasting
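①The expected behavior at time $t$ is predicted from the input: $\mathbf{x}^{(t)}:=[\mathbf{s}^{(t-w)},\mathbf{s}^{(t-w+1)},\dots,\mathbf{s}^{(t-1)}]\in\mathbb{R}^{N\times w}$
where $w$ denotes the size of the sliding window; the target to predict is the sensor data at the current time tick, $\mathbf{s}^{(t)}$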
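To make the sliding-window input concrete, here is a small sketch of how (input, target) pairs could be cut from a series. The function name `sliding_windows`, the window size name `w`, and the toy data are assumptions for illustration only.

```python
def sliding_windows(series, w):
    """series: list of T readings, each a list of N sensor values.

    Returns (inputs, targets): inputs[n] is an N x w matrix holding the w
    readings before a time tick, targets[n] is the reading at that tick.
    """
    inputs, targets = [], []
    for t in range(w, len(series)):
        window = series[t - w:t]                              # w ticks, each of length N
        inputs.append([list(col) for col in zip(*window)])    # transpose to N x w
        targets.append(series[t])
    return inputs, targets

# Toy series (hypothetical): T = 6 ticks, N = 2 sensors.
series = [[t, 10 * t] for t in range(6)]
x, y = sliding_windows(series, w=3)
```

Note that the first `w` time ticks have no full history, so they produce no training pair.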
(1)Feature Extractor
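①Aggregation methods:
$$\mathbf{z}_i^{(t)}=\mathrm{ReLU}\Big(\alpha_{i,i}\mathbf{W}\mathbf{x}_i^{(t)}+\sum_{j\in\mathcal{N}(i)}\alpha_{i,j}\mathbf{W}\mathbf{x}_j^{(t)}\Big)$$
where $\mathbf{x}_i^{(t)}\in\mathbb{R}^{w}$ is the input of node $i$, $\mathcal{N}(i)=\{j\mid A_{ji}>0\}$ is its set of neighbors, $\mathbf{W}\in\mathbb{R}^{d\times w}$ is a trainable weight matrix shared by all nodes, and $\alpha_{i,j}$ are attention coefficients calculated by:
$$\mathbf{g}_i^{(t)}=\mathbf{v}_i\oplus\mathbf{W}\mathbf{x}_i^{(t)}$$
$$\pi(i,j)=\mathrm{LeakyReLU}\big(\mathbf{a}^{\top}(\mathbf{g}_i^{(t)}\oplus\mathbf{g}_j^{(t)})\big)$$
$$\alpha_{i,j}=\frac{\exp(\pi(i,j))}{\sum_{k\in\mathcal{N}(i)\cup\{i\}}\exp(\pi(i,k))}$$
where $\oplus$ denotes concatenation and $\mathbf{a}$ denotes the vector of learned coefficients for the attention mechanism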
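The attention-based aggregation for a single node can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the authors' implementation: the function names, the LeakyReLU slope of 0.2, and the toy weights are all hypothetical, and a real model would learn `W` and `a` by gradient descent.

```python
import math

def matvec(W, x):
    """Multiply a d x w matrix by a length-w vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def leaky_relu(value, slope=0.2):   # slope is an assumed value
    return value if value > 0 else slope * value

def attention_aggregate(i, neighbors, x, v, W, a):
    """Aggregated representation of node i.

    x[j]: length-w input window of sensor j, v[j]: its length-d embedding,
    W: shared d x w weight matrix, a: length-4d attention vector.
    """
    nodes = [i] + neighbors
    h = {j: matvec(W, x[j]) for j in nodes}             # transformed inputs W x_j
    g = {j: v[j] + h[j] for j in nodes}                 # list concatenation: embedding, then W x_j
    scores = {j: leaky_relu(sum(ak * gk for ak, gk in zip(a, g[i] + g[j]))) for j in nodes}
    top = max(scores.values())                          # subtract the max for numerical stability
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    alpha = {j: e / total for j, e in exps.items()}     # softmax over node i and its neighbors
    z = [max(0.0, sum(alpha[j] * h[j][k] for j in nodes)) for k in range(len(W))]  # ReLU
    return z, alpha

# Toy setup (hypothetical): d = 2, w = 3, three sensors.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
x = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, -10.0, 0.0]]
v = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
z, alpha = attention_aggregate(0, [1, 2], x, v, W, a=[0.0] * 8)
```

With an all-zero attention vector every score is equal, so the weights are uniform and the output reduces to a ReLU over the plain average of the transformed inputs. A learned attention vector lets the sensor embeddings shift those weights.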
(2)Output Layer
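①Aggregated representations of all $N$ nodes: $\{\mathbf{z}_1^{(t)},\dots,\mathbf{z}_N^{(t)}\}$
②Predicted value at time $t$: $\hat{\mathbf{s}}^{(t)}=f_{\theta}\big([\mathbf{v}_1\circ\mathbf{z}_1^{(t)},\dots,\mathbf{v}_N\circ\mathbf{z}_N^{(t)}]\big)$
where $\circ$ denotes element-wise multiplication and $f_{\theta}$ is a stack of fully connected layers with output dimension $N$
③Loss function (MSE): $L_{\text{MSE}}=\frac{1}{T_{\text{train}}-w}\sum_{t=w+1}^{T_{\text{train}}}\big\lVert\hat{\mathbf{s}}^{(t)}-\mathbf{s}^{(t)}\big\rVert_2^2$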
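A rough sketch of the output step and the loss is given below. As a simplification, a single linear layer stands in for the fully connected stack; the names `predict` and `mse_loss` and the toy weights are assumptions made for this example only.

```python
def predict(v, z, weights, bias):
    """Forecast all sensors from embeddings v and aggregated features z.

    Each node contributes the element-wise product of its embedding and its
    feature vector; the products are concatenated and fed to one linear layer
    (a stand-in for the stacked fully connected layers).
    """
    features = [vk * zk for vi, zi in zip(v, z) for vk, zk in zip(vi, zi)]   # length N * d
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]

def mse_loss(preds, targets):
    """Squared L2 distance between forecast and observation, averaged over time ticks."""
    total = sum(sum((p - s) ** 2 for p, s in zip(p_t, s_t)) for p_t, s_t in zip(preds, targets))
    return total / len(preds)

# Toy values (hypothetical): N = 2 sensors, d = 2.
v = [[1.0, 2.0], [3.0, 4.0]]
z = [[1.0, 0.0], [0.5, 1.0]]
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
bias = [0.0, 0.5]
s_hat = predict(v, z, weights, bias)
loss = mse_loss([[1.0, 6.0], [2.0, 2.0]], [[0.0, 6.0], [2.0, 4.0]])
```

Multiplying by the embedding lets the same aggregated feature mean different things for different sensors, which is the point of learning one embedding per sensor.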
2.4.6. Graph Deviation Scoring
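①The error value of sensor $i$ at time $t$: $\mathrm{Err}_i(t)=\lvert s_i^{(t)}-\hat{s}_i^{(t)}\rvert$
②To keep the deviations of any single sensor from dominating the others, they normalize the error: $a_i(t)=\frac{\mathrm{Err}_i(t)-\tilde{\mu}_i}{\tilde{\sigma}_i}$
where $\tilde{\mu}_i$ is the median and $\tilde{\sigma}_i$ denotes the inter-quartile range (IQR) of the $\mathrm{Err}_i(t)$ values across time ticks. ⭐The authors argue that the median and IQR are more robust than the mean and standard deviation
③Overall anomalousness at time $t$: $A(t)=\max_i a_i(t)$
The authors explain that max is used because an anomaly may affect only a small subset of sensors, or even a single sensor... Not necessarily??? I feel it depends on the case
④A smoothed score $A_s(t)$ is generated with a simple moving average (SMA)
⑤Threshold of anomaly: the maximum value of $A_s(t)$ over the validation data; time tick $t$ is labeled anomalous if $A_s(t)$ exceeds it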
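The scoring steps above can be strung together roughly as follows. This is a sketch under several assumptions that the notes do not pin down: the median and IQR are computed on each split's own errors, a small `eps` is added to the IQR to avoid division by zero, the moving average is a trailing window of 3 ticks, and the quartiles use Python's "inclusive" method. All function names are hypothetical.

```python
import statistics

def normalized_deviation(err, eps=1e-2):
    """err[t][i] is the absolute error of sensor i at tick t.

    Returns (err - median_i) / (IQR_i + eps) per sensor; eps is an assumed guard.
    """
    columns = list(zip(*err))                        # one tuple of errors per sensor
    med = [statistics.median(c) for c in columns]
    iqr = []
    for c in columns:
        q1, _, q3 = statistics.quantiles(c, n=4, method="inclusive")
        iqr.append(q3 - q1)
    return [[(e - m) / (r + eps) for e, m, r in zip(row, med, iqr)] for row in err]

def smoothed_scores(err, window=3):
    """Max over sensors at each tick, then a trailing simple moving average."""
    raw = [max(row) for row in normalized_deviation(err)]
    out = []
    for t in range(len(raw)):
        recent = raw[max(0, t - window + 1):t + 1]   # shorter window at the start
        out.append(sum(recent) / len(recent))
    return out

def detect(val_err, test_err, window=3):
    """Label a test tick 1 if its smoothed score exceeds the validation maximum."""
    threshold = max(smoothed_scores(val_err, window))
    return [int(s > threshold) for s in smoothed_scores(test_err, window)]

# Toy errors (hypothetical): sensor 1 spikes at test tick 3.
val_err = [[0.1, 0.2], [0.2, 0.1], [0.1, 0.3], [0.2, 0.2], [0.1, 0.1], [0.2, 0.3]]
test_err = [[0.1, 0.2], [0.2, 0.1], [0.1, 0.2], [0.2, 5.0],
            [0.1, 0.1], [0.2, 0.2], [0.1, 0.1], [0.2, 0.2]]
labels = detect(val_err, test_err)
```

On this toy data the single spike is flagged at tick 3 and at the two following ticks, because the trailing average keeps the spike inside the window for three ticks. That smearing is a side effect of smoothing worth keeping in mind when reading point-wise metrics.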
2.5. Experiments
2.5.1. Datasets
①Datasets: The Secure Water Treatment (SWaT) and Water Distribution (WADI)
②Statistics of 2 datasets:
③Sampling interval: 10 s
④Filtering: the first 2106 samples of both datasets are removed, because the systems took 5-6 hours to stabilize after first being turned on
2.5.2. Baselines
①The baselines briefly introduced are PCA, KNN, FB, AE, DAGMM, LSTM-VAE, and MAD-GAN
2.5.3. Evaluation Metrics
①Metrics: precision (Prec), recall (Rec) and F1-Score (F1)
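For reference, the three metrics can be computed from binary labels as below. This is a generic point-wise sketch; the function name and the toy labels are made up, and whether the paper applies any adjustment on top of point-wise counting is not covered in these notes.

```python
def precision_recall_f1(y_true, y_pred):
    """Point-wise precision, recall and F1 for binary labels (1 = anomalous)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Toy labels (hypothetical): 2 true positives, 1 false positive, 1 false negative.
prec, rec, f1 = precision_recall_f1([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0])
```

Here precision, recall, and F1 all come out to 2/3, since there is one false alarm and one miss against two correct detections.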
2.5.4. Experimental Setup
①Optimizer: Adam
②Learning rate: 1e-3
③Adam hyper-parameters: $(\beta_1,\beta_2)=(0.9,0.99)$
④Epochs: up to 50, with early stopping at a patience of 10
⑤Number of neighbors: $k=30$ in WADI and $k=15$ in SWaT
⑥Sliding window size: $w=5$
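The epoch budget with early stopping can be illustrated with a tiny simulation. The function name `early_stopping_epochs` is hypothetical, and monitoring a validation loss is an assumption; the notes only give the epoch limit and the patience.

```python
def early_stopping_epochs(val_losses, max_epochs=50, patience=10):
    """Simulate training on a list of per-epoch validation losses.

    Returns (best_epoch, last_epoch_run), both 0-indexed. Training stops once
    `patience` consecutive epochs pass without a new best loss.
    """
    best_loss, best_epoch, last = float("inf"), -1, -1
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        last = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, last

# Toy curves (hypothetical): one plateaus after epoch 2, one keeps improving.
plateau = early_stopping_epochs([1.0, 0.8, 0.7] + [0.75] * 40)
improving = early_stopping_epochs([1.0 / (e + 1) for e in range(60)])
```

In the plateau case training stops ten epochs after the best one, while a curve that keeps improving simply runs out the 50-epoch budget.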
2.5.5. RQ1. Accuracy
①Comparison table:
2.5.6. RQ2. Ablation
①Ablation variants: replacing the learned graph with a static complete graph, removing the sensor embedding, and removing the attention mechanism:
2.5.7. RQ3. Interpretability of Model
(1)Interpretability via Sensor Embeddings
①Embedding similarity between sensors via t-SNE:
where the 7 colors correspond to 7 classes of sensors
(2)Interpretability via Graph Edges and Attention Weights
①Correlated sensors in WADI:
The graph is drawn with a force-directed layout, and the red lines mark the largest deviation
2.5.8. RQ4. Localizing Anomalies
①Visualization of prediction and reality:
②What the model offers: it localizes anomalies, finds the most closely related sensors, and visualizes how a sensor deviates from its expected behavior
2.6. Conclusion
~
3. Reference
Deng, A. & Hooi, B. (2021) 'Graph Neural Network-Based Anomaly Detection in Multivariate Time Series', AAAI.