Temporal Anomaly Detection: Calibrating the Surprise
We propose a hybrid approach to temporal anomaly detection in access data of users to databases — or more generally, any kind of subject-object co-occurrence data. We consider a high-dimensional setting that also requires fast computation at test time. Our methodology identifies anomalies based on a single stationary model, instead of requiring a full temporal one, which would be prohibitive in this setting. We learn a low-rank stationary model from the training data, and then fit a regression model for predicting the expected likelihood score of normal access patterns in the future. The disparity between the predicted likelihood score and the observed one is used to assess the “surprise” at test time. This approach enables calibration of the anomaly score, so that time-varying normal behavior patterns are not considered anomalous. We provide a detailed description of the algorithm, including a convergence analysis, and report encouraging empirical results. One of the data sets that we tested is new for the public domain. It consists of two months’ worth of database access records from a live system. This data set and our code are publicly available at https://github.com/eyalgut/TLR anomaly detection.git.