AAAI Publications, Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence

A Proposal for Statistical Outlier Detection in Relational Structures
Fatemeh Riahi, Oliver Schulte, Qing Li

Last modified: 2014-06-18


This paper extends unsupervised statistical outlier detection to relational data. For nonrelational data, where each individual is characterized by a feature vector, a common approach starts by learning a generative statistical model for the population. The model assigns a likelihood to each individual's feature vector; the lower the likelihood, the more anomalous the individual. Relational data differ in that an individual is characterized not only by a list of attributes, but also by its links and by the attributes of the individuals linked to it. We refer to a relational structure that specifies this information for a specific individual as the individual's database. Our proposal is to use the likelihood that a generative model assigns to the individual's database as the individual's anomaly score; the lower the model likelihood, the more anomalous the individual. As a novel validation method, we compare the model likelihood with metrics of individual success. An empirical evaluation on soccer and movie data reveals a surprising finding: the likelihood correlates strongly with the success metrics.
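
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract's keywords mention Bayesian networks, but this sketch swaps in a much simpler generative model that treats attributes as independent categorical variables with Laplace smoothing. It also assumes that an individual's database can be flattened into a list of rows, and that averaging the row log-likelihoods is an acceptable way to compare individuals with different numbers of rows. All names (`fit_independent_model`, `anomaly_scores`, the toy records) are hypothetical.

```python
import math
from collections import Counter


def fit_independent_model(rows, alpha=1.0):
    """Fit one smoothed categorical distribution per attribute column.

    Assumption: attributes are independent. This is a stand-in for a
    richer generative model, not the model used in the paper.
    """
    n_attrs = len(rows[0])
    counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    total = len(rows)

    def log_prob(row):
        lp = 0.0
        for j, value in enumerate(row):
            vocab = len(counts[j]) + 1  # one extra slot for unseen values
            lp += math.log((counts[j][value] + alpha) / (total + alpha * vocab))
        return lp

    return log_prob


def anomaly_scores(databases, log_prob):
    """Mean log-likelihood of each individual's rows; lower = more anomalous.

    Assumption: averaging over rows, so individuals with more rows are
    not penalized merely for having more data.
    """
    return {
        individual: sum(log_prob(r) for r in rows) / len(rows)
        for individual, rows in databases.items()
    }


# Hypothetical toy data: each individual's "database" is a list of
# (match outcome, venue) rows linked to that individual.
databases = {
    "p1": [("win", "home"), ("win", "away"), ("win", "home")],
    "p2": [("win", "home"), ("win", "home"), ("loss", "away")],
    "p3": [("loss", "away"), ("red_card", "away"), ("red_card", "away")],
}

# Learn the population model from all rows, then score each individual.
population = [row for rows in databases.values() for row in rows]
log_prob = fit_independent_model(population)
scores = anomaly_scores(databases, log_prob)

# Individuals sorted from most to least anomalous.
ranking = sorted(scores, key=scores.get)
print(ranking)
```

In this toy population, the individual whose rows contain the rarest attribute values receives the lowest mean log-likelihood and is ranked first. A validation in the spirit of the abstract would then compare such scores against an external success metric for each individual.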


Anomaly Detection; Bayesian Networks; Relational Databases
