Alias detection is a challenging task that is critical in several areas, including the intelligence community, social network analysis, databases, biology, and marketing. Problem domains range from datasets containing accidentally replicated data to populations in which criminals or terrorists wield multiple identities. Teasing out aliases or near-aliases in the latter case is a serious problem. We propose an unsupervised, information-theoretic approach that automatically detects aliases in malicious environments by observing the behaviors of the entities. Our model discovers the most informative observations (e.g., emails, phone calls, relational data) between entities and then compares them to identify entities that exhibit similar behaviors. We test the model on the task of discovering aliases in a standard synthetic world of interrelated individuals. Considering our system's top 20 guesses for a given entity, we recovered that entity's true aliases with 80% accuracy.
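The abstract does not specify the scoring function, so the following is only a rough illustration of the general idea, not the authors' model. It assumes each entity is represented by the set of observations it takes part in. Each observation is weighted by its self-information, -log2 p, where p is the fraction of entities that exhibit it, so rare observations count for more. Candidates are then ranked by weighted set overlap (a weighted Jaccard score). The function names and the toy profiles are hypothetical.

```python
import math
from collections import Counter


def informativeness(profiles):
    """Weight each observation by its self-information -log2(p), where p is the
    fraction of entities exhibiting it (an assumed weighting, for illustration)."""
    n = len(profiles)
    entity_counts = Counter(obs for observations in profiles.values() for obs in observations)
    return {obs: -math.log2(count / n) for obs, count in entity_counts.items()}


def similarity(a, b, weights):
    """Weighted Jaccard overlap: informative mass of shared observations
    divided by the informative mass of all observations of either entity."""
    shared = sum(weights[o] for o in a & b)
    total = sum(weights[o] for o in a | b)
    return shared / total if total else 0.0


def top_k_aliases(target, profiles, k=20):
    """Rank every other entity by behavioral similarity to `target`; return the top k."""
    weights = informativeness(profiles)
    scored = [
        (other, similarity(profiles[target], observations, weights))
        for other, observations in profiles.items()
        if other != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


# Hypothetical toy data: each entity maps to the set of observations it appears in.
profiles = {
    "A": {"call:X", "email:Y", "visit:Z"},
    "A_alias": {"call:X", "email:Y", "visit:W"},
    "B": {"call:Q", "email:Y"},
    "C": {"call:R", "visit:W"},
}

print(top_k_aliases("A", profiles, k=2))
```

In this toy world, "email:Y" is shared by three of the four entities and so carries little weight. The rarer shared observation "call:X" is what ranks "A_alias" above "B" as a candidate alias of "A".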
Subjects: 10. Knowledge Acquisition