Jeremy Kubica, Andrew Moore, Jeff Schneider, and Yiming Yang, Carnegie Mellon University
Link detection and analysis have long been important in the social sciences and in the government intelligence community. Significant effort is focused on the structural and functional analysis of ``known'' networks. Similarly, the detection of individual links is important, but it is usually done with techniques that result in ``known'' links. More recently, the internet and other sources have led to a flood of circumstantial data that provide probabilistic evidence of links. Co-occurrence in news articles and simultaneous travel to the same location are two examples. We propose a probabilistic model of link generation based on membership in groups. The model considers both observed link evidence and demographic information about the entities. The parameters of the model are learned via a maximum likelihood search. In this paper we describe the model and then present several heuristics that make the search tractable. We test our model and optimization methods on synthetic data sets with a known ground truth and on a database of news articles.