Facundo Bromberg, Dimitris Margaritis
In this paper we introduce a novel algorithm for the induction of the Markov network structure of a domain from the outcome of conditional independence tests on data. Such algorithms work by successively restricting the set of possible structures until there is only a single structure consistent with the conditional independence tests executed. Existing independence-based algorithms have well-known shortcomings, such as rigidly ordering the sequence of tests they perform, resulting in potential inefficiencies in the number of tests required, and committing fully to the test outcomes, resulting in lack of robustness in case of unreliable tests. We address both problems through a Bayesian particle filtering approach, which uses a population of Markov network structures to maintain the posterior probability distribution over them, given the outcomes of the tests performed. Instead of a fixed ordering, our approach greedily selects, at each step, the optimally informative from a pool of candidate tests according to information gain. In addition, it maintains multiple candidate structures weighed by posterior probability, which makes it more robust to errors in the test outcomes. The result is an approximate algorithm (due to the use of particle filtering) that is useful in domains where independence tests are uncertain (such as applications where little data is available) or expensive (such as cases of very large data sets and/or distributed data).
Subjects: 12. Machine Learning and Discoveryn
Submitted: Oct 12, 2006