Irina Rish, Mark Brodie, and Sheng Ma, IBM T.J. Watson Research Center
This paper studies the accuracy/efficiency trade-off in probabilistic diagnosis formulated as finding the most-likely explanation (MPE) in a Bayesian network. Our work is motivated by the practical problem of efficient real-time fault diagnosis in computer networks using test transactions, or probes, sent through the network. The key efficiency issues include both the cost of probing (e.g., the number of probes) and the computational complexity of diagnosis, while diagnostic accuracy is crucial for maintaining high levels of network performance. Herein, we derive a lower bound on the diagnostic accuracy that provides necessary conditions for the number of probes needed to achieve an asymptotically error-free diagnosis as the network size increases, given prior fault probabilities and a certain level of noise in probe outcomes. Since exact MPE diagnosis is generally intractable in large networks, we next investigate the accuracy/efficiency trade-offs for very simple and efficient local approximation techniques based on variable elimination (the mini-bucket scheme). Our empirical studies show that these approximations "degrade gracefully" with noise and often yield an optimal solution when noise is low enough, and our initial theoretical analysis explains this behavior for the simplest (greedy) approximation. These encouraging results suggest that such approximations are applicable to certain almost-deterministic diagnostic problems that often arise in practice.