Prashant J. Doshi, Lloyd G. Greenwald, and John R. Clarke
The process of building a Bayesian network may occur in stages: intermediate Bayesian networks are built during preliminary processing and then used in the construction of further Bayesian networks. For example, in (Doshi, Greenwald, and Clarke 2001) we describe a way to use Bayesian networks to model and correct errors in noisy datasets. The corrected datasets are then used in (Doshi 2001) to build predictive Bayesian networks. Through this process we built networks that capture probabilistic relationships between 412 fields of data from 169,512 patients admitted to trauma centers in Pennsylvania and registered in the Pennsylvania Trauma Systems Foundation Trauma Registry between 1986 and 1999. In this process, intermediate Bayesian networks were used to find the most likely values for fields found to have errors, and these most likely values were then used in the cleansed dataset. However, in the subsequent process of building Bayesian networks from this dataset, we questioned whether the intermediate networks used in error correction should have been retained. In other words, we wanted to understand the tradeoffs involved in retaining the distributional information summarized in each error-correction network rather than retaining only the most likely value for each corrected field. This question generalizes to any process of building a Bayesian network in stages. This note describes preliminary work to understand these issues.
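To make the tradeoff concrete, the following minimal sketch contrasts the two options for a single corrected field. It is an illustration under stated assumptions, not the procedure used in the cited work: the field values ("blunt", "penetrating"), the posterior probabilities, and the function names are all hypothetical. Each corrected record is assumed to come with a posterior distribution from an error-correction network, and the field's marginal is then estimated either from the most likely value alone or from fractional counts that keep the full distribution.

```python
from collections import Counter


def map_value(posterior):
    """Return the most likely value of a posterior given as {value: prob}."""
    return max(posterior, key=posterior.get)


def estimate_marginal(clean_values, posteriors, use_distribution):
    """Estimate P(field) from clean records plus corrected records.

    clean_values: observed values for records that had no error.
    posteriors: one {value: prob} dict per record whose field was corrected
        (hypothetical output of an error-correction network).
    use_distribution: if True, add fractional (expected) counts from the
        whole posterior; otherwise add one whole count for the most likely
        value only.
    """
    counts = Counter(clean_values)
    for post in posteriors:
        if use_distribution:
            for value, prob in post.items():
                counts[value] += prob
        else:
            counts[map_value(post)] += 1
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Hypothetical data: four clean records and four corrected records whose
# posterior only mildly favors "blunt".
clean = ["blunt", "blunt", "penetrating", "blunt"]
posteriors = [{"blunt": 0.6, "penetrating": 0.4}] * 4

hard = estimate_marginal(clean, posteriors, use_distribution=False)
soft = estimate_marginal(clean, posteriors, use_distribution=True)
print("most likely value only:", hard)
print("full distribution kept:", soft)
```

In this toy setting, keeping only the most likely value places about 0.875 of the mass on "blunt", while keeping the distribution gives about 0.675. The hard assignment discards the uncertainty in each correction and so overstates the favored value, at the benefit of a simpler dataset with one value per field.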