Guoliang Li and Tze-Yun Leong
Structure learning in Bayesian networks is a long-standing problem, and many algorithms have been proposed to address it. Applying these methods to microarray data, however, raises three main challenges: 1) the data sets contain many variables, 2) the sample sizes are small, and 3) microarray data vary from experiment to experiment, and new data become available quickly. To address these challenges, we assume that the major functions of a given type of cell do not change much across experiments, and we propose a framework for learning Bayesian networks from data with variable grouping. The framework has several advantages: 1) it reduces the number of variables and narrows the search space for Bayesian network structure learning; 2) it reduces the number of samples required; and 3) the learned group Bayesian network is a higher-level abstraction of the biological functions in a cell. This abstraction is comparable from one experiment to another and needs little change at the group level when it is applied to new experiments; only the relationships between group variables and original variables have to be adjusted. We have tested the proposed framework on synthetic examples and on real data. Preliminary results on the synthetic examples show that the framework works with fewer samples, and that the group Bayesian networks learned from different sets of experimental data agree with each other most of the time. The experiments on real data also yield some results that are meaningful in the domain. The framework can be applied to other domains in which similar assumptions hold.
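To illustrate why reducing the number of variables narrows the search space, the following minimal sketch counts the labeled directed acyclic graphs (candidate network structures) on n nodes using Robinson's recurrence. It is not part of the proposed framework; the function name `count_dags` and the variable counts in the usage lines (100 original variables, 10 groups) are assumptions chosen only for illustration.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


# Hypothetical sizes: 100 original variables versus 10 group variables.
n_original, n_groups = 100, 10
print("digits in DAG count, original variables:", len(str(count_dags(n_original))))
print("digits in DAG count, group variables:   ", len(str(count_dags(n_groups))))
```

Because the count grows super-exponentially in the number of nodes, even a modest reduction in variables removes most candidate structures from consideration.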