L. Hunter and T. Klein
Many methods for analyzing biological problems are constrained by problem size. The ability to distinguish between relevant and irrelevant features of a problem may allow the problem to be reduced in size sufficiently to make it tractable. Learning in the presence of large numbers of irrelevant features is an important issue in machine learning, and several methods have recently been proposed to address it. A combination of machine learning approaches and statistical analysis methods can be used to identify a set of relevant attributes for currently intractable biological problems. We call our framework F/I/E (Focus-Induce-Extract). As an example of this methodology, this paper reports on the identification of the features of mutations in collagen that are likely to be relevant in the bone disease Osteogenesis imperfecta.