John Rachlin, Simon Kasif, Steven Salzberg and David W. Aha
As scientific databases continue to grow, the analysis of scientific data becomes an increasingly important application area for machine learning research. Much of the research on scientific data is directed towards predicting properties of a physical process. Such processes are often described in terms of a function defined over the attributes of the domain, or in terms of a stochastic model such as a Markov chain. The first question one should ask when studying such problems is: what is the best machine learning technique to use for this problem domain? A general solution to this problem remains elusive. It has been argued (e.g., by the case-based reasoning community) that instead of producing a description of the problem domain in terms of logical rules, functional descriptions, or a complex statistical model, it is possible to store a collection of memories (cases) and perform prediction by interpolating from them. In this practical context, the second question to ask is: are memory-based reasoning (MBR) methods as effective for a given problem domain as more complex approaches (e.g., probabilistic networks)? This paper addresses this question both experimentally and formally.
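To make the idea of predicting by interpolating from stored cases concrete, the following is a minimal sketch in Python. It is not the method evaluated in this paper: the function name `mbr_predict`, the choice of Euclidean distance, the parameter `k`, and the inverse-distance weighting are all illustrative assumptions.

```python
import math

def mbr_predict(cases, query, k=3):
    """Predict a numeric value for `query` from stored cases.

    `cases` is a list of (attributes, value) pairs, where `attributes`
    is a tuple of numbers. All names and choices here are hypothetical.
    """
    if not cases:
        raise ValueError("at least one stored case is required")
    # Rank the stored cases by Euclidean distance to the query.
    scored = sorted(
        ((math.dist(attrs, query), value) for attrs, value in cases),
        key=lambda pair: pair[0],
    )
    nearest = scored[:k]
    # An exact match in memory is returned directly.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    # Otherwise interpolate: closer cases receive larger weights.
    weights = [1.0 / d for d, _ in nearest]
    total = sum(w * v for w, (_, v) in zip(weights, nearest))
    return total / sum(weights)
```

For example, with the stored cases `[((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 2.0)]`, calling `mbr_predict(cases, (0.5,), k=2)` averages the two equally distant neighbours. No model of the domain is built; all of the work happens at query time.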