David H. Foster, W. James Bishop, Scott A. King, Jack Park
An important area of KDD research involves developing techniques that transform raw data into forms more useful for prediction or explanation. We present an approach to automating the search for "indicator functions" that mediate such transformations. The fitness of a function is measured by its contribution to discerning different classes of data. Genetic programming techniques are applied to search for and improve the programs that make up these functions, and rough set theory is used to evaluate their fitness. Rough set theory provides a unique evaluator in that it allows the fitness of each function to depend on the combined performance of a population of functions. This is desirable in applications that need a population of programs that perform well in concert, and it contrasts with traditional genetic programming applications, whose goal is to find a single program that performs well. The approach has been applied to a small database of iris flowers, with the goal of learning to predict the species of a flower from the values of four iris attributes, and to a larger breast cancer database, with the goal of predicting whether remission will occur within a five-year period.
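To make the idea of population-dependent fitness concrete, the following is a minimal sketch of one way it could be computed. It assumes that fitness is the standard rough-set attribute significance, that is, the drop in the dependency degree (the share of records in the positive region) when a function is removed from the population. This is not necessarily the exact measure used in this work. The names `dependency` and `significance`, the toy records, and the three threshold functions are all hypothetical and are not taken from the iris or breast cancer databases.

```python
from collections import defaultdict

def dependency(rows, labels, funcs):
    """Rough-set dependency degree: the fraction of rows that fall in the
    positive region, i.e. whose indiscernibility class (rows with identical
    outputs from every function in funcs) contains a single class label."""
    seen_labels = defaultdict(set)
    sizes = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(f(row) for f in funcs)
        seen_labels[key].add(label)
        sizes[key] += 1
    positive = sum(sizes[k] for k, labs in seen_labels.items() if len(labs) == 1)
    return positive / len(rows)

def significance(rows, labels, funcs):
    """Assumed fitness measure: for each function, the loss in dependency
    degree when that function is removed from the population."""
    full = dependency(rows, labels, funcs)
    return [
        full - dependency(rows, labels, funcs[:i] + funcs[i + 1:])
        for i in range(len(funcs))
    ]

# Hypothetical toy records (two numeric attributes) with three classes.
rows = [(1.0, 0.2), (1.5, 0.3), (4.5, 1.4), (4.7, 1.6), (5.5, 1.5), (6.0, 2.2)]
labels = ["A", "A", "B", "B", "C", "C"]

# Hypothetical population of indicator functions (simple thresholds).
funcs = [
    lambda r: r[0] > 3.0,
    lambda r: r[0] > 5.0,
    lambda r: r[1] > 1.0,
]

print(dependency(rows, labels, funcs))
print(significance(rows, labels, funcs))
```

In this toy population the first and third functions partition the records identically, so removing either one alone loses nothing and both receive zero significance, while the second function is the only one that separates classes B and C. The score of a function therefore depends on which other functions are present, which is the property the abstract attributes to the rough set evaluator.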