Philip K. Chan and Salvatore J. Stolfo
Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of very large network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real-world importance. Learning techniques are central to knowledge discovery, and the approach proposed in this paper may substantially increase the amount of data a knowledge discovery system can handle effectively. Meta-learning is proposed as a general technique for integrating a number of distinct learning processes. This paper details several meta-learning strategies for integrating classifiers learned independently by the same learner in a parallel and distributed computing environment. Our strategies are particularly suited to massive amounts of data that main-memory-based learning algorithms cannot handle efficiently. The strategies are also independent of the particular learning algorithm used and of the underlying parallel and distributed platform. Preliminary experiments using different data sets and algorithms demonstrate encouraging results: parallel learning by meta-learning can achieve comparable prediction accuracy in less space and time than purely serial learning.
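As an illustration of the general idea, the following minimal sketch partitions a training set, learns one base classifier per subset with the same learner, and then learns a meta-level classifier from the base predictions. It is not one of the specific strategies detailed in the paper: the nearest-centroid base learner, the lookup-table combiner, the separate validation set, and all function names (`train_centroid`, `partition`, `meta_learn`) are assumptions chosen for brevity.

```python
from collections import Counter, defaultdict


def train_centroid(examples):
    """Hypothetical base learner: nearest centroid on one numeric feature."""
    sums = defaultdict(float)
    counts = Counter()
    for x, label in examples:
        sums[label] += x
        counts[label] += 1
    centroids = {label: sums[label] / counts[label] for label in counts}

    def classify(x):
        # Predict the label whose centroid is closest to x.
        return min(centroids, key=lambda label: abs(x - centroids[label]))

    return classify


def partition(examples, k):
    """Split the training set into k disjoint subsets (round-robin)."""
    return [examples[i::k] for i in range(k)]


def meta_learn(examples, validation, k):
    """Learn k base classifiers on disjoint subsets, then a combiner."""
    # Each base classifier sees only its own subset, so the k training
    # runs are independent and could execute on separate processors.
    # Assumes every subset is non-empty.
    base = [train_centroid(subset) for subset in partition(examples, k)]

    # Meta-level training data: the tuple of base predictions for each
    # validation example, paired with that example's true label.
    votes = defaultdict(Counter)
    for x, label in validation:
        votes[tuple(c(x) for c in base)][label] += 1
    table = {key: seen.most_common(1)[0][0] for key, seen in votes.items()}

    def classify(x):
        key = tuple(c(x) for c in base)
        if key in table:
            return table[key]
        # Unseen combination of base predictions: fall back to a
        # majority vote over the base classifiers.
        return Counter(key).most_common(1)[0][0]

    return classify
```

Because no base learner ever needs the whole training set in memory, the sketch reflects the space argument made above; whether accuracy remains comparable to serial learning is an empirical question that this toy example does not address.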