Doina Caragea, Adrian Silvescu, and Vasant Honavar, Iowa State University
As the amount of data gathered every day in real-world problems (e.g., in bioinformatics) grows, there is a need for inductive learning algorithms that can incrementally process large amounts of data accumulated over time in physically distributed, autonomous data repositories. In the incremental setting, the learner gradually refines a hypothesis (or a set of hypotheses) as new data become available. Because of the large volume of data involved, it may not be practical to store and access the entire dataset during learning; the learner is therefore assumed not to have access to data encountered at an earlier time. Learning in the distributed setting can be defined analogously. An incremental or distributed learning algorithm is said to be exact if it yields the same result as batch learning, i.e., learning with the entire dataset accessible to the algorithm. We explore exact distributed and incremental learning algorithms that are variants and extensions of the support vector machine (SVM) family of learning algorithms.
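The notion of exactness can be illustrated with a minimal sketch. It does not use an SVM and is not the authors' algorithm: it assumes a toy nearest-centroid learner whose hypothesis depends only on per-class counts and feature sums, so that processing the data in chunks (without revisiting earlier chunks) reproduces the batch result. The names `CentroidLearner`, `learn_batch`, and `learn_incremental`, as well as the data, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class CentroidLearner:
    """Toy learner (an assumption for illustration, not an SVM).

    It keeps only per-class counts and per-class feature sums, which are
    enough to recover the class centroids without storing past examples.
    """

    def __init__(self):
        self.counts = defaultdict(int)
        self.sums = {}

    def update(self, chunk):
        # Fold a chunk of (features, label) pairs into the running statistics.
        for x, y in chunk:
            if y not in self.sums:
                self.sums[y] = [0] * len(x)
            self.counts[y] += 1
            self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]

    def hypothesis(self):
        # The hypothesis is the set of class centroids.
        return {
            y: tuple(s / self.counts[y] for s in self.sums[y])
            for y in self.sums
        }


def learn_batch(data):
    # Batch learning: the entire dataset is available at once.
    learner = CentroidLearner()
    learner.update(data)
    return learner.hypothesis()


def learn_incremental(chunks):
    # Incremental learning: each chunk is seen once and then discarded.
    learner = CentroidLearner()
    for chunk in chunks:
        learner.update(chunk)
    return learner.hypothesis()


# Hypothetical integer-valued data, so the sums are computed without rounding.
data = [((1, 2), "a"), ((3, 4), "a"), ((10, 10), "b"), ((12, 14), "b"), ((5, 6), "a")]
chunks = [data[:2], data[2:4], data[4:]]

batch_h = learn_batch(data)
incr_h = learn_incremental(chunks)

# Exactness in the sense defined above: both settings yield the same hypothesis.
is_exact = batch_h == incr_h
print(is_exact)
```

The same check applies to the distributed setting if each chunk is read as the data held at one repository. For learners such as SVMs, the statistics that must be carried between chunks are less obvious, and identifying them is the kind of question the paper addresses.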