Vasant Honavar, Doina Caragea
We present a sufficient statistics based framework for learning predictive models from semantically disparate, distributed data. The proposed approach yelds provably exact algorithms (relative to their centralized counterparts) for learning classifiers from distributed data and lends itself to adaptation to settings where the data reside in databases that have disparate schema and data semantics. The resulting algorithms are being implemented as part of INDUS, an open source suite of software for knowledge acquisition from large distributed, semantically disparate data sources.
Subjects: 12. Machine Learning and Discovery; 10. Knowledge Acquisition
Submitted: Feb 12, 2008