Doina Caragea, Jie Bao, and Vasant Honavar
With the advent of the Semantic Web, there is increased availability of meta data (ontologies) that make explicit the semantic commitments associated with the data sources. Together with tools for specifying mappings between ontologies, this has opened up for the first time, the possibility of acquiring knowledge from such ontology extended, semantically disparate data sources. Hence, there is an urgent need for machine learning algorithms for building predictive models (e.g., classifiers) in a setting where there is no unique global interpretation of data from semantically disparate sources and it is neither feasible nor desirable to collect data from such sources in a centralized data warehouse. We formulate the problem of learning classifiers from a set of related, semantically heterogeneous data sources, under the assumption that ontologies and mappings from a user ontology to the data source ontologies are given. We design a general strategy for learning classifiers from such data sources by reducing the problem of learning to the problem of answering queries from semantically heterogeneous data and we show how to answer such queries.