Shuai Zhang, Sally McClean, Bryan Scotney
The vision of the Semantic Web brings challenges to knowledge discovery on databases in such a heterogeneous, distributed, open environment. The databases are developed independently with semantic information embedded, and they are heterogeneous with respect to data granularity, ontology/schema information, etc. Distributed knowledge discovery (DKD) methods are required to take semantic information into account alongside the data, and also to resolve heterogeneity issues among the distributed databases. Most current DKD methods fail to do so because they assume that the distributed databases are parts of a virtual global table; in other words, that they share the same semantics and data structure. In this paper, we propose a model-based clustering method for semantically heterogeneous distributed databases that meets both requirements. It deals with data generated by a mixture of underlying distributions, represented by a mixture model in which each component corresponds to a different cluster. It also resolves the heterogeneity caused by differing classification schemas simultaneously with the clustering process, without first homogenizing the heterogeneous local ontologies into a shared ontology.
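For reference, the standard finite mixture model that the abstract refers to can be written as below. The notation is assumed here for illustration and is not taken from the paper: K is the number of clusters, \pi_k are the mixing weights, f(x \mid \theta_k) is the distribution of component k, and z is the latent cluster label of an observation x.

```latex
% Mixture density: each component k corresponds to one cluster
p(x \mid \Theta) = \sum_{k=1}^{K} \pi_k \, f(x \mid \theta_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% Posterior probability that observation x belongs to cluster k
P(z = k \mid x, \Theta) =
  \frac{\pi_k \, f(x \mid \theta_k)}{\sum_{j=1}^{K} \pi_j \, f(x \mid \theta_j)}
```

In model-based clustering, an observation is typically assigned to the cluster with the largest posterior probability. How the heterogeneous classification schemas enter the likelihood is specific to the paper and is not reproduced here.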
Subjects: 12. Machine Learning and Discovery; 3.4 Probabilistic Reasoning