Methods for Linking and Mining Massive Heterogeneous Databases

José C. Pinheiro and Don X. Sun

Many real-world KDD expeditions involve investigation of relationships between variables in different, heterogeneous databases. We present a dynamic programming technique for linking records in multiple heterogeneous databases using loosely defined fields that allow free-style verbatim entries. We develop an interestingness measure based on non-parametric randomization tests, which can be used for mining potentially useful relationships among variables. This measure uses distributional characteristics of historical events, hence accommodating variable-length records in a natural way. As an illustration, we include a successful application of the proposed methodology to a real-world data mining problem at Lucent Technologies.

This page is copyrighted by AAAI. All rights reserved.