José C. Pinheiro and Don X. Sun
Many real-world knowledge discovery in databases (KDD) expeditions involve investigating relationships between variables stored in different, heterogeneous databases. We present a dynamic programming technique for linking records across multiple heterogeneous databases using loosely defined fields that allow free-style verbatim entries. We develop an interestingness measure based on non-parametric randomization tests, which can be used to mine potentially useful relationships among variables. This measure uses distributional characteristics of historical events and therefore accommodates variable-length records in a natural way. As an illustration, we include a successful application of the proposed methodology to a real-world data mining problem at Lucent Technologies.
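To make the two ingredients concrete, the following is a minimal illustrative sketch, not the method developed in the paper. It assumes that free-style fields are compared with the standard Levenshtein edit distance (a common dynamic programming choice) and that interestingness is judged with a generic two-sample permutation test on a difference in means. The function names `edit_distance`, `best_match`, and `permutation_p_value` are hypothetical and chosen only for this example.

```python
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed by dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def best_match(entry: str, candidates: list[str]) -> str:
    """Return the candidate closest to a free-text entry (assumed normalization:
    strip surrounding whitespace and lowercase)."""
    norm = entry.strip().lower()
    return min(candidates, key=lambda c: edit_distance(norm, c.strip().lower()))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """One-sided randomization p-value for mean(x) - mean(y).

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose difference in means is at least the observed one.
    """
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if sum(px) / len(px) - sum(py) / len(py) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0
```

In this sketch, a misspelled entry such as "Lucent Tecnologies" would be linked to the candidate with the smallest edit distance, and a small p-value from `permutation_p_value` would flag a relationship as potentially interesting without assuming any parametric distribution.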