Scott B. Huffman and David Steier
Heterogeneous data sources often exhibit semantic heterogeneity at the data level; that is, the same real-world entity is referred to in different ways, both within and across sources. This paper discusses a framework, called heuristic join, for combining information from such sources. Heuristic join extends the familiar equi-join for homogeneous sources: it uses heuristic match operators rather than simple equality to determine whether tuples refer to the same entity. Because heuristic matching is inexact, heuristic join involves a number of parameters that are not present in equi-joins. Our work is motivated by a real-world data integration problem that required the use of heuristic joins.
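To make the contrast with equi-join concrete, the following is a minimal illustrative sketch, not the implementation described in this paper. It assumes a hypothetical match operator (`similarity`, a normalized string comparison built on Python's standard `difflib`) and a single hypothetical parameter (`threshold`); the relation contents and all function names are likewise invented for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    # Hypothetical normalization: lowercase, replace punctuation with
    # spaces, and collapse runs of whitespace.
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(cleaned.split())


def similarity(a, b):
    # Hypothetical heuristic match operator: a score in [0, 1] computed
    # on the normalized strings.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def equi_join(left, right, key):
    # Conventional equi-join: tuples pair up only on exact key equality.
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def heuristic_join(left, right, key, match=similarity, threshold=0.8):
    # Heuristic join sketch: tuples pair up when the match operator's
    # score reaches the threshold (an assumed, illustrative parameter).
    pairs = []
    for l in left:
        for r in right:
            score = match(l[key], r[key])
            if score >= threshold:
                pairs.append((l, r, score))
    return pairs


# Invented example relations from two sources that name the same
# company differently.
customers = [
    {"name": "Acme Corp.", "city": "Chicago"},
    {"name": "Globex Inc", "city": "Boston"},
]
orders = [
    {"name": "ACME Corp", "total": 120},
    {"name": "Initech", "total": 75},
]

exact_pairs = equi_join(customers, orders, "name")
heuristic_pairs = heuristic_join(customers, orders, "name")
```

In this sketch the equi-join pairs no tuples, because "Acme Corp." and "ACME Corp" differ as strings, while the heuristic join pairs them after normalization. The threshold is one example of a parameter that has no counterpart in an equi-join: lowering it admits more candidate pairs at the risk of false matches.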