David Mimno, Andrew McCallum, Gerome Miklau
Databases constructed automatically through web mining and information extraction often overlap with databases constructed and curated by hand. These two types of databases are complementary: automatic extraction provides increased scope, while curated databases provide increased accuracy. The uncertain nature of such integration tasks suggests that the merged database should retain multiple possible values for a field rather than commit to a single one. We present initial work on a system that integrates two bibliographic databases, DBLP and Rexa, while maintaining alternative values in merged records and assigning each a probabilistic confidence.
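The idea of keeping several alternative values per field, each with a confidence, can be sketched as a small data structure. This is only an illustrative sketch and not the authors' method. The abstract does not say how confidences are computed, so the example assumes a simple source-weighted vote. The weights in `SOURCE_WEIGHTS` and the names `MergedRecord`, `add`, `distribution`, and `best` are all hypothetical. The record values in the usage lines are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical per-source reliability weights (an assumption, not from the paper):
# the hand-curated database (DBLP) is trusted more than the extracted one (Rexa).
SOURCE_WEIGHTS = {"DBLP": 0.8, "Rexa": 0.5}


@dataclass
class MergedRecord:
    # field name -> {candidate value -> accumulated source weight}
    candidates: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(float))
    )

    def add(self, source, record):
        """Add one source's version of the record; each value gains that source's weight."""
        weight = SOURCE_WEIGHTS[source]
        for name, value in record.items():
            self.candidates[name][value] += weight

    def distribution(self, name):
        """Normalize accumulated weights into a probability for each alternative value."""
        scores = self.candidates[name]
        total = sum(scores.values())
        return {value: score / total for value, score in scores.items()}

    def best(self, name):
        """Return the single most probable value, for consumers that need one answer."""
        dist = self.distribution(name)
        return max(dist, key=dist.get)


# Usage with placeholder data: the two sources disagree on venue but agree on year.
rec = MergedRecord()
rec.add("DBLP", {"venue": "Venue A", "year": 2006})
rec.add("Rexa", {"venue": "Venue B", "year": 2006})
print(rec.distribution("venue"))
print(rec.best("venue"))
```

Keeping the full distribution, rather than only `best`, preserves the disagreement between sources so that it can be revisited later. A real system would probably replace the fixed weights with a learned model of source and field reliability.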
Subjects: 11. Knowledge Representation; 3.4 Probabilistic Reasoning
Submitted: May 15, 2007