Gemma C. Garriga, Roni Khardon, Luc De Raedt
We investigate the problem of mining closed sets in multi-relational databases. Previous work introduced different semantics and associated algorithms for mining closed sets in multi-relational databases. However, insight into the implications of semantic choices and the relationships among them was still lacking. Our investigation shows that the semantic choices are important because they imply different properties, which in turn affect the range of algorithms that can mine for such sets. Of particular interest is the question whether the seminal LCM algorithm by Uno et al. can be upgraded towards multi-relational problems. LCM is attractive since its run time is linear in the number of closed sets and it does not need to store outputs in order to avoid duplicates. We provide a positive answer to this question for some of the semantic choices, and report on experiments that evaluate the scalability and applicability of the upgraded algorithm on benchmark problems.
Subjects: 12. Machine Learning and Discoveryn
Submitted: Oct 15, 2006