Hoi-Yee Hwang and Wai-Chee Fu, Chinese University of Hong Kong
Data mining, or knowledge discovery in databases, is the search for relationships and global patterns that exist but are hidden in large databases. Many methods have been proposed, one of which is attribute-oriented induction. In this method, domain knowledge in the form of concept hierarchies helps to generalize the concepts of the attributes in the database relations. The approach has since been extended to rule-based attribute-oriented induction. The time complexity of the original methods is O(N log N), where N is the number of relevant tuples in the database. In this paper, we exploit the static nature of the database schema and the concept hierarchies to derive more efficient algorithms. Provided that the concept hierarchies and the resulting knowledge are small compared to the database, the complexity of our algorithm is O(N), and the amount of disk I/O is reduced by a factor of O(log N) relative to the previous methods. We believe that this improvement in performance will make the attribute-oriented method more powerful in practice.
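To make the idea of generalization through concept hierarchies concrete, the following is a minimal sketch, not the algorithm of this paper. It assumes a one-level hierarchy stored as a dictionary and merges identical generalized tuples with a hash table in a single pass, which illustrates why linear time is plausible when the hierarchies are small. The names `HIERARCHY` and `generalize`, the attributes, and the sample data are all hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchies: each attribute value maps to its parent concept.
HIERARCHY = {
    "major": {"physics": "science", "biology": "science",
              "music": "arts", "history": "arts"},
    "city": {"Kowloon": "Hong Kong", "Shatin": "Hong Kong",
             "Vancouver": "Canada"},
}


def generalize(tuples, attributes, hierarchy):
    """Replace each value by its parent concept and merge identical tuples.

    Duplicates are merged with a hash-based Counter in one pass over the
    data, so the work is linear in the number of tuples (assuming the
    hierarchy lookups take constant time).
    """
    counts = Counter()
    for row in tuples:
        generalized = tuple(
            # Values without an entry in the hierarchy are kept unchanged.
            hierarchy.get(attr, {}).get(value, value)
            for attr, value in zip(attributes, row)
        )
        counts[generalized] += 1
    return counts


# Illustrative data only.
rows = [
    ("physics", "Kowloon"),
    ("biology", "Shatin"),
    ("music", "Vancouver"),
    ("history", "Vancouver"),
    ("physics", "Shatin"),
]
result = generalize(rows, ("major", "city"), HIERARCHY)
for generalized_tuple, count in result.items():
    print(generalized_tuple, count)
```

Each generalized tuple carries a count of the original tuples it covers, which is the usual way attribute-oriented induction records how much of the data a generalized tuple summarizes. A sort-based merge of the same tuples would instead cost O(N log N).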