Runu Rathi, Diane J. Cook, and Lawrence B. Holder, The University of Texas at Arlington
One of the main challenges for knowledge discovery and data mining systems is to scale up their data interpretation abilities to discover interesting patterns in large datasets. This research addresses the scalability of graph-based discovery to monolithic datasets, which are prevalent in many real-world domains such as bioinformatics, where vast amounts of data must be examined to find meaningful structures. We introduce a technique by which these datasets can be automatically partitioned and mined serially with minimal impact on result quality. We present applications of our work to both artificially generated databases and a bioinformatics domain.