Lawrence B. Holder, Diane J. Cook, and Surnjani Djoko
Because many databases contain or can be embellished with structural information, a method for identifying interesting and repetitive substructures is an essential component to discovering knowledge in such databases. This paper describes the SUBDUE system, which uses the minimum description length (MDL) principle to discover substructures that compress the database and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. Inclusion of background knowledgeguides SUBDUE toward appropriate substructures for a particular domain or discovery goal, and the use of an inexact graph match allows a controlled amount of deviations in the instance of a substructure concept. We describe the application of SUBDUE to a variety of domains. We also discuss approaches to combining SUBDUE with non-structural discovery systems.