Rakesh Agrawal, Manish Mehta, John Shafer, Ramakrishnan Srikant, Andreas Arning, Toni Bollinger
The goal of the Quest project at the IBM Almaden Research Center is to develop technology that enables a new breed of data-intensive decision-support applications. This paper is a capsule summary of the current functionality and architecture of the Quest data mining system. Our overall approach has been to identify basic data mining operations that cut across applications and to develop fast, scalable algorithms for their execution (Agrawal, Imielinski, and Swami 1993a). We wanted our algorithms to: 1) discover patterns in very large databases, rather than simply verify that a given pattern exists; 2) have a completeness property that guarantees that all patterns of certain types have been discovered; and 3) have high performance and near-linear scaling on very large (multiple-gigabyte) real-life databases. We discuss the operations of discovering association rules, sequential patterns, time-series clustering, classification, and incremental mining. Due to space limitations, we only give highlights and point the reader to the relevant references for details. Unfortunately, for the same reason, we have not been able to include a discussion of related work. Besides the proceedings of the KDD, SIGMOD, VLDB, and Data Engineering conferences, other excellent sources of information about data mining systems and algorithms include (Piatetsky-Shapiro and Frawley 1991; Fayyad et al. 1995). IBM is making the Quest technology commercially available through its data mining product, IBM Intelligent Miner.
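To make the distinction between discovering patterns and merely verifying them concrete, and to show what a completeness guarantee means for one of these operations, here is a minimal Python sketch of association-rule mining. It uses a level-wise, Apriori-style frequent-itemset search chosen for illustration. It assumes the transactions fit in memory as Python sets. The function names `frequent_itemsets` and `association_rules` and the threshold values are hypothetical. This is not the Quest implementation, which targets disk-resident, multi-gigabyte data.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support (fraction of
    transactions containing it) is at least min_support.

    Illustrative level-wise (Apriori-style) search, not Quest's actual code.
    """
    n = len(transactions)
    txns = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct single item.
    candidates = {frozenset([item]) for t in txns for item in t}
    result = {}
    k = 1
    while candidates:
        # One pass over the data per level to count candidate support.
        counts = {c: 0 for c in candidates}
        for t in txns:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and prune any candidate
        # that has an infrequent k-subset (support never grows as items are added).
        k += 1
        prev = set(frequent)
        candidates = set()
        for a in prev:
            for b in prev:
                u = a | b
                if len(u) == k and all(frozenset(s) in prev for s in combinations(u, k - 1)):
                    candidates.add(u)
    return result


def association_rules(freq, min_confidence):
    """Derive rules lhs -> rhs with confidence = support(lhs U rhs) / support(lhs)."""
    rules = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # lhs is a subset of a frequent itemset, so it is in freq as well.
                conf = supp / freq[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
freq = frequent_itemsets(transactions, min_support=0.6)
rules = association_rules(freq, min_confidence=0.8)
```

On this toy data, the search finds four frequent single items and four frequent pairs, and the only rule with confidence of at least 0.8 is `beer -> diapers`. Completeness follows from the fact that support can only shrink as items are added. Every subset of a frequent itemset is therefore itself frequent, so the level-wise join never skips a qualifying itemset. The sketch finds every itemset above the threshold without being told which patterns to look for.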