Usama M. Fayyad
Knowledge Discovery in Databases (KDD) is a new field of research concerned with the extraction of high-level information (knowledge) from low-level data (usually stored in large databases). It is an area of interest to researchers and practitioners from many fields, including AI, statistics, pattern recognition, databases, visualization, and high-performance and parallel computing. The basic problem is to search databases for patterns or models that can be useful in accomplishing one or more goals. Examples of such goals include prediction (e.g. regression and classification), descriptive or generative modeling (e.g. clustering), data summarization (e.g. report generation), and visualization of either data or extracted knowledge (e.g. to support decision making or exploratory data analysis). KDD is a process that includes many steps. Among these steps are data preparation and cleaning, data selection and sampling, preprocessing and transformation, data mining to extract patterns and models, interpretation and evaluation of extracted information, and finally evaluation, rendering, or use of the final extracted knowledge. Note that under this view, data mining constitutes only one of the steps of the overall KDD process. The other steps are essential to make the application of data mining possible and to make the results useful. Within data mining, methods for deriving patterns or extracting models originate from statistics, machine learning, statistical pattern recognition, uncertainty management, and database methods such as on-line analytical processing (OLAP) or association rules. The process is typically highly interactive and may involve many iterations before useful knowledge is extracted from the underlying data. This talk will give an overview and summary of the rapidly growing field of KDD, and then focus on two specific applications in scientific data analysis to illustrate the potential, limitations, challenges, and promise of KDD.
An overview of the KDD process is given in .
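To make the ordering of the steps concrete, the process described above can be sketched as a small chain of functions. This is only an illustrative sketch under stated assumptions, not a method from the talk: the function names (`clean`, `select_sample`, `transform`, `mine`, `evaluate`, `kdd_pipeline`), the toy one-dimensional data, and the choice of two-means clustering as the data mining step are all hypothetical stand-ins.

```python
import random
import statistics


def clean(records):
    """Data preparation and cleaning: drop records with missing values."""
    return [r for r in records if r is not None]


def select_sample(records, k, seed=0):
    """Data selection and sampling: draw a reproducible random subset."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


def transform(records):
    """Preprocessing and transformation: rescale values to [0, 1]."""
    lo, hi = min(records), max(records)
    span = (hi - lo) or 1.0  # avoid division by zero for constant data
    return [(x - lo) / span for x in records]


def mine(records, iterations=20):
    """Data mining: 1-D two-means clustering (an assumed, illustrative choice)."""
    c0, c1 = min(records), max(records)
    for _ in range(iterations):
        g0 = [x for x in records if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in records if abs(x - c0) > abs(x - c1)]
        if not g0 or not g1:
            break  # degenerate split; keep the current centres
        c0, c1 = statistics.fmean(g0), statistics.fmean(g1)
    return c0, c1


def evaluate(records, centers):
    """Interpretation and evaluation: mean squared distance to nearest centre."""
    return statistics.fmean([min((x - c) ** 2 for c in centers) for x in records])


def kdd_pipeline(raw, sample_size=100):
    """Run the steps in order; in practice the loop is revisited interactively."""
    data = transform(select_sample(clean(raw), sample_size))
    centers = mine(data)
    return centers, evaluate(data, centers)


if __name__ == "__main__":
    # Hypothetical raw measurements with missing entries.
    raw = [1.0, None, 1.2, 0.9, None, 10.0, 10.3, 9.8]
    centers, error = kdd_pipeline(raw)
    print("cluster centres:", centers, "mean squared error:", error)
```

The point of the sketch is structural: the mining function is a single stage, and its output is only meaningful because cleaning, sampling, and transformation ran before it and evaluation runs after it. A real application would typically repeat the chain several times, adjusting earlier steps in light of the evaluation.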