Arno Siebes, CWI, Database Research Group, The Netherlands
Data mining systems have to evolve from a set of specialised routines to more generally applicable inductive query languages to satisfy industry’s need for strategic information. This paper introduces such an inductive query language, called Data Surveying. Data Surveying is the discovery of interesting subsets of the database; groups of customers whose behaviour deviates from average customer behaviour are examples of such subsets. A user specifies what makes a subset interesting through a survey task. The wide applicability of this scheme is illustrated by a variety of examples. To implement an inductive query language system, the "what" (the kind of strategic information sought) has to be made independent of the "how" (how this strategic information is discovered). In other words, the discovery algorithms have to be task-independent. In this paper, operators on the search space are introduced to achieve this independence, and the discovery algorithms are defined relative to these operators. To ensure efficient discovery, the notion of polynomial convergence is defined for these algorithms. Domain knowledge plays an important role in the specification of both the survey task and the operators.
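The separation of the "what" from the "how" can be illustrated with a small sketch. It is not the paper's formalism: it assumes that subsets are described by conjunctions of attribute = value conditions, that the only search-space operator is a hypothetical `refine` that adds one condition, and that a survey task is just a quality function over descriptions (here a hypothetical `deviation_task` that rewards groups whose mean on a target attribute deviates from the overall mean). Beam search is swapped in as an example discovery algorithm; it sees only the operator and the quality score, so a different survey task can be plugged in without touching the search.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Record = Dict[str, object]
# A subset description: a conjunction of (attribute, value) equality conditions.
Description = FrozenSet[Tuple[str, object]]
Quality = Callable[[List[Record], Description], float]


def cover(db: List[Record], desc: Description) -> List[Record]:
    """Records satisfying every condition of the description."""
    return [r for r in db if all(r.get(a) == v for a, v in desc)]


def refine(db: List[Record], desc: Description, attrs: List[str]) -> List[Description]:
    """Hypothetical search-space operator: add one attribute = value condition."""
    used = {a for a, _ in desc}
    out = []
    for a in attrs:
        if a in used:
            continue
        for v in sorted({r[a] for r in db}, key=repr):
            out.append(desc | {(a, v)})
    return out


def _tiebreak(desc: Description) -> List[str]:
    # Deterministic ordering for equal scores, safe for mixed value types.
    return sorted(repr(c) for c in desc)


def beam_search(db: List[Record], attrs: List[str], quality: Quality,
                width: int = 3, depth: int = 2) -> List[Tuple[float, Description]]:
    """Task-independent search: it only calls `refine` and `quality`."""
    beam: List[Description] = [frozenset()]
    best: List[Tuple[float, Description]] = []
    for _ in range(depth):
        candidates = {d for desc in beam for d in refine(db, desc, attrs)}
        scored = [(quality(db, d), d) for d in candidates if cover(db, d)]
        scored.sort(key=lambda t: (-t[0], _tiebreak(t[1])))
        beam = [d for _, d in scored[:width]]
        best = sorted(best + scored[:width],
                      key=lambda t: (-t[0], _tiebreak(t[1])))[:width]
    return best


def deviation_task(target: str) -> Quality:
    """Hypothetical survey task: coverage-weighted deviation from the mean of `target`."""
    def quality(db: List[Record], desc: Description) -> float:
        sub = cover(db, desc)
        overall = sum(r[target] for r in db) / len(db)
        mean = sum(r[target] for r in sub) / len(sub)
        return (len(sub) / len(db)) ** 0.5 * abs(mean - overall)
    return quality


if __name__ == "__main__":
    customers = [
        {"region": "north", "age": "young", "spend": 10},
        {"region": "north", "age": "old", "spend": 12},
        {"region": "south", "age": "young", "spend": 50},
        {"region": "south", "age": "old", "spend": 11},
        {"region": "north", "age": "young", "spend": 9},
        {"region": "south", "age": "young", "spend": 48},
    ]
    for score, desc in beam_search(customers, ["region", "age"], deviation_task("spend")):
        print(round(score, 2), sorted(desc))
```

On the toy data the top group is young customers in the south, whose spending deviates most from the average. Because the search is bounded by a beam width `w` and depth `d`, it performs at most `d * w * k` quality evaluations, where `k` is the number of attribute = value conditions `refine` can add; this only gestures at why bounding the search keeps discovery polynomial and is not the paper's definition of polynomial convergence.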