Valery Guralnik, Duminda Wijesekera, Jaideep Srivastava
Sequence data arise naturally in many applications, and can be viewed as an ordering of events, where each event has an associated time of occurrence. An important characteristic of event sequences is the occurrence of episodes, i.e. a collection of events occurring in a certain pattern. Of special interest are frequent episodes, i.e. episodes occurring with a frequency above a certain threshold. In this paper, we study the problem of mining for frequent episodes in sequence data. We present a framework for efficient mining of frequent episodes which goes beyond previous work in a number of ways. First, we present a language for specifying episodes of interest. Second, we describe a novel data structure, called the sequential pattern tree (SP Tree), which captures the relationships specified in the pattern language in a very compact manner. Third, we show how this data structure can be used by a standard bottom-up mining algorithm to generate frequent episodes in an efficient manner. Finally, we show how the SP Tree can be optimized by sharing common conditions, and evaluating each such expression only once. We present the results of an evaluation of the proposed techniques.