AAAI Publications, Twenty-Ninth AAAI Conference on Artificial Intelligence

Detecting and Tracking Concept Class Drift and Emergence in Non-Stationary Fast Data Streams
Brandon Shane Parker, Latifur Khan

Last modified: 2015-02-21

Abstract


As constant data feeds from social media, embedded sensors, and other sources proliferate, the capability to provide predictive concept labels for these data streams will become ever more important and lucrative. However, the dynamic, non-stationary nature and effectively infinite length of data streams pose additional challenges for stream data mining algorithms. The scarcity of training data also limits the use of algorithms that depend heavily on supervised training. To address these issues, we propose an incremental semi-supervised method that not only provides general concept class label predictions but also tracks concept clusters within the feature space using a new online clustering algorithm. Each concept cluster contains an embedded stream classifier, creating a diverse ensemble for data instance classification within the generative model used for detecting emerging concepts in the stream. Unlike other recent novel class detection methods, our method goes beyond detection and continues to differentiate and track the emerging concepts. We show the effectiveness of our method on several synthetic and real-world data sets, and we compare the results against other leading baseline methods.

Keywords


Fast data; novel class detection; non-stationary stream classification; semi-supervised learning; stream clustering
