Taghi M. Khoshgoftaar, Chris Seiffert, Naeem Seliya
A low-effort data mining approach to labeling network event records in a WLAN is proposed. The problem addressed commonly arises in AI and data mining approaches to network intrusion detection, i.e., the need for a training dataset of network event records that are labeled as either normal or an intrusion type. Given the dynamic nature of intrusion detection, such a dataset is often very large, especially in a WLAN, where several devices communicate with the network in a rather ad hoc manner. The size of the dataset increases the effort required of the domain expert, who would otherwise have to label every training record. A clustering algorithm is first used to form groups of similar network events; the expert then analyzes each cluster and assigns it to one of four classes: definite intrusion, possibly intrusion, probably normal, and definite normal. An ensemble classifier is then used to cleanse the labeled dataset of likely mislabeling errors made by the expert. With this combined strategy, the expert examines only a very small proportion of the given intrusion detection training dataset. The proposed approach is investigated with network traffic data obtained from a real-world WLAN. An ensemble classifier-based intrusion detection model built with the labeled training dataset yields good prediction accuracy.
Subjects: 1. Applications; 12. Machine Learning and Discovery
Submitted: Feb 5, 2007
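
The cluster-then-cleanse strategy described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The abstract names neither the clustering algorithm nor the ensemble members, so k-means with farthest-point initialization and a majority-vote filter over 1-NN, 3-NN, and nearest-centroid learners are assumptions. The two-feature synthetic events, the `expert_label` threshold rule, and the deliberately injected labeling error are all hypothetical.

```python
from collections import Counter

CLASSES = ("definite intrusion", "possibly intrusion",
           "probably normal", "definite normal")

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, k, iters=20):
    """Plain k-means (assumed algorithm) with deterministic farthest-point init."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = mean(members)
    return assign, centers

def expert_label(center):
    """HYPOTHETICAL stand-in for the human expert, who inspects one cluster summary."""
    score = center[0]  # assumed: first feature is some suspicion score
    if score >= 40:
        return CLASSES[0]
    if score >= 25:
        return CLASSES[1]
    if score >= 10:
        return CLASSES[2]
    return CLASSES[3]

def knn_predict(train, x, k):
    """Majority label among the k nearest training records."""
    nearest = sorted(train, key=lambda r: dist2(r[0], x))[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

def centroid_predict(train, x):
    """Label of the closest per-class centroid."""
    by_label = {}
    for p, lbl in train:
        by_label.setdefault(lbl, []).append(p)
    return min(by_label, key=lambda l: dist2(mean(by_label[l]), x))

def ensemble_filter(records, folds=5):
    """Flag records whose label is rejected by a majority of base learners
    trained on the other folds (an assumed form of ensemble cleansing)."""
    flagged = []
    for i, (x, lbl) in enumerate(records):
        train = [r for j, r in enumerate(records) if j % folds != i % folds]
        votes = [knn_predict(train, x, 1), knn_predict(train, x, 3),
                 centroid_predict(train, x)]
        if sum(v != lbl for v in votes) >= 2:
            flagged.append(i)
    return flagged

# Synthetic events: 25 "quiet" records and 25 "suspicious" records.
points = [(i, j) for i in range(5) for j in range(5)]
points += [(50 + i, 50 + j) for i in range(5) for j in range(5)]

assign, centers = kmeans(points, k=2)
cluster_labels = [expert_label(c) for c in centers]  # expert inspects 2 clusters, not 50 records
records = [(p, cluster_labels[a]) for p, a in zip(points, assign)]

# Simulate one expert error, then let the ensemble filter look for it.
records[3] = (records[3][0], CLASSES[0])
flagged = ensemble_filter(records)
print(flagged)  # [3]
```

In this toy setup the expert makes two labeling decisions for fifty records, and the filter flags only the record whose label was deliberately corrupted. Flagged records could be relabeled or dropped before training the final detection model; which of the two is preferable is left open here.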