AAAI Publications, Twenty-Third International FLAIRS Conference

An Evaluation of Sampling on Filter-Based Feature Selection Methods
Kehan Gao, Taghi M. Khoshgoftaar, Jason Van Hulse

Last modified: 2010-05-06


Feature selection and data sampling are two of the most important data preprocessing activities in data mining. Feature selection removes less important features from the training data set, while data sampling is an effective means of dealing with the class imbalance problem. Although the effects of feature selection and class imbalance have frequently been investigated in isolation, their combined effect has not received enough research attention. This paper presents an empirical investigation of feature selection on imbalanced data. Six filter-based feature selection techniques and three data sampling methods are studied. We focus on two data preprocessing scenarios: data sampling applied before feature selection, and data sampling applied after feature selection. The experimental results demonstrate that applying data sampling after feature selection generally performs better than applying it before.
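
The two preprocessing orders compared above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the six feature selection techniques or the three sampling methods, so random undersampling and an absolute-correlation ranking are used here as stand-ins, and all function names are hypothetical. It also assumes one reading of "after", namely that features are ranked on the original data and sampling is then applied to the reduced data.

```python
import numpy as np

def random_undersample(X, y, rng):
    """Stand-in sampler: randomly drop rows until every class matches the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()
    return X[keep], y[keep]

def top_k_by_correlation(X, y, k):
    """Stand-in filter: rank features by |Pearson correlation| with the class label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features get a score of 0 instead of a division by zero.
    scores = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    return np.sort(np.argsort(scores)[::-1][:k])

def sample_before_selection(X, y, k, rng):
    """Scenario 1: balance the data first, then rank features on the sampled data."""
    Xs, ys = random_undersample(X, y, rng)
    feats = top_k_by_correlation(Xs, ys, k)
    return Xs[:, feats], ys, feats

def sample_after_selection(X, y, k, rng):
    """Scenario 2: rank features on the original data, then balance the reduced data."""
    feats = top_k_by_correlation(X, y, k)
    Xs, ys = random_undersample(X[:, feats], y, rng)
    return Xs, ys, feats
```

Both functions return the balanced, reduced training data together with the selected feature indices, so the same classifier can be trained on each output and the two orders compared directly.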


Keywords: filter-based feature selection; data sampling; imbalanced data
