Kenji Kira, Larry A. Rendell
For real-world concept learning problems, feature selection is important to speed up learning and to improve concept quality. We review and analyze past approaches to feature selection and note their strengths and weaknesses. We then introduce and theoretically examine a new algorithm Relief which selects relevant features using a statistical method. Relief does not depend on heuristics, is accurate even if features interact, and is noise-tolerant. It requires only linear time in the number of given features and the number of training instances, regardless of the target concept complexity. The algorithm also has certain limitations such as non-optimal feature set size. Ways to overcome the limitations are suggested. We also report the test results of comparison between Relief and other feature selection algorithms. The empirical results support the theoretical analysis, suggesting a practical approach to feature selection for real-world problems.