Simon Perkins and James Theiler
In the standard feature selection problem, we are given a fixed set of candidate features for use in a learning problem and must select a subset that will be used to train a model that is "as good as possible" according to some criterion. In this paper, we present a useful variant, the online feature selection problem, in which features arrive one at a time instead of all being available from the start. At each time step, the learner must select a subset of the features seen so far and return a corresponding model that is as good as possible given those features. We argue that existing feature selection methods do not perform well in this scenario, and describe a promising alternative method based on a stagewise gradient descent technique, which we call grafting.
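The abstract does not spell out how grafting operates. The sketch below shows one way the online setting could be handled with a grafting-style gradient test. The assumptions are ours, not the paper's: a squared-error loss with an L1 penalty of strength `lam`, re-optimization by cyclic coordinate descent (swapped in for simplicity in place of a gradient descent optimizer), and a hypothetical `OnlineGrafter` class whose `offer` method receives each newly arrived feature column. A new feature is accepted only if the magnitude of the loss gradient with respect to its (currently zero) weight exceeds `lam`. Otherwise it is discarded and the current model is kept.

```python
from typing import List


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|w|: shrinks z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


class OnlineGrafter:
    """Hypothetical grafting-style online feature selector (a sketch only).

    Assumed objective: (1/(2n)) * sum_i (y_i - sum_j w_j * x_ij)^2 + lam * sum_j |w_j|
    """

    def __init__(self, y: List[float], lam: float, sweeps: int = 100):
        self.y = list(y)
        self.n = len(y)
        self.lam = lam
        self.sweeps = sweeps
        self.columns: List[List[float]] = []  # feature columns accepted so far
        self.weights: List[float] = []        # one weight per accepted column

    def predict(self) -> List[float]:
        """Model output on the training rows using the accepted features."""
        preds = [0.0] * self.n
        for col, w in zip(self.columns, self.weights):
            for i in range(self.n):
                preds[i] += w * col[i]
        return preds

    def offer(self, column: List[float]) -> bool:
        """Offer a newly arrived feature column; return True if it is grafted in."""
        residual = [yi - pi for yi, pi in zip(self.y, self.predict())]
        # Gradient of the data term w.r.t. the new weight, evaluated at w = 0.
        grad = -sum(r * x for r, x in zip(residual, column)) / self.n
        if abs(grad) <= self.lam:
            # Gradient test fails: a nonzero weight cannot lower the penalized loss.
            return False
        self.columns.append(list(column))
        self.weights.append(0.0)
        self._reoptimize()
        return True

    def _reoptimize(self) -> None:
        """Re-fit all accepted weights by cyclic coordinate descent (assumed optimizer)."""
        residual = [yi - pi for yi, pi in zip(self.y, self.predict())]
        for _ in range(self.sweeps):
            for j, col in enumerate(self.columns):
                sq = sum(x * x for x in col) / self.n
                if sq == 0.0:
                    continue
                w_old = self.weights[j]
                # Correlation of column j with the residual excluding its own contribution.
                rho = sum(x * (r + w_old * x) for x, r in zip(col, residual)) / self.n
                w_new = soft_threshold(rho, self.lam) / sq
                if w_new != w_old:
                    delta = w_new - w_old
                    for i in range(self.n):
                        residual[i] -= delta * col[i]
                    self.weights[j] = w_new
```

For example, with targets `[1.0, 2.0, 3.0, 4.0]` and `lam=0.1`, offering the column `[1.0, 2.0, 3.0, 4.0]` passes the gradient test and is grafted in, while a later column `[1.0, -1.0, -1.0, 1.0]`, which is uncorrelated with the remaining residual, is rejected. Rejected features are simply dropped in this sketch; whether and how they might be reconsidered is left open here.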