Hussein Almuallim, Thomas G. Dietterich
In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hy- potheses definable over as few features as possible. This paper defines and studies this bias. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires q(1/e ln 1/d + 1/e [2P + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n avail- able features. This bound is only logarithmic in the number of irrelevant features. The paper also presents a quasi-polynomial time algorithm, FOCUS, which implements MIN-FEATURES. Experimental studies are presented that compare FOCUS to the ID3 and FRINGE algorithms. These experiments show that- contrary to expectations-these algorithms do not implement good approximations of MIN-FEATURES. The coverage, sample complexity, and generalization performance of FOCUS is substantially better than either ID3 or FRINGE on learning problems where the MIN-FEATURES bias is appropriate. This suggests that, in practical applications, training data should be preprocessed to remove irrelevant features before being given to ID3 or FRINGE.