Mustafa Bilgic, Lise Getoor
We address the problem of efficient feature-value acquisition for classification in domains where both feature acquisition and misclassification carry varying costs. The objective is to minimize the sum of the information acquisition cost and the misclassification cost. Any decision-theoretic strategy tackling this problem needs to compute the value of information for sets of features. Once these values are computed, different acquisition strategies are possible (acquiring one feature at a time, acquiring features in sets, etc.). However, because computing the value of information for arbitrary subsets of features is computationally intractable, most traditional approaches have been greedy, computing the values of features one at a time. We make the value of information calculation tractable in practice by introducing a novel data structure called the Value of Information Lattice (VOILA). VOILA exploits dependencies between missing features and allows value of information computations to be shared across different feature subsets. To the best of our knowledge, performance differences between greedy acquisition, acquiring features in sets, and a mixed strategy have not been investigated empirically in the past because of the inherent intractability of the problem. With the help of VOILA, we are able to evaluate these strategies on five real-world datasets under various cost assumptions. We show that VOILA reduces computation time dramatically. We also show that the mixed strategy outperforms both greedy acquisition and acquisition in sets.
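To make the objective concrete, the following is a minimal sketch of a brute-force value of information calculation for a feature subset. It is not VOILA and not the authors' implementation: the joint distribution `JOINT`, the cost tables `MISCLASS_COST` and `FEATURE_COST`, and all function names are hypothetical and chosen only for illustration. The sketch enumerates every value combination of the subset, which is the exponential computation the abstract describes as intractable in general.

```python
from itertools import product

# Hypothetical joint distribution P(x0, x1, y) over two binary features and a
# binary class label (last position). All numbers are illustrative assumptions.
JOINT = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05,
    (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10,
    (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}
LABELS = (0, 1)
MISCLASS_COST = [[0.0, 10.0], [10.0, 0.0]]  # MISCLASS_COST[predicted][true]
FEATURE_COST = {0: 1.0, 1: 1.0}             # assumed acquisition cost per feature


def unnormalized_risk(evidence):
    """Return P(evidence) times the expected misclassification cost of the
    best prediction given the evidence (a dict: feature index -> value)."""
    mass = [0.0 for _ in LABELS]
    for assignment, prob in JOINT.items():
        if all(assignment[i] == v for i, v in evidence.items()):
            mass[assignment[-1]] += prob
    return min(
        sum(MISCLASS_COST[pred][y] * mass[y] for y in LABELS)
        for pred in LABELS
    )


def value_of_information(subset):
    """Expected reduction in misclassification cost from observing `subset`."""
    subset = sorted(subset)
    domains = [sorted({a[i] for a in JOINT}) for i in subset]
    baseline = unnormalized_risk({})
    # Summing the unnormalized risks over all value combinations gives the
    # expected misclassification cost after acquisition.
    after = sum(
        unnormalized_risk(dict(zip(subset, values)))
        for values in product(*domains)
    )
    return baseline - after


def net_benefit(subset):
    """Value of information minus the total acquisition cost of `subset`."""
    return value_of_information(subset) - sum(FEATURE_COST[i] for i in subset)


if __name__ == "__main__":
    for candidate in ([0], [1], [0, 1]):
        print(candidate, value_of_information(candidate), net_benefit(candidate))
```

In this toy distribution, acquiring feature 0 alone yields a higher net benefit than acquiring both features, since the second feature adds cost without further reducing the expected misclassification cost. Comparisons of this kind require set-level values, which single-feature greedy scoring does not compute.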
Subjects: 12. Machine Learning and Discovery; 15.5 Decision Theory
Submitted: Apr 24, 2007