Craig Boutilier, University of Toronto
Preference elicitation is a key problem facing the deployment of intelligent systems that make or recommend decisions on behalf of users. Since not all aspects of a utility function have the same impact on object-level decision quality, determining which information to extract from a user is itself a sequential decision problem, balancing the cost and time of elicitation against decision quality. We formulate this problem as a partially observable Markov decision process (POMDP). Because of the continuous nature of the state and action spaces of this POMDP, standard techniques cannot be used to solve it. We describe methods that exploit the special structure of preference elicitation to deal with parameterized belief states over the continuous state space, and gradient techniques for optimizing parameterized actions. These methods can be used with a number of different belief state representations, including mixture models.
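To make the formulation concrete, the following is a minimal illustrative sketch (not the paper's method) of the two ingredients the abstract names: a belief state over a continuous utility parameter, and gradient optimization of a continuous query action. Here the unknown user utility is a single scalar weight `w` in [0, 1], the belief is held on a discrete grid as a stand-in for a parameterized continuous density, the response model is a hypothetical logistic noise model, and the query's continuous parameter (a threshold) is tuned by finite-difference gradient ascent on a crude myopic value surrogate. All names and modeling choices are assumptions made for illustration.

```python
import math

# Grid over the continuous utility weight w in [0, 1]; a stand-in for a
# parameterized continuous belief state (e.g., a mixture model).
GRID = [i / 100 for i in range(101)]

def normalize(belief):
    s = sum(belief)
    return [p / s for p in belief]

def response_likelihood(yes, w, threshold, temp=0.05):
    """P(user answers 'yes, my weight exceeds threshold' | true weight w).
    A hypothetical logistic response-noise model; smooth in the threshold,
    so the query parameter can be optimized by gradient methods."""
    p_yes = 1.0 / (1.0 + math.exp(-(w - threshold) / temp))
    return p_yes if yes else 1.0 - p_yes

def update(belief, yes, threshold):
    """Bayesian belief update after observing a yes/no query response."""
    post = [b * response_likelihood(yes, w, threshold)
            for b, w in zip(belief, GRID)]
    return normalize(post)

def query_value(belief, threshold):
    """Myopic value surrogate for asking a threshold query: the expected
    peak of the posterior (a crude proxy for decision quality)."""
    p_yes = sum(b * response_likelihood(True, w, threshold)
                for b, w in zip(belief, GRID))
    value = 0.0
    for yes, p in ((True, p_yes), (False, 1.0 - p_yes)):
        if p > 0:
            value += p * max(update(belief, yes, threshold))
    return value

def best_threshold(belief, t=0.5, lr=0.5, steps=50, eps=1e-3):
    """Optimize the continuous query parameter by finite-difference
    gradient ascent, clipped to the feasible range [0, 1]."""
    for _ in range(steps):
        grad = (query_value(belief, t + eps)
                - query_value(belief, t - eps)) / (2 * eps)
        t = min(1.0, max(0.0, t + lr * grad))
    return t
```

For example, starting from a uniform prior and observing a "yes" to the query "is your weight above 0.5?" shifts the posterior mass upward, after which `best_threshold` proposes the next query parameter. A richer version would replace the grid with the mixture-model belief representations the paper discusses and the myopic surrogate with a proper value-of-information computation.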