Branislav Kveton, Milos Hauskrecht
Markov decision processes (MDPs) with discrete and continuous state and action components can be solved efficiently by hybrid approximate linear programming (HALP). The main idea of the approach is to approximate the optimal value function by a linear combination of basis functions and to optimize the weights of this combination by linear programming. The quality of the approximation therefore depends on the choice of basis functions. However, basis functions that lead to good approximations are rarely known in advance. In this paper, we propose a new approach that discovers these functions automatically. The method relies on a class of parametric basis function models, whose parameters are optimized using the dual formulation of a relaxed HALP. We demonstrate the performance of our method on two hybrid optimization problems and compare it with manually selected basis functions.
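To make the "weights optimized by linear programming" step concrete, the sketch below solves the standard approximate linear program (ALP) for a small, fully discrete MDP. It is an illustration under stated assumptions, not the authors' HALP: HALP extends this formulation to mixed discrete and continuous states and actions, and the paper's basis-discovery step is not shown. The helper names `solve_alp` and `value_iteration`, the random test MDP, the uniform state-relevance weights `psi`, and the two-feature basis are all hypothetical choices made for this example; the LP is solved with SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def solve_alp(P, R, Phi, gamma, psi):
    """Approximate linear program for a finite MDP (hypothetical helper).

    P:   (A, S, S) transitions, P[a, x, y] = Pr(y | x, a)
    R:   (S, A) rewards
    Phi: (S, K) basis function values; the approximation is V_w = Phi @ w
    psi: (S,) positive state-relevance weights
    Returns the basis weights w.
    """
    A = P.shape[0]
    # Objective: minimize sum_x psi(x) * V_w(x) = (Phi^T psi) . w
    c = Phi.T @ psi
    # Bellman constraints Phi w >= R[:, a] + gamma * P_a Phi w for every action a,
    # rewritten in linprog's A_ub w <= b_ub form.
    A_ub = np.vstack([gamma * P[a] @ Phi - Phi for a in range(A)])
    b_ub = np.concatenate([-R[:, a] for a in range(A)])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


def value_iteration(P, R, gamma, iters=2000):
    """Exact optimal values, used only to check the ALP solution."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        # Q[x, a] = R[x, a] + gamma * sum_y P[a, x, y] * V[y]
        Q = R + gamma * np.einsum("axy,y->xa", P, V)
        V = Q.max(axis=1)
    return V


# Hypothetical random MDP used only for illustration.
rng = np.random.default_rng(0)
S, A, gamma = 6, 2, 0.9
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)  # make each row a probability distribution
R = rng.random((S, A))
psi = np.full(S, 1.0 / S)

V_star = value_iteration(P, R, gamma)

# Tabular basis (one indicator per state): the ALP recovers V* exactly.
w_tab = solve_alp(P, R, np.eye(S), gamma, psi)

# Compact basis: a constant plus a linear feature of the state index.
x = np.arange(S)
Phi = np.column_stack([np.ones(S), x / (S - 1)])
w = solve_alp(P, R, Phi, gamma, psi)
V_approx = Phi @ w
```

With a tabular basis the LP returns the optimal values, while with a compact basis every feasible solution upper-bounds V* state by state, so the quality of `V_approx` hinges entirely on which basis functions are supplied. That dependence is the motivation for discovering the basis automatically.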
Subjects: 15.5 Decision Theory; 12.1 Reinforcement Learning