Robby Goetschalckx, Jan Ramon
We discuss the problem of policy learning in a Markov Decision Process where only a restricted subset of the full policy space may be used. In this way, useful background knowledge can be incorporated to reduce the search space. This is useful when we know the optimal policy will belong to a specific subset of the full policy space, or when only a limited part of the policy space is usable in practice. We suggest and discuss a number of different approaches based on existing work in policy search methods. None of these methods can be easily adapted to handle the setting of a restricted policy space. We point out a number of difficulties that arise and assumptions that must be made for some approaches to work.
Subjects: 12.1 Reinforcement Learning; 15.2 Constraint Satisfaction
Submitted: Apr 12, 2007