AAAI Publications, The Twenty-Sixth International FLAIRS Conference

Learning Policies in Partially Observable MDPs with Abstract Actions Using Value Iteration
Hamed Janzadeh, Manfred Huber

Last modified: 2013-05-19


While abstraction and its benefits for transferring learned information to new tasks have been studied extensively and successfully in MDPs, they have not been studied in the context of Partially Observable MDPs (POMDPs). This paper addresses the problem of transferring skills from previous experiences in POMDP models using high-level actions (options). It shows that the optimal value function remains piecewise linear and convex when the available actions are options, and shows how value iteration algorithms can be modified to support them. The results apply to all existing value iteration algorithms. Experiments show how adding options can speed up the learning process.
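The piecewise-linear-convex (PWLC) property mentioned above is the foundation of POMDP value iteration: the value function over beliefs is the maximum over a finite set of linear "alpha-vectors". The sketch below shows one exact value-iteration backup for primitive actions on a made-up two-state model (the model parameters and all names are illustrative assumptions, not from the paper); the paper's extension of this backup to options is not reproduced here.

```python
import itertools
import numpy as np

# Hypothetical two-state, two-action, two-observation POMDP
# (illustrative only; not the paper's experimental domain).
S, A, O = 2, 2, 2
gamma = 0.95
T = np.array([[[0.9, 0.1], [0.1, 0.9]],      # T[a, s, s']
              [[0.5, 0.5], [0.5, 0.5]]])
Z = np.array([[[0.85, 0.15], [0.15, 0.85]],  # Z[a, s', o]
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, -1.0],                   # R[a, s]
              [0.0, 0.0]])

def backup(alphas):
    """One exact DP backup: returns the new alpha-vector set (unpruned)."""
    new = []
    for a in range(A):
        # Project each current alpha-vector through the model for each
        # observation: g[o][i][s] = gamma * sum_s' T[a,s,s'] Z[a,s',o] alpha_i(s')
        g = [[gamma * (T[a] * Z[a][:, o]) @ alpha for alpha in alphas]
             for o in range(O)]
        # Cross-sum: pick one successor vector per observation.
        for choice in itertools.product(*g):
            new.append(R[a] + sum(choice))
    return new

def value(alphas, belief):
    """PWLC value: maximum over alpha-vectors of their dot product with the belief."""
    return max(float(np.dot(alpha, belief)) for alpha in alphas)

alphas = [np.zeros(S)]  # V_0 = 0
for _ in range(3):
    alphas = backup(alphas)

# The value function is convex in the belief (a max of linear functions),
# so its value at a mixture of beliefs is at most the mixture of values.
b1, b2 = np.array([1.0, 0.0]), np.array([0.2, 0.8])
mid = 0.5 * b1 + 0.5 * b2
assert value(alphas, mid) <= 0.5 * value(alphas, b1) + 0.5 * value(alphas, b2) + 1e-9
```

The paper's contribution, in these terms, is showing that when each action is replaced by an option (with its own internal policy and termination), the backup can still be written over a finite alpha-vector set, so the value function stays PWLC and this family of algorithms still applies.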


Keywords: POMDP; value iteration; reinforcement learning; abstraction; options.
