Roy Fox, Moshe Tennenholtz
An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP which extends an UMDP by allowing a particular costly action which completely observes the state. We introduce UR-max, a reinforcement learning algorithm with polynomial interaction complexity for unknown OCOMDPs.
Subjects: 12. Machine Learning and Discovery; 12.1 Reinforcement Learning
Submitted: Apr 19, 2007