A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs

Roy Fox, Moshe Tennenholtz

An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP which extends an UMDP by allowing a particular costly action which completely observes the state. We introduce UR-max, a reinforcement learning algorithm with polynomial interaction complexity for unknown OCOMDPs.

Subjects: 12. Machine Learning and Discovery; 12.1 Reinforcement Learning

Submitted: Apr 19, 2007

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.