Practical POMDPs for Personal Assistant Domains

Pradeep Varakantham, Rajiv T. Maheswaran, and Milind Tambe

Agents or agent teams deployed to assist humans often face the challenge of monitoring state of key processes in their environment, including the state of their human users, and making periodic decisions based on such monitoring. The challenge is particularly difficult given the significant observational uncertainty, and uncertainty in the outcome of agent’s actions. POMDPs (partially observable markov decision problems) appear well-suited to enable agents to address such uncertainties and costs; yet slow run-times in generating optimal POMDP policies presents a significant hurdle. This slowness can be attributed to cautious planning for all possible belief states, e.g., the uncertainty in the monitored process is assumed to range over all possible states at all times. This paper introduces three key techniques to speedup POMDP policy generation that exploit the notion of progress or dynamics in personal assistant domains. The key insight is that given an initial (possibly uncertain) starting set of states, the agent needs to be prepared to act only in a limited range of belief states; most other belief states are simply unreachable given the dynamics of the monitored process, and no policy needs to be generated for such belief states. The techniques we propose are complementary to most existing exact and approximate POMDP policy generation algorithms. Indeed, we illustrate our technique by enhancing generalized incremental pruning (GIP), one of the most efficient exact algorithms for POMDP policy generation and illustrate orders of magnitude speedup in policy generation. Such speedup would facilitate agents’ deploying POMDPs in assisting human users.


This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.