Daniel Nikovski, Carnegie Mellon University
We are addressing the problem of learning probabilistic models of the interaction between a mobile robot and its environment and using these models for task planning. This requires modifying the state-of-the-art reinforcement learning algorithms to deal with hidden state and high-dimensional observation spaces of continuous variables. Our approach is to identify hidden states by means of the trajectories leading into and out of them, and perform clustering in this embedding trajectory space in order to compile a partially observable Markov decision process (POMDP) model, which can be used for approximate decision-theoretic planning. The ultimate objective of our work is to develop algorithms that learn POMDP models with discrete hidden states defined (grounded) directly into continuous sensory variables such as sonar and infrared readings.