Alp Sardag, H. Levent Akin
Real-world environments are often only partially observable to agents, because of either noisy sensors or incomplete perception. Moreover, their state spaces are continuous by nature, and an agent must decide on an action for each point in its internal continuous belief space. It is therefore convenient to model this type of decision-making problem as a Partially Observable Markov Decision Process (POMDP) with continuous observation and state spaces. Most POMDP methods, whether approximate or exact, assume that the underlying world dynamics or POMDP parameters, such as the transition and observation probabilities, are known. However, for many real-world environments it is very difficult, if not impossible, to obtain such information. We assume that only the internal dynamics of the agent, such as the actuator noise and the interpretation of the sensor suite, are known. Using these internal dynamics, our algorithm, Kalman Based Temporal Difference Neural Network (KBTDNN), generates an approximately optimal policy in a continuous belief state space. This policy is represented by a temporal difference neural network. KBTDNN deals with continuous Gaussian-based POMDPs and uses a Kalman filter for belief state estimation. Given only the MDP reward and the internal dynamics of the agent, KBTDNN can automatically construct the approximately optimal policy without discretizing the state and observation spaces.
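To illustrate the belief state estimation step mentioned above, the following is a minimal sketch of one Kalman filter update for a scalar Gaussian belief. It is not the authors' implementation: the function name, the additive motion model (the action simply shifts the state), and the noise parameters `actuator_var` and `sensor_var` are assumptions chosen for illustration. The resulting pair (mean, variance) is the kind of continuous belief point that a value-function approximator could take as input.

```python
def kalman_belief_update(mean, var, action, observation,
                         actuator_var, sensor_var):
    """One scalar Kalman filter step; returns the updated (mean, variance).

    Hypothetical model: next_state = state + action + actuator noise,
    observation = state + sensor noise, with both noises zero-mean Gaussian.
    """
    # Predict: the action shifts the mean; actuator noise widens the belief.
    pred_mean = mean + action
    pred_var = var + actuator_var

    # Correct: blend prediction and observation using the Kalman gain.
    gain = pred_var / (pred_var + sensor_var)
    new_mean = pred_mean + gain * (observation - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


# Example: start from belief N(0, 1), move by 1.0, then observe 1.2.
belief = kalman_belief_update(0.0, 1.0, action=1.0, observation=1.2,
                              actuator_var=0.1, sensor_var=0.5)
```

Because the belief stays Gaussian under these assumptions, it is fully described by its mean and variance, which is what makes it possible to work in the belief space without discretizing it.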
Subjects: 3.4 Probabilistic Reasoning; 12.1 Reinforcement Learning
Submitted: May 30, 2006